I've read that creating or copying a std::shared_ptr involves some overhead (atomic increment of the reference counter, etc.).
But what about creating a std::weak_ptr from it instead:
Obj * obj = new Obj();
// fast
Obj * o = obj;
// slow
std::shared_ptr<Obj> a(o);
// slow
std::shared_ptr<Obj> b(a);
// slow ?
std::weak_ptr<Obj> c(b);
I was hoping for somewhat faster performance, but I know that the shared pointer still has to increment the weak reference counter..
So is this still as slow as copying one shared_ptr into another?
This is from my days with game engines
The story goes:
We need a fast shared pointer implementation, one that doesn't thrash the cache (caches are smarter now btw)
A normal pointer:
XXXXXXXXXXXX....
^--pointer to data
Our shared pointer:
iiiiXXXXXXXXXXXXXXXXX...
^   ^---pointer stored in shared pointer
|
+---the start of the allocation; the allocation is sizeof(unsigned int)+sizeof(T)
The unsigned int* used for the count is at ((unsigned int*)ptr)-1
That way a "shared pointer" is pointer-sized, and the data it contains is the pointer to the actual data. So (because template => inline, and any compiler will inline an operator returning a data member) it had the same "overhead" for access as a normal pointer.
Creation of pointers took about 3 more CPU instructions than normal (the read from location -4 is one operation, then the add of 1, then the write back to location -4).
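A minimal sketch of the scheme described above, assuming a 4-byte count and a payload whose alignment requirement is no stricter than unsigned int (the name intrusive_shared and all of this code are my reconstruction, not the engine's actual source):

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// One allocation holds an unsigned int refcount immediately followed by the T;
// the "shared pointer" stores only the pointer to the T part, so it is
// pointer-sized and dereferences like a raw pointer.
template <typename T>
class intrusive_shared {
    T* ptr;  // points at the data; the count lives at ((unsigned*)ptr) - 1
    unsigned* count() const { return reinterpret_cast<unsigned*>(ptr) - 1; }
public:
    explicit intrusive_shared(const T& value) {
        void* raw = std::malloc(sizeof(unsigned) + sizeof(T));
        unsigned* c = static_cast<unsigned*>(raw);
        *c = 1;                      // initial refcount
        ptr = new (c + 1) T(value);  // placement-new the payload after the count
    }
    intrusive_shared(const intrusive_shared& o) : ptr(o.ptr) { ++*count(); }
    intrusive_shared& operator=(const intrusive_shared&) = delete;  // kept short
    ~intrusive_shared() {
        if (--*count() == 0) { ptr->~T(); std::free(count()); }
    }
    T& operator*() const { return *ptr; }  // inlined: same cost as a raw pointer
    unsigned use_count() const { return *count(); }
};
```

On a 64-bit target with stricter alignment you would pad the header out (the int[2] union trick mentioned below) before this is safe for arbitrary T.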
Now we'd only use weak pointers when we were debugging (so we'd compile with DEBUG defined (a macro definition)), because then we'd like to see all allocations and what's going on and such. It was useful.
The weak-pointers must know when what they point to is gone, NOT keep the thing they point to alive (in my case, if the weak pointer kept the allocation alive the engine would never get to recycle or free any memory, then it's basically a shared pointer anyway)
So each weak-pointer has a bool, alive or something, and is a friend of shared_pointer
When debugging our allocation looked like this:
vvvvvvvviiiiXXXXXXXXXXXXX.....
^       ^   ^---the pointer we stored (to the data)
|       +---that pointer -4 bytes = ref counter
+---the start of the allocation, now sizeof(linked_list<weak_pointer<T>*>)+sizeof(unsigned int)+sizeof(T)
The linked list structure you use depends on what you care about, we wanted to stay as close to sizeof(T) as we could (we managed memory using the buddy algorithm) so we stored a pointer to the weak_pointer and used the xor trick.... good times.
Anyway: the weak pointers to something shared_pointers point to are put in a list, stored somehow in the "v"s above.
When the reference count hits zero, you go through that list (which is a list of pointers to actual weak_pointers, they remove themselves when deleted obviously) and you set alive=false (or something) to each weak_pointer.
The weak_pointers now know that what they point to is no longer there (so they throw when dereferenced).
In this example
There is no overhead (the alignment was 4 bytes on that system; 64-bit systems tend to like 8-byte alignments, so union the ref-counter with an int[2] to pad it out in that case). Remember this involves placement news (nobody downvote because I mentioned them :P) and such. You need to make sure the struct you impose on the allocation matches what you allocated and made. Compilers can align stuff for themselves (hence int[2] not int,int).
You can de-reference the shared_pointer with no overhead at all.
New shared pointers being made do not thrash the cache at all and require 3 CPU instructions. They are not very... pipeline-able, but the compiler will always inline getters and setters (if not, probably always :P) and there'll be something around the call-site that can fill the pipeline.
The destructor of a shared pointer also does very little (decrements, that's it) so is great!
High performance note
If you have a situation like:
f() {
shared_pointer<T> ptr;
g(ptr);
}
There's no guarantee that the optimiser will dare to elide the increments and decrements that come from passing shared_pointer "by value" to g.
This is where you'd use a normal reference (which is implemented as a pointer)
So you'd do g(ptr.extract_reference()); instead - again, the compiler will inline the simple getter.
Now you have a T&; because ptr's scope entirely surrounds the call to g (assuming g has no side-effects and so forth), that reference will be valid for the duration of g.
deleting references is very ugly and you probably couldn't do it by accident (we relied on this fact).
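The same pattern translates directly to std::shared_ptr: when the owning pointer's lifetime entirely surrounds the call, pass a plain reference and skip the refcount traffic. This is a sketch under that assumption; extract_reference here is just dereferencing the smart pointer:

```cpp
#include <cassert>
#include <memory>

// Passing by value copies the shared_ptr: atomic increment on entry,
// atomic decrement on exit.
int g_by_value(std::shared_ptr<int> p) { return *p + 1; }

// Passing a reference touches no reference counts at all.
int g_by_ref(int& v) { return v + 1; }

int use() {
    std::shared_ptr<int> ptr = std::make_shared<int>(41);
    // ptr outlives the call, so the reference stays valid for g's duration.
    return g_by_ref(*ptr);  // the analogue of g(ptr.extract_reference())
}
```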
In hindsight
I should have created a type called "extracted_pointer" or something, it'd be really hard to type that by mistake for a class member.
The weak/shared pointers used by stdlib++
http://gcc.gnu.org/onlinedocs/libstdc++/manual/shared_ptr.html
Not as fast...
But don't worry about the odd cache miss unless you're making a game engine that has to run a decent workload at > 120fps easily :P Still miles better than Java.
The stdlib way is nicer. Each object has its own allocation and job. With our shared_pointer it was a true case of "trust me it works, try not to worry about how" (not that it is hard) because the code looked really messy.
If you undid whatever they've done to the names of the variables in their implementation it'd be far easier to read. See Boost's implementation, as it says in that document.
Other than the variable names, the GCC stdlib implementation is lovely. You can read it easily, it does its job properly (following the OO principle) but is a little slower and MAY thrash the cache on crappy chips these days.
UBER high performance note
You may be thinking, why not have XXXX...XXXXiiii (the reference count at the end)? Then you'll get the alignment that's best for the allocator!
Answer:
Because having to do pointer + sizeof(T) may not be one CPU instruction! (Subtracting 4 or 8 is something a CPU can do easily, simply because it makes sense; it'll be doing this a lot.)
In addition to Alec's very interesting description of the shared/weak_ptr system used in his previous projects, I wanted to give a little more detail on what is likely to be happening for a typical std::shared_ptr/weak_ptr implementation:
// slow
std::shared_ptr<Obj> a(o);
The main expense in the above construction is to allocate a block of memory to hold the two reference counts. No atomic operations need be done here (aside from what the implementation may or may not do under operator new).
// slow
std::shared_ptr<Obj> b(a);
The main expense in the copy construction is typically a single atomic increment.
// slow ?
std::weak_ptr<Obj> c(b);
The main expense in this weak_ptr constructor is typically a single atomic increment. I would expect the performance of this constructor to be nearly identical to that of the shared_ptr copy constructor.
Two other important constructors to be aware of are:
std::shared_ptr<Obj> d(std::move(a)); // shared_ptr(shared_ptr&&);
std::weak_ptr<Obj> e(std::move( c )); // weak_ptr(weak_ptr&&);
(And matching move assignment operators as well)
The move constructors do not require any atomic operations at all. They just copy the pointer to the reference counts from the rhs to the lhs, and make the rhs == nullptr.
The move assignment operators require an atomic decrement only if the lhs != nullptr prior to the assignment. The bulk of the time (e.g. within a vector<shared_ptr<T>>) the lhs == nullptr prior to a move assignment, and so there are no atomic operations at all.
The latter (the weak_ptr move members) are not actually C++11, but are being handled by LWG 2315. However I would expect it to already be implemented by most implementations (I know it is already implemented in libc++).
These move members will be used when scooting smart pointers around in containers, e.g. under vector<shared_ptr<T>>::insert/erase, and can have a measurable positive impact compared to the use of the smart pointer copy members.
I point it out so that you will know that if you have the opportunity to move instead of copy a shared_ptr/weak_ptr, it is worth the trouble to type the few extra characters to do so.
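The constructors discussed above can be exercised directly; this small sketch checks the ownership transfers (the atomic-operation counts themselves are implementation behavior and not observable from here):

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Moving a shared_ptr or weak_ptr transfers ownership without touching
// the atomic reference counts; copying requires an atomic increment.
int demo() {
    auto a = std::make_shared<int>(7);
    std::shared_ptr<int> b(a);             // copy: atomic increment
    assert(a.use_count() == 2);
    std::shared_ptr<int> d(std::move(a));  // move: no atomic ops, a becomes null
    assert(a == nullptr);
    assert(d.use_count() == 2);            // b and d now share ownership
    std::weak_ptr<int> c(b);               // atomic increment of the weak count
    std::weak_ptr<int> e(std::move(c));    // weak_ptr move: no atomic ops
    assert(c.expired() && !e.expired());   // c is empty, e observes the object
    return *d;
}
```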
Related
I have read several articles and answers on SO (in particular this one), but they do not provide the full answer to my question. They tend to focus on special cases where move semantics is as fast as copying a pointer, but that is not always the case.
For example consider this class:
struct Big {
map<string, unsigned> m;
vector<unsigned> v;
set<string> s;
};
And this function:
Big foo();
If foo returns by value and the copy cannot be elided via RVO, the compiler will apply move semantics, which implies 3 moves, one for each class member. If the class had more than 3 members, I would have even more operations. If foo returned the Big object by pointer (smart pointer, maybe) it would always be 1 operation.
To make things even more interesting, Big objects have a non local life span: they are kept in some data structures for the duration of the application. So you might expect the Big objects to be moved around multiple times during their life and the cost of 3 operations (move semantic) vs 1 operation (pointer) keeps burdening the performance long after the objects were returned by foo.
Given that background information, here are my questions:
1 - First of all I would like to be sure about my understanding of move semantics performance: is it true that in the example above moving a Big object is slower than copying a pointer?
2 - Assuming move semantics is indeed slower, should I return Big objects by pointer, or is there a better way to achieve both speed and a nice API (I consider returning by value the better API)?
[EDIT]
Bottom line: I like to return by value, because if I introduce a single pointer into the API then pointers spread everywhere. So I would like to avoid them. However I want to be sure about the performance impact. C++ is all about speed and I cannot blindly accept move semantics without understanding the performance hit.
they are kept in some data structures for the duration of the application. So you might expect the Big objects to be moved around multiple times during their life
I don't agree with this conclusion. Elements of most data structures tend to be quite stable in memory. Exceptions are unreserved std::vector and std::string, and other structures based on vector, such as flat maps.
If foo returns by value and the copy cannot be optimized via RVO
So, implement foo in a way that can be optimised via RVO. Preferably in such a way that elision is guaranteed in C++17. This is fast, and a convenient API, so it is what you should prefer.
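One way to sketch such a foo: return a prvalue, so that since C++17 copy elision is guaranteed and Big is constructed directly in the caller's storage, with zero moves of the three members (the initializer values here are made up for illustration):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Big {
    std::map<std::string, unsigned> m;
    std::vector<unsigned> v;
    std::set<std::string> s;
};

// Returning a prvalue: guaranteed copy elision in C++17, so no moves at all.
// (A named local returned directly is also typically elided via NRVO.)
Big foo() {
    return Big{{{"a", 1u}}, {1, 2, 3}, {"x"}};
}
```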
1 - First of all I would like to be sure about my understanding of the move semantic performance: is it true that in the example above moving Big object is slower than copying pointers?
It is true. Moving Big is relatively slower than copying a pointer. They are both rather light operations in absolute terms, though (depending on context).
When you think about returning a pointer to a newly created object, you must also think about the lifetime of the object and where it is stored. If you're thinking of allocating it dynamically, and returning a pointer to the dynamic object, then you must consider that the dynamic allocation may be much more expensive than the few moves of the member objects. And furthermore, all of this may be insignificant in relation to all of the allocations that the std::map and other containers will do, so none of this deliberation may end up mattering in the end.
In conclusion: If you want to know what is faster, then measure. If one implementation measures significantly faster, then that implementation is probably the one that is faster (depending on how good you are at measuring).
I understand in C++ programmers are encouraged to use value semantics. But at my work, I noticed a pattern where some programmers use reference semantic, and to be precise, they use shared_ptr where I would use value semantic.
To give this a bit context, for example, I have an API that reads a Database Page and returns its content. I see there are two ways of doing it.
Choice 1 value semantic:
DBPage readDatabasePage(int number) { // number is the for which page to read
DBPage page;
... // reading the database page
return page; // here we have RVO/move semantic to help us so it is not inefficient
}
Choice 2 reference semantic:
std::shared_ptr<DBPage> readDatabasePage(int number) { // ditto
std::shared_ptr<DBPage> page = std::make_shared<DBPage>();
...
return page;
}
The second choice seems okay to me, as I cannot see a disadvantage to doing it this way. So what I want to understand is why we are encouraging people to use value semantics. What is wrong with choice 2 here?
By default value semantics are preferred because they don't allocate memory => no memory leaks, no corrupted memory, etc.
Each pointer type has its own semantics. shared_ptr should be used only if the resource it controls will be shared, so that it lives until the last reference (pointer) to it is gone. In your example shared_ptr is inappropriate. If it's desirable to use a pointer in your example (e.g. DBPage is too large to store on the stack), it should be unique_ptr.
It's possible your example is not complete and later the result will be indeed "shared". I'd say even in this case unique_ptr should be used and later "converted" to shared_ptr, otherwise the function signature is misleading, and sometimes it's the only thing that is visible to the user. Though in this case it's arguable.
Also, shared_ptr is slower than unique_ptr, and much slower than a move, because it uses atomic operations for reference counting. They are very expensive compared with a simple int increment, though that is a problem only in performance-critical cases.
It depends how expensive DBPage is to copy. If for example DBPage is a class containing some pointers to the data, it may be cheap to copy, and storing it in a shared_ptr might add unnecessary overhead.
On the other hand, perhaps DBPage is expensive to copy. You mention RVO and move semantics, and those are fine when returning DBPage from a function, but if users actually want to keep two variables which refer to the same data, and they want the lifetime of the data to be the maximum of the two variables' lifetimes, then shared_ptr is a natural fit.
If users ultimately need shared_ptr<DBPage> but you give them DBPage, they may need to copy the data to get what they want.
In short, you need to understand your users and the actual data in play.
You have two different situations in which you want to create an object. Your "value semantics" work well in situations where you want certain memory to be the returned object (stack memory, memory in an array, etc.). Your "reference semantics" are useful when you want your object's lifetime to exceed its scope (and live on the heap).
To specifically address your question, choice 2 won't be readily usable when you want to store the result in a std::vector<DBPage>. Even with move semantics, having the right function for the job is the better choice.
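The point above can be sketched with a hypothetical DBPage (the struct and its fields are stand-ins, not the real class): a value-returning API composes directly with containers, while shared_ptr-returning one would force a vector<shared_ptr<DBPage>> or an extra copy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for DBPage, just to illustrate composition.
struct DBPage {
    int number;
    std::string content;
};

// Choice 1: return by value; RVO/move constructs the page in place.
DBPage readDatabasePage(int number) {
    DBPage page{number, "payload"};  // pretend we read from the database here
    return page;
}

// Value semantics composes directly with containers:
std::vector<DBPage> readAll(int count) {
    std::vector<DBPage> pages;
    for (int i = 0; i < count; ++i)
        pages.push_back(readDatabasePage(i));  // the temporary is moved in
    return pages;
}
```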
I had my doubts since I first saw where it leads, but now that I look at some code I have (medium-ish beginner), it strikes me as not only ugly, but potentially slow?
If I have a struct S inside a class A, called with class B (composition), and I need to do something like this:
struct S { int x[3] {1, 2, 3}; };
S *s;
A(): s {new S} {}
B(A *a) { a->s->x[1] = 4; }
How efficient is this chain: a->s->x[1]? Is this ugly and unnecessary? A potential drag? If there are even more levels in the chain, is it that much uglier? Should this be avoided? Or, if by any chance none of the previous, is it a better approach than:
S s;
B(A *a) { a->s.x[1] = 4; }
It seems slower like this, since (if I got it right) I have to make a copy of the struct, rather than working with a pointer to it. I have no idea what to think about this.
is it a better approach
In the case you just showed no, not at all.
First of all, in modern C++ you should avoid raw pointers with ownership, which means that you shouldn't use new, ever. Use one of the smart pointers that fits your needs:
std::unique_ptr for sole ownership.
std::shared_ptr for multiple objects -> same resource.
I can't give you exact performance numbers, but direct access through a member object s won't ever be slower than access through a member pointer s that has to be dereferenced. You should always go for the non-pointer way here.
But take another step back. You don't even need pointers here in the first place. s should just be an object like in your 2nd example, and replace the pointer in B's constructor with a reference.
I have to make a copy of the struct, rather than working with a
pointer to it.
No, no copy will be made.
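The recommended layout can be sketched like this: S is a plain member of A (no new, no chain of dereferences), and B takes A by reference instead of by pointer.

```cpp
#include <cassert>

struct S { int x[3] {1, 2, 3}; };

struct A {
    S s;  // direct member: no allocation, no indirection, no leak to manage
};

struct B {
    explicit B(A& a) { a.s.x[1] = 4; }  // plain member access, no -> chain
};
```

Note that a.s here accesses the member directly; no copy of S is made at any point.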
The real cost of using pointers to objects across many iterations is not necessarily the dereferencing of the pointer itself, but the potential cost of loading another cache line into the CPU cache. As long as the pointer points to something within the currently loaded cache line, the cost is minimal.
Always avoid dynamic allocation with new wherever possible, as it is potentially a very expensive operation, and requires an indirection operation to access the thing you allocated. If you do use it, you should also be using smart pointers, but in your case there is absolutely no reason to do so - just have an instance of S (a value, not a pointer) inside your class.
If you consider a->s->x[1] = 4 as ugly, then it is rather because of the chain than because of the arrows, and a->s.x[1] = 4 is ugly to the same extent. In my opinion, the code exposes S more than necessary, though there may sometimes exist good reasons for doing so.
Performance is one thing that matters; others are maintainability and adaptability. A chain of member accesses usually supports the principle of information hiding to a lesser extent than designs where such chains are avoided; the involved objects (and therefore the involved code) are tighter coupled than otherwise, and this usually comes at the cost of maintainability (confer, for example, the Law of Demeter as a design principle towards better information hiding):
In particular, an object should avoid invoking methods of a member
object returned by another method. For many modern object oriented
languages that use a dot as field identifier, the law can be stated
simply as "use only one dot". That is, the code a.b.Method() breaks
the law where a.Method() does not. As an analogy, when one wants a dog
to walk, one does not command the dog's legs to walk directly; instead
one commands the dog which then commands its own legs.
Suppose, for example, that you change the size of array x from 3 to 2, then you have to review not only the code of class A, but potentially that of any other class in your program.
However, if we avoid exposing too much of component S, class A could be extended by a member int setSAt(int index, int value), which can then also check, for example, array boundaries; changing S then influences only those classes that have S as a component:
B(A *a) { a->setSAt(1,4); }
Let me preface by saying that I have read some of the many questions already asked regarding move semantics. This question is not about how to use move semantics, it is asking what the purpose of it is - if I am not mistaken, I do not see why move semantics is needed.
Background
I was implementing a heavy class, which, for the purposes of this question, looked something like this:
class B;
class A
{
private:
std::array<B, 1000> b;
public:
// ...
}
When it came time to make a move assignment operator, I realized that I could significantly optimize the process by changing the b member to std::array<B, 1000> *b; - then movement could just be a deletion and pointer swap.
This led me to the following thought: now, shouldn't all non-primitive type members be pointers, to speed up movement (corrected below [1] [2])? (There is a case to be made for situations where memory should not be dynamically allocated, but in those cases optimizing movement is not an issue since there is no way to do so.)
Here is where I had the following realization - why create a class A which really just houses a pointer b so swapping later is easier when I can simply make a pointer to the entire A class itself. Clearly, if a client expects movement to be significantly faster than copying, the client should be OK with dynamic memory allocation. But in this case, why does the client not just dynamically allocate the whole A class?
The Question
Can't the client already take advantage of pointers to do everything move semantics gives us? If so, then what is the purpose of move semantics?
Move semantics:
std::string f()
{
std::string s("some long string");
return s;
}
int main()
{
// super-fast pointer swap!
std::string a = f();
return 0;
}
Pointers:
std::string *f()
{
std::string *s = new std::string("some long string");
return s;
}
int main()
{
// still super-fast pointer swap!
std::string *a = f();
delete a;
return 0;
}
And here's the strong assignment that everyone says is so great:
template<typename T>
T& strong_assign(T *&t1, T *&t2)
{
delete t1;
// super-fast pointer swap!
t1 = t2;
t2 = nullptr;
return *t1;
}
#define rvalue_strong_assign(a, b) (auto ___##b = b, strong_assign(a, &___##b))
Fine - the latter in both examples may be considered "bad style" - whatever that means - but is it really worth all the trouble with the double ampersands? If an exception might be thrown before delete a is called, that's still not a real problem - just make a guard or use unique_ptr.
Edit [1] I just realized this wouldn't be necessary with classes such as std::vector which use dynamic memory allocation themselves and have efficient move methods. This just invalidates a thought I had - the question below still stands.
Edit [2] As mentioned in the discussion in the comments and answers below this whole point is pretty much moot. One should use value semantics as much as possible to avoid allocation overhead since the client can always move the whole thing to the heap if needed.
I thoroughly enjoyed all the answers and comments! And I agree with all of them. I just wanted to stick in one more motivation that no one has yet mentioned. This comes from N1377:
Move semantics is mostly about performance optimization: the ability
to move an expensive object from one address in memory to another,
while pilfering resources of the source in order to construct the
target with minimum expense.
Move semantics already exists in the current language and library to a
certain extent:
copy constructor elision in some contexts
auto_ptr "copy"
list::splice
swap on containers
All of these operations involve transferring resources from one object
(location) to another (at least conceptually). What is lacking is
uniform syntax and semantics to enable generic code to move arbitrary
objects (just as generic code today can copy arbitrary objects). There
are several places in the standard library that would greatly benefit
from the ability to move objects instead of copy them (to be discussed
in depth below).
I.e. in generic code such as vector::erase, one needs a single unified syntax to move values to plug the hole left by the erased value. One can't use swap because that would be too expensive when the value_type is int. And one can't use copy assignment as that would be too expensive when value_type is A (the OP's A). Well, one could use copy assignment, after all we did in C++98/03, but it is ridiculously expensive.
shouldn't all non-primitive type members be pointers to speed up movement
This would be horribly expensive when the member type is complex<double>. Might as well color it Java.
Your example gives it away: your code is not exception-safe, and it makes use of the free-store (twice), which can be nontrivial. To use pointers, in many/most situations you have to allocate stuff on the free store, which is much slower than automatic storage, and does not allow for RAII.
They also let you more efficiently represent non-copyable resources, like sockets.
Move semantics aren't strictly necessary, as you can see from the fact that C++ existed for a while without them. They are simply a better way to represent certain concepts, and an optimization.
Can't the client already take advantage of pointers to do everything move semantics gives us? If so, then what is the purpose of move semantics?
Your second example gives one very good reason why move semantics is a good thing:
std::string *f()
{
std::string *s = new std::string("some long string");
return s;
}
int main()
{
// still super-fast pointer swap!
std::string *a = f();
delete a;
return 0;
}
Here, the client has to examine the implementation to figure out who is responsible for deleting the pointer. With move semantics, this ownership issue won't even come up.
If an exception might be thrown before delete a is called, that's still not a real problem just make a guard or use unique_ptr.
Again, the ugly ownership issue shows up if you don't use move semantics. By the way, how
would you implement unique_ptr without move semantics?
I know about auto_ptr and there are good reasons why it is now deprecated.
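To make the point concrete, here is a minimal sketch of a unique_ptr-like type (the name simple_unique is mine): ownership can only be transferred by move, which is exactly what auto_ptr could not express safely, since its "copy" silently mutated the source.

```cpp
#include <cassert>
#include <utility>

// A move-only owning pointer: copying is deleted, moving transfers ownership.
template <typename T>
class simple_unique {
    T* p;
public:
    explicit simple_unique(T* q = nullptr) : p(q) {}
    simple_unique(const simple_unique&) = delete;             // no copying
    simple_unique& operator=(const simple_unique&) = delete;
    simple_unique(simple_unique&& o) noexcept : p(o.p) { o.p = nullptr; }
    simple_unique& operator=(simple_unique&& o) noexcept {
        if (this != &o) { delete p; p = o.p; o.p = nullptr; }
        return *this;
    }
    ~simple_unique() { delete p; }
    T* get() const { return p; }
    T& operator*() const { return *p; }
};
```

Without rvalue references there is no way to distinguish "transfer from this temporary" from "copy from this lvalue", which is why auto_ptr had to abuse copy syntax.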
is it really worth all the trouble with the double ampersands?
True, it takes some time to get used to it. After you are familiar and comfortable with it, you will be wondering how you could live without move semantics.
Your string example is great. The short string optimization means that short std::strings do not exist in the free store: instead they exist in automatic storage.
The new/delete version means that you force every std::string into the free store. The move version only puts large strings into the free store, and small strings stay (and are possibly copied) in automatic storage.
On top of that your pointer version lacks exception safety, as it has non-RAII resource handles. Even if you do not use exceptions, naked pointer resource owners basically forces single exit point control flow to manage cleanup. On top of that, use of naked pointer ownership leads to resource leaks and dangling pointers.
So the naked pointer version is worse in piles of ways.
move semantics means you can treat complex objects as normal values. You move when you do not want duplicate state, and copy otherwise. Nearly normal types that cannot be copied can expose move only (unique_ptr), others can optimize for it (shared_ptr). Data stored in containers, like std::vector, can now include abnormal types because it is move aware. The std::vector of std::vector goes from ridiculously inefficient and hard to use to easy and fast at the stroke of a standard version.
Pointers place the resource management overhead into the clients, while good C++11 classes handle that problem for you. move semantics makes this both easier to maintain, and far less error prone.
The new keyword hands you back a pointer to the object created, which means you keep having to dereference it - I'm just afraid performance may suffer.
E.g. a common situation I'm facing:
class cls {
obj *x; ...
}
// Later, in some member function:
x = new obj(...);
for (i ...) x->bar[i] = x->foo(i + x->baz); // much dereferencing
I'm not overly keen on reference variables either as I have many *x's (e.g. *x, *y, *z, ...) and having to write &x_ref = *x, &y_ref = *y, ... at the start of every function quickly becomes tiresome and verbose.
Indeed, is it better to do:
class cls {
obj x; ... // not pointer
}
x_ptr = new obj(...);
x = *x_ptr; // then work with x, not pointer;
So what's the standard way to work with variables created by new?
There's no other way to work with objects created by new. The location of the unnamed object created by new is always a run-time value. This immediately means that each and every access to such an object will always unconditionally require dereferencing. There's no way around it. That is what "dereferencing" actually is, by definition - accessing through a run-time address.
Your attempts to "replace" pointers with references by doing &x_ref = *x at the beginning of the function are meaningless. They achieve absolutely nothing. References in this context are just syntactic sugar. They might reduce the number of * operators in your source code (and might increase the number of & operators), but they will not affect the number of physical dereferences in the machine code. They will lead to absolutely the same machine code containing absolutely the same amount of physical dereferencing and absolutely the same performance.
Note that in contexts where dereferencing occurs repeatedly many times, a smart compiler might (and will) actually read and store the target address in a CPU register, instead of re-reading it each time from memory. Accessing data through an address stored in a CPU register is always the fastest, i.e. it is even faster than accessing data through compile-time address embedded into the CPU instruction. For this reason, repetitive dereferencing of manageable complexity might not have any negative impact on performance. This, of course, depends significantly on the quality of the compiler.
In situations when you observe significant negative impact on performance from repetitive dereferencing, you might try to cache the target value in a local buffer, use the local buffer for all calculations and then, when the result is ready, store it through the original pointer. For example, if you have a function that repeatedly accesses (reads and/or writes) data through a pointer int *px, you might want to cache the data in an ordinary local variable x
int x = *px;
work with x throughout the entire function and at the end do
*px = x;
Needless to say, this only makes sense when the performance impact from copying the object is low. And of course, you have to be careful with such techniques in aliased situations, since in this case the value of *px is not maintained continuously. (Note again, that in this case we use an ordinary variable x, not a reference. Your attempts to replace single-level pointers with references achieve nothing at all.)
Again, this sort of "data caching" optimization can also be performed implicitly by the compiler, assuming the compiler has a good understanding of the data aliasing relationships present in the code. And this is where the C99-style restrict keyword can help it a lot. But that's a different topic.
In any case, there's no "standard" way to do that. The best approach depends critically on your knowledge of data flow relationships that exist in each specific piece of your code.
Instantiate the object without the new keyword, like this:
obj x;
Or if your constructor for obj takes parameters:
obj x(...);
This will give you an object instead of a pointer thereto.
You have to decide whether you want to allocate your things on the heap or on the stack. That's completely your decision based on your requirements, and there is no performance degradation from dereferencing. You may allocate your cls on the heap so it outlives the current scope, and keep the instances of obj as direct members:
class cls {
obj x;//default constructor of obj will be called
}
and if obj doesn't have a default constructor you need to call the appropriate constructor in cls's constructor.
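For example (obj here is a hypothetical type with no default constructor), the member initializer list is where cls picks the constructor:

```cpp
#include <cassert>
#include <string>

// Hypothetical obj with no default constructor.
struct obj {
    std::string name;
    explicit obj(std::string n) : name(std::move(n)) {}
};

class cls {
    obj x;  // value member: lives inside cls itself, no new, no dereferencing
public:
    cls() : x("configured") {}  // initializer list calls obj's constructor
    const std::string& label() const { return x.name; }
};
```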