Consider:
#include <cstdlib>
#include <memory>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
class Gizmo
{
public:
Gizmo() : foo_(shared_ptr<string>(new string("bar"))) {}
Gizmo(Gizmo&& rhs); // Implemented Below
Gizmo& operator=(Gizmo&& rhs) = default; // needed by random_shuffle, which swaps elements
private:
shared_ptr<string> foo_;
};
/*
// doesn't use std::move
Gizmo::Gizmo(Gizmo&& rhs)
: foo_(rhs.foo_)
{
}
*/
// Does use std::move
Gizmo::Gizmo(Gizmo&& rhs)
: foo_(std::move(rhs.foo_))
{
}
int main()
{
typedef vector<Gizmo> Gizmos;
Gizmos gizmos;
generate_n(back_inserter(gizmos), 10000, []() -> Gizmo
{
Gizmo ret;
return ret;
});
random_shuffle(gizmos.begin(), gizmos.end());
}
In the above code, there are two versions of Gizmo::Gizmo(Gizmo&&): one uses std::move to actually move the shared_ptr, and the other just copies the shared_ptr.
Both versions seem to work on the surface. One difference (the only difference I can see) is that in the non-move version the reference count of the shared_ptr is briefly increased.
I would normally go ahead and move the shared_ptr, but only to be clear and consistent in my code. Am I missing a consideration here? Should I prefer one version over the other for any technical reason?
The main issue here is not the small performance difference due to the extra atomic increment and decrement in shared_ptr, but that the semantics of the operation are inconsistent unless you perform a move.
While the assumption is that the extra reference count will only be temporary, the language gives no such guarantee. The source object you are moving from can be a temporary, but it could also have a much longer lifetime: it could be a named variable that has been cast to an rvalue reference (say, std::move(var)). In that case, by not moving from the shared_ptr you keep sharing ownership with the source of the move, and if the destination shared_ptr has a smaller scope, the lifetime of the pointed-to object will be needlessly extended (it now lives as long as the source).
I upvoted James McNellis' answer. I would like to make a comment about his answer but my comment won't fit in the comment format. So I'm putting it here.
A fun way to measure the performance impact of moving a shared_ptr vs copying one is to use something like vector<shared_ptr<T>> to move or copy a whole bunch of them and time it. Most compilers have a way to turn on/off move semantics by specifying the language mode (e.g. -std=c++03 or -std=c++11).
Here is code I just tested at -O3:
#include <chrono>
#include <memory>
#include <vector>
#include <iostream>
int main()
{
std::vector<std::shared_ptr<int> > v(10000, std::shared_ptr<int>(new int(3)));
typedef std::chrono::high_resolution_clock Clock;
typedef Clock::time_point time_point;
typedef std::chrono::duration<double, std::micro> us;
time_point t0 = Clock::now();
v.erase(v.begin());
time_point t1 = Clock::now();
std::cout << us(t1-t0).count() << "\u00B5s\n";
}
Using clang/libc++ and in -std=c++03 this prints out for me:
195.368µs
Switching to -std=c++11 I get:
16.422µs
Your mileage may vary.
The use of move is preferable: it should be more efficient than a copy because it does not require the extra atomic increment and decrement of the reference count.
I have some code like this (from CppCon): when inserting a non-const pair into an unordered_map, the performance is very different from inserting a const one.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <iterator>
#include <unordered_map>
#include <vector>
using namespace std;
struct StopWatch {
StopWatch() : clk{std::chrono::system_clock::now()} {}
~StopWatch() {
auto now = std::chrono::system_clock::now();
auto diff = now - clk;
cout << chrono::duration_cast<chrono::microseconds>(diff).count() << "us"
<< endl;
}
decltype(std::chrono::system_clock::now()) clk;
};
void Benchmark_Slow(int iters) {
std::unordered_map<string, int> m;
std::pair<const string, int> p = {};
while (iters--)
m.insert(p);
}
void Benchmark_Fast(int iters) {
std::unordered_map<string, int> m;
const std::pair<const string, int> p = {};
while (iters--)
m.insert(p);
}
int main(void) {
{
StopWatch sw;
Benchmark_Fast(1000000);
}
{
StopWatch sw;
Benchmark_Slow(1000000);
}
return 0;
}
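An online demo: Compiler Explorer
Output in microseconds (Benchmark_Fast with the const pair first, then Benchmark_Slow with the non-const pair):
128247
392454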
It seems that the const qualifier lets the compiler choose the unordered_map::insert(const value_type&) overload instead of unordered_map::insert(P&& value).
cppreference: unordered_map::insert
But I would expect the forwarding (universal) reference overload insert(P&& value) to do the same thing as the const lvalue reference overload: an identical copy operation.
Yet the emplace path (the non-const pair; insert(P&&) is specified to be equivalent to emplace(std::forward<P>(value))) runs much slower than the insert path (the const pair).
Am I missing something here? If there is a keyword I should search for, I haven't found anything that answers this. Thank you in advance.
I think I have found a possible explanation.
The emplace documentation says that if the insertion fails, the constructed element is destroyed immediately.
I followed the assembly generated with libstdc++ for unordered_map::emplace (which takes templated arguments and uses std::forward) and for unordered_map::insert (link provided by @Jarod42). The emplace version always allocates a new hash node before checking whether the key is already in the map: because it is templated and does not know the argument type (perhaps only that it is convertible), it constructs the node before examining the key. libc++ seems to recognize the argument type and behaves like insert, so the copy construction occurs only if the key does not exist yet.
When I modified the code to insert a different key each time, the difference went away (Quick C++ Benchmark).
I don't know whether I missed something else. Sorry for posting such a trivial problem.
If I have the following code that makes use of execution policies, do I need to synchronize all accesses to Foo::value even when I'm just reading the variable?
#include <algorithm>
#include <execution>
#include <vector>
struct Foo { int value; int getValue() const { return value; } };
int main() {
std::vector<Foo> foos;
//fill foos here...
std::sort(std::execution::par, foos.begin(), foos.end(), [](const Foo & left, const Foo & right)
{
return left.getValue() > right.getValue();
});
return 0;
}
My concern is that std::sort() will move (or copy) elements asynchronously which is effectively equivalent to asynchronously writing to Foo::value and, therefore, all read and write operations on that variable need to be synchronized. Is this correct or does the sort function itself take care of this for me?
What if I were to use std::execution::par_unseq?
If you follow the rules, i.e. you don't modify anything or rely on the identity of the objects being sorted inside your callback, then you're safe.
The parallel algorithm is responsible for synchronizing access to the objects it modifies.
See [algorithms.parallel.exec]/2:
If an object is modified by an element access function, the algorithm will perform no other unsynchronized accesses to that object. The modifying element access functions are those which are specified as modifying the object. [ Note: For example, swap(), ++, --, @=, and assignments modify the object. For the assignment and @= operators, only the left argument is modified. — end note ]
In case of std::execution::par_unseq, there's the additional requirement on the user-provided callback that it isn't allowed to call vectorization-unsafe functions, so you can't even lock anything in there.
This is OK. After all, you have told std::sort what you want of it and you would expect it to behave sensibly as a result, given that it is presented with all the relevant information up front. There's not a lot of point to the execution policy parameter at all, otherwise.
Where there might be an issue (although not in your code, as written) is if the comparison function has side effects. Suppose we innocently wrote this:
int numCompares = 0; // namespace-scope counter, so the captureless lambda can use it; deliberately unsynchronized
std::sort(std::execution::par, foos.begin(), foos.end(), [](const Foo & left, const Foo & right)
{
++numCompares;
return left.getValue() > right.getValue();
});
Now we have introduced a race condition, since two threads of execution might be passing through that code at the same time and access to numCompares is not synchronised (or, as I would put it, serialised).
But, in my slightly contrived example, we don't need to be so naive, because we can simply say:
std::atomic_int numCompares;
and then the problem goes away (and this particular example would also work with what appears to me to be the spectacularly useless std::execution::par_unseq, because std::atomic_int is lock-free on any sensible platform, thank you Rusty).
So, in summary, don't be too concerned about what std::sort does (although I would certainly knock up a quick test program and hammer it a bit to see if it does actually work as I am claiming). Instead, be concerned about what you do.
More here.
Edit: And while Rusty was digging that up, I did in fact write that quick test program (I had to fix your lambda) and, sure enough, it works fine. I can't find an online compiler that supports <execution> (MSVC seems to think it is experimental), so I can't offer you a live demo, but when run on the latest version of MSVC, this code:
#define _SILENCE_PARALLEL_ALGORITHMS_EXPERIMENTAL_WARNING
#include <algorithm>
#include <execution>
#include <vector>
#include <cstdlib>
#include <iostream>
constexpr int num_foos = 100000;
struct Foo
{
Foo (int value) : value (value) { }
int value;
int getValue() const { return value; }
};
int main()
{
std::vector<Foo> foos;
foos.reserve (num_foos);
// fill foos
for (int i = 0; i < num_foos; ++i)
foos.emplace_back (rand ());
std::sort (std::execution::par, foos.begin(), foos.end(), [](const Foo & left, const Foo & right)
{
return left.getValue() < right.getValue();
});
int last_foo = 0;
for (auto foo : foos)
{
if (foo.getValue () < last_foo)
{
std::cout << "NOT sorted\n";
break;
}
last_foo = foo.getValue ();
}
return 0;
}
Generates the following output every time I run it:
<nothing>
QED.
What is the most correct and efficient way to std::move elements from a vector of a certain type (T1) into a vector of an std::pair of that same type (T1) and another type (T2)?
In other words, how should I write MoveItems()?
#include <iostream> // For std::cout (not used below)
#include <string> // For std::string
#include <vector> // For std::vector
#include <utility> // For std::pair
using std::vector;
using std::string;
using std::pair;
vector<string> DownloadedItems;
vector<pair<string,bool>> ActiveItems;
vector<string> Download()
{
vector<string> Items {"These","Words","Are","Usually","Downloaded"};
return Items;
}
void MoveItems()
{
for ( size_t i = 0; i < DownloadedItems.size(); ++i )
ActiveItems.push_back( std::pair<string,bool>(DownloadedItems.at(i),true) );
}
int main()
{
DownloadedItems = Download();
MoveItems();
return 0;
}
Thank you for your time and help, I truly appreciate it!
void MoveItems()
{
ActiveItems.reserve(DownloadedItems.size());
for (auto& str : DownloadedItems)
ActiveItems.emplace_back(std::move(str), true);
}
N.B.: For strings as small as the ones in your example, moving may cost the same as copying because of the small-string optimization (SSO), or even slightly more if the implementation also empties out the source.
Some things you can do:
At the start of MoveItems(), call ActiveItems.reserve(DownloadedItems.size());. This prevents the vector from reallocating while you push elements into it.
Instead of calling push_back, call emplace_back. Here is an explanation of the advantages of doing so.
It is also worth noting that in this example you can avoid copying into a new data structure altogether by constructing the std::pair objects from the start.
I would like to replace the following code with std::lock():
for (mutex* m : mutexes) {
m->lock();
}
Is there any way I could invoke std::lock() on those mutexes given a std::vector<mutex*>?
Unfortunately the standard library doesn't provide an overload for std::lock that takes a pair of iterators pointing to lockable objects. To use std::lock you must know the number of lockable objects at compile time, and pass them as arguments to the function. However, Boost does provide an overload that takes iterators, and it'll work with std::mutex.
The other piece of scaffolding you'll need is boost::indirect_iterator; this will apply an extra dereference when you dereference the iterator (needed because you have std::vector<std::mutex*> and not std::vector<std::mutex>. The latter would not be very useful anyway since std::mutex cannot be copied or moved.)
#include <boost/thread/locks.hpp>
#include <boost/iterator/indirect_iterator.hpp>
#include <vector>
#include <mutex>
int main()
{
using mutex_list = std::vector<std::mutex*>;
mutex_list mutexes; // fill with pointers to existing std::mutex objects
boost::indirect_iterator<mutex_list::iterator> first(mutexes.begin()),
last(mutexes.end());
boost::lock(first, last);
}
Live demo
When using emplace_back, a constructor must exist for the parameters passed (k, v), so I need the constructor below. However, since I use unique_ptr, the compiler complains about not being able to access 'delete', which I believe means I'm doing something that allows me to have more than one pointer.
I can't figure out the syntax. How do I write this constructor the right way?
struct KV{
unique_ptr<string> k, v;
KV(){}
KV (unique_ptr<string> k_,unique_ptr<string> v_):k(move(k_)),v(move(v_)){}
};
Your constructor is OK. A possible problem is that you are not moving the two unique_ptrs when supplying them to your constructor:
#include <memory>
#include <string>
#include <utility>
#include <vector>
using namespace std;
struct KV{
unique_ptr<string> k, v;
KV(){}
KV (unique_ptr<string> k_,unique_ptr<string> v_):k(move(k_)),v(move(v_)){}
};
int main()
{
unique_ptr<string> p1(new string());
unique_ptr<string> p2(new string());
// KV v(p1, p2); // ERROR!
KV kv(move(p1), move(p2)); // OK
vector<KV> v;
unique_ptr<string> p3(new string()), p4(new string()); // p1 and p2 were already moved from above
v.emplace_back(move(p3), move(p4)); // OK
}
UPDATE:
When VS2012 shipped, VC11 did not support variadic templates. The correct implementation of emplace_back() should be variadic, but Microsoft provided a dummy one. When the CTP shipped, only the compiler was updated with support for variadic templates; the STL was not. Therefore, you still get the error.
There is not much you can do about this if you can't change your compiler, apart from waiting for the next release of the product. In the meantime, avoid emplace_back() and use push_back() instead.
You haven't mentioned what container you're trying to emplace_back into, but assuming it is a vector, if your KV struct is really that simple, there's no need to declare any constructors. Just use aggregate initialization.
#include <memory>
#include <string>
#include <utility>
#include <vector>
using namespace std;
struct KV
{
unique_ptr<string> k, v;
// KV(){}
// KV (unique_ptr<string> k_,unique_ptr<string> v_):k(move(k_)),v(move(v_)){}
};
int main()
{
unique_ptr<string> p1(new string());
unique_ptr<string> p2(new string());
KV v{move(p1), move(p2)}; // initialize an instance
// this step is not necessary, you can skip it
vector<KV> vec;
vec.emplace_back(KV{move(v.k), move(v.v)});
}