About unique_ptr performances - c++

I often read that unique_ptr would be preferred in most situations over shared_ptr because unique_ptr is non-copyable and has move semantics; shared_ptr would add an overhead due to copy and ref-counting;
But when I test unique_ptr in some situations, it appears it's noticably slower (in access) than its counterparts
For example, under gcc 4.5 :
edit : the print method doesn't print anything actually
#include <iostream>
#include <string>
#include <memory>
#include <chrono>
#include <vector>
class Print{
public:
void print(){}
};
void test()
{
typedef vector<shared_ptr<Print>> sh_vec;
typedef vector<unique_ptr<Print>> u_vec;
sh_vec shvec;
u_vec uvec;
//can't use initializer_list with unique_ptr
for (int var = 0; var < 100; ++var) {
shared_ptr<Print> p(new Print());
shvec.push_back(p);
unique_ptr<Print> p1(new Print());
uvec.push_back(move(p1));
}
//-------------test shared_ptr-------------------------
auto time_sh_1 = std::chrono::system_clock::now();
for (auto var = 0; var < 1000; ++var)
{
for(auto it = shvec.begin(), end = shvec.end(); it!= end; ++it)
{
(*it)->print();
}
}
auto time_sh_2 = std::chrono::system_clock::now();
cout <<"test shared_ptr : "<< (time_sh_2 - time_sh_1).count() << " microseconds." << endl;
//-------------test unique_ptr-------------------------
auto time_u_1 = std::chrono::system_clock::now();
for (auto var = 0; var < 1000; ++var)
{
for(auto it = uvec.begin(), end = uvec.end(); it!= end; ++it)
{
(*it)->print();
}
}
auto time_u_2 = std::chrono::system_clock::now();
cout <<"test unique_ptr : "<< (time_u_2 - time_u_1).count() << " microseconds." << endl;
}
On average I get (g++ -O0) :
shared_ptr : 1480 microseconds
unique_ptr : 3350 microseconds
where does the difference come from ? is it explainable ?

UPDATED on Jan 01, 2014
I know this question is pretty old, but the results are still valid on G++ 4.7.0 and libstdc++ 4.7. So, I tried to find out the reason.
What you're benchmarking here is the dereferencing performance using -O0 and, looking at the implementation of unique_ptr and shared_ptr, your results are actually correct.
unique_ptr stores the pointer and the deleter in a ::std::tuple, while shared_ptr stores a naked pointer handle directly. So, when you dereference the pointer (using *, ->, or get) you have an extra call to ::std::get<0>() in unique_ptr. In contrast, shared_ptr directly returns the pointer. On gcc-4.7 even when optimized and inlined, ::std::get<0>() is a bit slower than the direct pointer.. When optimized and inlined, gcc-4.8.1 fully omits the overhead of ::std::get<0>(). On my machine, when compiled with -O3, the compiler generates exactly the same assembly code, which means they are literally the same.
All in all, using the current implementation, shared_ptr is slower on creation, moving, copying and reference counting, but equally as fast *on dereferencing*.
NOTE: print() is empty in the question and the compiler omits the loops when optimized. So, I slightly changed the code to correctly observe the optimization results:
#include <iostream>
#include <string>
#include <memory>
#include <chrono>
#include <vector>
using namespace std;
class Print {
public:
void print() { i++; }
int i{ 0 };
};
void test() {
typedef vector<shared_ptr<Print>> sh_vec;
typedef vector<unique_ptr<Print>> u_vec;
sh_vec shvec;
u_vec uvec;
// can't use initializer_list with unique_ptr
for (int var = 0; var < 100; ++var) {
shvec.push_back(make_shared<Print>());
uvec.emplace_back(new Print());
}
//-------------test shared_ptr-------------------------
auto time_sh_1 = std::chrono::system_clock::now();
for (auto var = 0; var < 1000; ++var) {
for (auto it = shvec.begin(), end = shvec.end(); it != end; ++it) {
(*it)->print();
}
}
auto time_sh_2 = std::chrono::system_clock::now();
cout << "test shared_ptr : " << (time_sh_2 - time_sh_1).count()
<< " microseconds." << endl;
//-------------test unique_ptr-------------------------
auto time_u_1 = std::chrono::system_clock::now();
for (auto var = 0; var < 1000; ++var) {
for (auto it = uvec.begin(), end = uvec.end(); it != end; ++it) {
(*it)->print();
}
}
auto time_u_2 = std::chrono::system_clock::now();
cout << "test unique_ptr : " << (time_u_2 - time_u_1).count()
<< " microseconds." << endl;
}
int main() { test(); }
NOTE: That is not a fundamental problem and can be easily fixed by discarding the use of ::std::tuple in current libstdc++ implementation.

All you did in the timed blocks is access them. That won't involve any additional overhead at all. The increased time probably comes from the console output scrolling. You can never, ever do I/O in a timed benchmark.
And if you want to test the overhead of ref counting, then actually do some ref counting. How is the increased time for construction, destruction, assignment and other mutating operations of shared_ptr going to factor in to your time at all if you never mutate shared_ptr?
Edit: If there's no I/O then where are the compiler optimizations? They should have nuked the whole thing. Even ideone junked the lot.

You're not testing anything useful here.
What you are talking about: copy
What you are testing: iteration
If you want to test copy, you actually need to perform a copy. Both smart pointers should have similar performance when it comes to reading, because good shared_ptr implementations will keep a local copy of the object pointed to.
EDIT:
Regarding the new elements:
It's not even worth talking about speed when using debug code, in general. If you care about performance, you will use release code (-O2 in general) and thus that's what should be measured, as there can be significant differences between debug and release code. Most notably, inlining of template code can seriously decrease the execution time.
Regarding the benchmark:
I would add another round of measures: naked pointers. Normally, unique_ptr and naked pointers should have the same performance, it would be worth checking it, and it need not necessarily be true in debug mode.
You might want to "interleave" the execution of the two batches or if you cannot, take the average of each among several runs. As it is, if the computer slows down during the end of the benchmark, only the unique_ptr batch will be affected which will perturbate the measure.
You might be interested in learning more from Neil: The Joy of Benchmarks, it's not a definitive guide, but it's quite interesting. Especially the part about forcing side-effects to avoid dead-code removal ;)
Also, be careful about how you measure. The resolution of your clock might be less precise than what it appears to be. If the clock is refreshed only every 15us for example, then any measure around 15us is suspicious. It might be an issue when measuring release code (you might need to add a few turns to the loop).

Related

Accessing data on heap in c++

I have been trying to dive deeper into the limitations of pointers to see how they effect the program behind the scenes. One thing my research has led me to is the variables created by pointers must be deleted in a language like C++, otherwise the data will still be on memory.
My question pertains to accessing the data after a functions lifecycle ends. If I create a pointer variable within a function, and then the function comes to a proper close, how would the data be accessed? Would it actually be just garbage taking up space, or is there supposed to be a way to still reference it without having stored the address in another variable?
There's no automatic garbage collection. If you lose the handle (pointer, reference, index, ...) to your resource, your resource will live ad vitam æternam.
If you want your resources to cease to live when their handle goes out of scope, RAII and smart pointers are the tool you need.
If you want your resources to continue to live after their handle goes out of scope, you need to copy the handle and pass it around.
Using standard smart pointers std::unique_ptr and std::shared_ptr memory is freed when pointer goes out of scope. After scope ends object is immediately destroyed+freed and there is no way to access it anymore. Unless you move/copy pointer out of scope to bigger scope, where it will be deleted.
But there is not so difficult to implement lazy garbage collector. Same as before you use smart pointers everywhere, but lazey variant. Now when pointer goes out of scope its object is not immediately destroyed+freed, but instead is delegated to lazy garbage collector, which will destroy+free it later in a separate thread. Exactly this lazy behaviour I implemented in my code below.
I implemented following code from scratch just for fun and as a demo for you, there is no big point why not to use standard greedy freeing techniques of std::unique_ptr and std::shared_ptr. Although there is one very important use case - std::shared_ptr constructs objects at well known points in code, when you call constructor, and you know construction time well, but destroys objects at different undefined points in code and time, because there are shared copies of shared pointer. Thus you may have long destruction delays at unpredicted points in time, which may harm realtime high performance code. Also destruction time might be too big. Lazy deleting moves destruction into separate thread where it can be deleted at its own pace.
Although smart pointer is lazily disposed at scope end, but yet for some nano seconds (or even micro seconds) you may still have access to its undestroyed/unfreed memory, of course time is not guaranteed. This just means that real destruction can happen much later than scope ends, thus is the name of lazy garbage collector. You can even tweak this kind of lazy garbage collector so that it really deletes objects lets say after 1 milli second after theirs smart pointers have been destroyed.
Real garbage collectors are doing similar thing, they free objects much later in time and usually do it automatically by finding bytes in memory that look like real pointers of heap.
There is a Test() function in my code that shows how my lazy variants of standard pointers are used. Also when code is runned you may see in console output that it shows something like:
Construct Obj( 592)
Construct Obj( 1264)
LazyDeleter Dispose( 1264)
LazyDeleter Dispose( 592)
Test finished
Destroy ~Obj( 1264)
Destroy ~Obj( 592)
Here in parenthesis it shows id of object (lower bits of its pointer). You may see that disposal and destruction is done in order exactly opposite to construction order. Disposal to lazy garbage collector happens before test finishes. While real destruction happens later in a separate thread after test finishes.
Try it online!
#include <deque>
#include <atomic>
#include <mutex>
#include <thread>
#include <array>
#include <memory>
#include <iostream>
#include <iomanip>
using DelObj = void (void *);
void Dispose(void * obj, DelObj * del);
template <typename T>
struct LazyDeleter {
void operator ()(T * ptr) const {
struct SDel { static void Del(void * ptr) { delete (T*)ptr; } };
std::cout << "LazyDeleter Dispose(" << std::setw(5) << uintptr_t(ptr) % (1 << 16) << ")" << std::endl;
Dispose(ptr, &SDel::Del);
}
};
template <typename T>
using lazy_unique_ptr = std::unique_ptr<T, LazyDeleter<T>>;
template <typename T>
std::shared_ptr<T> make_lazy_shared(T * ptr) {
return std::shared_ptr<T>(ptr, LazyDeleter<T>{});
}
void Dispose(void * obj, DelObj * del) {
class AtomicMutex {
public:
auto Locker() { return std::lock_guard<AtomicMutex>(*this); }
void lock() { while (f_.test_and_set(std::memory_order_acquire)) {} }
void unlock() { f_.clear(std::memory_order_release); }
auto & Flag() { return f_; }
private:
std::atomic_flag f_ = ATOMIC_FLAG_INIT;
};
class DisposeThread {
struct Entry {
void * obj = nullptr;
DelObj * del = nullptr;
};
public:
DisposeThread() : thr_([&]{
size_t constexpr block = 32;
while (!finish_.load(std::memory_order_relaxed)) {
while (true) {
std::array<Entry, block> cent{};
size_t cent_cnt = 0;
{
auto lock = mux_.Locker();
if (entries_.empty())
break;
cent_cnt = std::min(block, entries_.size());
std::move(entries_.begin(), entries_.begin() + cent_cnt, cent.data());
entries_.erase(entries_.begin(), entries_.begin() + cent_cnt);
}
for (size_t i = 0; i < cent_cnt; ++i) {
auto & entry = cent[i];
try { (*entry.del)(entry.obj); } catch (...) {}
}
}
std::this_thread::yield();
}
}) {}
~DisposeThread() {
while (!entries_.empty())
std::this_thread::yield();
finish_.store(true, std::memory_order_relaxed);
thr_.join();
}
void Add(void * obj, DelObj * del) {
auto lock = mux_.Locker();
entries_.emplace_back(Entry{obj, del});
}
private:
AtomicMutex mux_{};
std::thread thr_{};
std::deque<Entry> entries_;
std::atomic<bool> finish_ = false;
};
static DisposeThread dt{};
dt.Add(obj, del);
}
void Test() {
struct Obj {
Obj() { std::cout << "Construct Obj(" << std::setw(5) << uintptr_t(this) % (1 << 16) << ")" << std::endl << std::flush; }
~Obj() { std::cout << "Destroy ~Obj(" << std::setw(5) << uintptr_t(this) % (1 << 16) << ")" << std::endl << std::flush; }
};
{
lazy_unique_ptr<Obj> uptr(new Obj());
std::shared_ptr<Obj> sptr = make_lazy_shared(new Obj());
auto sptr2 = sptr;
}
std::cout << "Test finished" << std::endl;
}
int main() {
Test();
}

Why this code does allow to push_back unique_ptr do vector?

so I thought adding unique to vector shouldn't work.
Why does it work for the below code?
Is it cause by not setting copy ctor as "deleted"??
#include <iostream>
#include <vector>
#include <memory>
class Test
{
public:
int i = 5;
};
int main()
{
std::vector<std::unique_ptr<Test>> tests;
tests.push_back(std::make_unique<Test>());
for (auto &test : tests)
{
std::cout << test->i << std::endl;
}
for (auto &test : tests)
{
std::cout << test->i << std::endl;
}
}
There is no copy here, only moves.
In this context, make_unique will produce an instance of unique pointer which is not named, and this push_back sees it as a r-value reference, which it can use as it wants.
It produce pretty much the same result than this code would:
std::vector<std::unique_ptr<Test>> tests;
auto ptr = std::make_unique<Test>();
tests.push_back(std::move(ptr));
This is called move semantics if you want to search more info on the matter. (and this only works from c++11 and beyond)
There are two overloads of std::vector::push_back according to https://en.cppreference.com/w/cpp/container/vector/push_back
In your case you will use the one with rvalue-ref so no copying required.

Is it possible to avoid managing memory manually in this situation in c++?

I have a Storage class that keeps a list of Things:
#include <iostream>
#include <list>
#include <functional>
class Thing {
private:
int id;
int value = 0;
static int nextId;
public:
Thing() { this->id = Thing::nextId++; };
int getId() const { return this->id; };
int getValue() const { return this->value; };
void add(int n) { this->value += n; };
};
int Thing::nextId = 1;
class Storage {
private:
std::list<std::reference_wrapper<Thing>> list;
public:
void add(Thing& thing) {
this->list.push_back(thing);
}
Thing& findById(int id) const {
for (std::list<std::reference_wrapper<Thing>>::const_iterator it = this->list.begin(); it != this->list.end(); ++it) {
if (it->get().getId() == id) return *it;
}
std::cout << "Not found!!\n";
exit(1);
}
};
I started with a simple std::list<Thing>, but then everything is copied around on insertion and retrieval, and I didn't want this because if I get a copy, altering it does not reflect on the original objects anymore. When looking for a solution to that, I found about std::reference_wrapper on this SO question, but now I have another problem.
Now to the code that uses them:
void temp(Storage& storage) {
storage.findById(2).add(1);
Thing t4; t4.add(50);
storage.add(t4);
std::cout << storage.findById(4).getValue() << "\n";
}
void run() {
Thing t1; t1.add(10);
Thing t2; t2.add(100);
Thing t3; t3.add(1000);
Storage storage;
storage.add(t3);
storage.add(t1);
storage.add(t2);
temp(storage);
t2.add(10000);
std::cout << storage.findById(2).getValue() << "\n";
std::cout << storage.findById(4).getValue() << "\n";
}
My main() simply calls run(). The output I get is:
50
10101
Not found!!
Although I was looking for:
50
10101
50
Question
Looks like the locally declared object t4 ceases to exist when the function returns, which makes sense. I could prevent this by dynamically allocating it, using new, but then I didn't want to manage memory manually...
How can I fix the code without removing the temp() function and without having to manage memory manually?
If I just use a std::list<Thing> as some suggested, surely the problem with t4 and temp will cease to exist, but another problem will arise: the code won't print 10101 anymore, for example. If I keep copying stuff around, I won't be able to alter the state of a stored object.
Who is the owner of the Thing in the Storage?
Your actual problem is ownership. Currently, your Storage does not really contain the Things but instead it is left to the user of the Storage to manage the lifetime of the objects you put inside it. This is very much against the philosophy of std containers. All standard C++ containers own the objects you put in them and the container manages their lifetime (eg you simply call v.resize(v.size()-2) on a vector and the last two elements get destroyed).
Why references?
You already found a way to make the container not own the actual objects (by using a reference_wrapper), but there is no reason to do so. Of a class called Storage I would expect it to hold objects not just references. Moreover, this opens the door to lots of nasty problems, including undefined behaviour. For example here:
void temp(Storage& storage) {
storage.findById(2).add(1);
Thing t4; t4.add(50);
storage.add(t4);
std::cout << storage.findById(4).getValue() << "\n";
}
you store a reference to t4 in the storage. The thing is: t4s lifetime is only till the end of that function and you end up with a dangling reference. You can store such a reference, but it isnt that usefull because you are basically not allowed to do anything with it.
Aren't references a cool thing?
Currently you can push t1, modify it, and then observe that changes on the thingy in Storage, this might be fine if you want to mimic Java, but in c++ we are used to containers making a copy when you push something (there are also methods to create the elements in place, in case you worry about some useless temporaries). And yes, of course, if you really want you can make a standard container also hold references, but lets make a small detour...
Who collects all that garbage?
Maybe it helps to consider that Java is garbage-collected while C++ has destructors. In Java you are used to references floating around till the garbage collector kicks in. In C++ you have to be super aware of the lifetime of your objects. This may sound bad, but acutally it turns out to be extremely usefull to have full control over the lifetime of objects.
Garbage? What garbage?
In modern C++ you shouldnt worry to forget a delete, but rather appreciate the advantages of having RAII. Acquiring resources on initialzation and knowing when a destructor gets called allows to get automatic resource management for basically any kind of resource, something a garbage collector can only dream of (think of files, database connections, etc.).
"How can I fix the code without removing the temp() function and without having to manage memory manually?"
A trick that helped me a lot is this: Whenever I find myself thinking I need to manage a resource manually I stop and ask "Can't someone else do the dirty stuff?". It is really extremely rare that I cannot find a standard container that does exactly what I need out of the box. In your case, just let the std::list do the "dirty" work.
Can't be C++ if there is no template, right?
I would actually suggest you to make Storage a template, along the line of:
template <typename T>
class Storage {
private:
std::list<T> list;
//....
Then
Storage<Thing> thing_storage;
Storage<int> int_storage;
are Storages containing Things and ints, respectively. In that way, if you ever feel like exprimenting with references or pointers you could still instantiate a Storage<reference_wrapper<int>>.
Did I miss something?...maybe references?
I won't be able to alter the state of a stored object
Given that the container owns the object you would rather let the user take a reference to the object in the container. For example with a vector that would be
auto t = std::vector<int>(10,0); // 10 element initialized to 0
auto& first_element = t[0]; // reference to first element
first_element = 5; // first_element is an alias for t[0]
std::cout << t[0]; // i dont want to spoil the fun part
To make this work with your Storage you just have to make findById return a reference. As a demo:
struct foo {
private:
int data;
public:
int& get_ref() { return data;}
const int& get_ref() const { return data;}
};
auto x = foo();
x.get_ref = 12;
TL;DR
How to avoid manual resource managment? Let someone else do it for you and call it automatic resource management :P
t4 is a temporary object that is destroyed at exit from temp() and what you store in storage becomes a dangling reference, causing UB.
It is not quite clear what you're trying to achieve, but if you want to keep the Storage class the same as it is, you should make sure that all the references stored into it are at least as long-lived as the storage itself. This you have discovered is one of the reasons STL containers keep their private copies of elements (others, probably less important, being—elimination of an extra indirection and a much better locality in some cases).
P.S. And please, can you stop writing those this-> and learn about initialization lists in constructors? >_<
In terms of what your code actually appears to be doing, you've definitely overcomplicated your code, by my estimation. Consider this code, which does all the same things your code does, but with far less boilerplate code and in a way that's far more safe for your uses:
#include<map>
#include<iostream>
int main() {
std::map<int, int> things;
int & t1 = things[1];
int & t2 = things[2];
int & t3 = things[3];
t1 = 10;
t2 = 100;
t3 = 1000;
t2++;
things[4] = 50;
std::cout << things.at(4) << std::endl;
t2 += 10000;
std::cout << things.at(2) << std::endl;
std::cout << things.at(4) << std::endl;
things.at(2) -= 75;
std::cout << things.at(2) << std::endl;
std::cout << t2 << std::endl;
}
//Output:
50
10101
50
10026
10026
Note that a few interesting things are happening here:
Because t2 is a reference, and insertion into the map doesn't invalidate references, t2 can be modified, and those modifications will be reflected in the map itself, and vise-versa.
things owns all the values that were inserted into it, and it will be cleaned up due to RAII, and the built-in behavior of std::map, and the broader C++ design principles it is obeying. There's no worry about objects not being cleaned up.
If you need to preserve the behavior where the id incrementing is handled automatically, independently from the end-programmer, we could consider this code instead:
#include<map>
#include<iostream>
int & insert(std::map<int, int> & things, int value) {
static int id = 1;
int & ret = things[id++] = value;
return ret;
}
int main() {
std::map<int, int> things;
int & t1 = insert(things, 10);
int & t2 = insert(things, 100);
int & t3 = insert(things, 1000);
t2++;
insert(things, 50);
std::cout << things.at(4) << std::endl;
t2 += 10000;
std::cout << things.at(2) << std::endl;
std::cout << things.at(4) << std::endl;
things.at(2) -= 75;
std::cout << things.at(2) << std::endl;
std::cout << t2 << std::endl;
}
//Output:
50
10101
50
10026
10026
These code snippets should give you a decent sense of how the language works, and what principles, possibly unfamiliar in the code I've written, that you need to learn about. My general recommendation is to find a good C++ resource for learning the basics of the language, and learn from that. Some good resources can be found here.
One last thing: if the use of Thing is critical to your code, because you need more data saved in the map, consider this instead:
#include<map>
#include<iostream>
#include<string>
//Only difference between struct and class is struct sets everything public by default
struct Thing {
int value;
double rate;
std::string name;
Thing() : Thing(0,0,"") {}
Thing(int value, double rate, std::string name) : value(value), rate(rate), name(std::move(name)) {}
};
int main() {
std::map<int, Thing> things;
Thing & t1 = things[1];
t1.value = 10;
t1.rate = 5.7;
t1.name = "First Object";
Thing & t2 = things[2];
t2.value = 15;
t2.rate = 17.99999;
t2.name = "Second Object";
t2.value++;
std::cout << things.at(2).value << std::endl;
t1.rate *= things.at(2).rate;
std::cout << things.at(1).rate << std::endl;
std::cout << t1.name << "," << things.at(2).name << std::endl;
things.at(1).rate -= 17;
std::cout << t1.rate << std::endl;
}
Based on what François Andrieux and Eljay have said (and what I would have said, had I got there first), here is the way I would do it, if you want to mutate objects you have already added to a list. All that reference_wrapper stuff is just a fancy way of passing pointers around. It will end in tears.
OK. here's the code (now edited as per OP's request):
#include <iostream>
#include <list>
#include <memory>
class Thing {
private:
int id;
int value = 0;
static int nextId;
public:
Thing() { this->id = Thing::nextId++; };
int getId() const { return this->id; };
int getValue() const { return this->value; };
void add(int n) { this->value += n; };
};
int Thing::nextId = 1;
class Storage {
private:
std::list<std::shared_ptr<Thing>> list;
public:
void add(const std::shared_ptr<Thing>& thing) {
this->list.push_back(thing);
}
std::shared_ptr<Thing> findById(int id) const {
for (std::list<std::shared_ptr<Thing>>::const_iterator it = this->list.begin(); it != this->list.end(); ++it) {
if (it->get()->getId() == id) return *it;
}
std::cout << "Not found!!\n";
exit(1);
}
};
void add_another(Storage& storage) {
storage.findById(2)->add(1);
std::shared_ptr<Thing> t4 = std::make_shared<Thing> (); t4->add(50);
storage.add(t4);
std::cout << storage.findById(4)->getValue() << "\n";
}
int main() {
std::shared_ptr<Thing> t1 = std::make_shared<Thing> (); t1->add(10);
std::shared_ptr<Thing> t2 = std::make_shared<Thing> (); t2->add(100);
std::shared_ptr<Thing> t3 = std::make_shared<Thing> (); t3->add(1000);
Storage storage;
storage.add(t3);
storage.add(t1);
storage.add(t2);
add_another(storage);
t2->add(10000);
std::cout << storage.findById(2)->getValue() << "\n";
std::cout << storage.findById(4)->getValue() << "\n";
return 0;
}
Output is now:
50
10101
50
as desired. Run it on Wandbox.
Note that what you are doing here, in effect, is reference counting your Things. The Things themselves are never copied and will go away when the last shared_ptr goes out of scope. Only the shared_ptrs are copied, and they are designed to be copied because that's their job. Doing things this way is almost as efficient as passing references (or wrapped references) around and far safer. When starting out, it's easy to forget that a reference is just a pointer in disguise.
Given that your Storage class does not own the Thing objects, and every Thing object is uniquely counted, why not just store Thing* in the list?
class Storage {
private:
std::list<Thing*> list;
public:
void add(Thing& thing) {
this->list.push_back(&thing);
}
Thing* findById(int id) const {
for (auto thing : this->list) {
if (thing->getId() == id) return thing;
}
std::cout << "Not found!!\n";
return nullptr;
}
};
EDIT: Note that Storage::findById now returns Thing* which allows it to fail gracefully by returning nullptr (rather than exit(1)).

Question on function references and threads

I was randomly testing std::thread in my virtual linux machine (GCC 4.4.5-Debian) with this test program:
#include <algorithm>
#include <thread>
#include <iostream>
#include <vector>
#include <functional>
using namespace std;
static int i=0;
void f( vector<int> &test)
{
++i;
cout << "Push back called" << endl;
test.push_back(i);
}
int main()
{
vector<thread> t;
vector<int> test;
for( int i=0; i<1000; ++i )
{
t.push_back( thread( bind(f, test) ) );
}
for( auto it = t.begin(); it != t.end(); ++it )
{
(*it).join();
}
cout << test.size() << endl;
for( auto it = test.begin(); it != test.end(); ++it )
{
cout << *it << endl;
}
return 0;
}
Why does vector test remain empty? Am I doing something stupid with references (probably) or is it something with bind or some threading problem?
Thanks!
UPDATE: with the combined help of Kos and villintehaspan I "fixed" the "problem":
#include <algorithm>
#include <thread>
#include <iostream>
#include <vector>
#include <functional>
using namespace std;
static int i=0;
void f( vector<int> &test)
{
++i;
test.push_back(i);
}
int main()
{
vector<thread> t;
vector<int> test;
for( int i=0; i<1000; ++i )
{
t.push_back( thread(f, std::ref(test)) );
}
for( auto it = t.begin(); it != t.end(); ++it )
{
(*it).join();
}
cout << test.size() << endl;
for( auto it = test.begin(); it != test.end(); ++it )
{
cout << *it << endl;
}
return 0;
}
Which prints all values in order and seems to work OK. Now only one question remains: is this just lucky (aka undefined behavior (TM) ) or is the static variable causing a silent mutex-like step in the code?
PS: I understand the "killing multithreadedness" problem here, and that's not my point. I'm just trying to test the robustness of the basic std::thread functionality...
Looks to me like a threading problem.
While I'm not 100% sure, it should be noted that all 1000 threads:
do ++i on the same int value (it's not an atomic operation- you may encounter problems here, you can use __sync_fetch_and_add(&i,1) instead (note that it's a gcc extension not standard C++);
do push_back simultaneously on a std::vector, which is not a thread-safe container AFAIK... Same for cout I think. I believe you'd need to use a locking mechanism around that (std::mutex perhaps? I've only used pthreads so far but I believe it's what you need).
Note that this kind of kills any benefit of using threads here, but that's a consequence of the fact that you shouldn't use multiple threads at once on a non-thread-safe object.
----EDIT----
I had a google on this threading API (not present on my tdm gcc 4.5 on Windows, unfortunately).
Aparrently instead of:
thread( bind(f, test) )
you can just say
thread( f, test )
and pass an arbitrary number of arguments in this way.
Source: http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=422
This should also solve your problem with making a copy of the vector which I haven't noticed before
(+1 for #villintehaspam here).
Actually, one more thing is needed to make sure the copy isn't created here:
thread( f, std::ref(test) )
will make sure that the vector isn't copied.
Wow, I got confused too. :)
The bind will actually make a copy of the vector, so that each thread push_back's on their own copy (yes, that & won't help here). You need to provide the threads with a pointer or similar so that they use the same vector. You should also make sure to use access protection like suggested by Kos.
Edit: After your fix to use std::ref instead of making a copy of the vector, the multithreaded access problem still remains. My guess is that the only reason you don't get any problems right now is because the example is so trivial (or maybe you've only tried in debug mode) - there is no automatic guarantee that the ++ is atomic just because the int is static.

Efficient push_back of classes and structs

I've seen my colleague do the second snippet quite often. Why is this? I've tried adding print statements to track the ctors and dtors, but both seem identical.
std::vector<ClassTest> vecClass1;
ClassTest ct1;
ct1.blah = blah // set some stuff
...
vecClass1.push_back(ct1);
std::vector<ClassTest> vecClass2;
vecClass2.push_back(ClassTest());
ClassTest& ct2 = vecClass2.back();
ct2.blah = blah // set some stuff
...
PS. I'm sorry if the title is misleading.
Edit:
Firstly, thank you all for your responses.
I've written a small application using std::move. The results are surprising to me perhaps because I've done something wrong ... would someone please explain why the "fast" path is performing significantly better.
#include <vector>
#include <string>
#include <boost/progress.hpp>
#include <iostream>
const std::size_t SIZE = 10*100*100*100;
//const std::size_t SIZE = 1;
const bool log = (SIZE == 1);
struct SomeType {
std::string who;
std::string bio;
SomeType() {
if (log) std::cout << "SomeType()" << std::endl;
}
SomeType(const SomeType& other) {
if (log) std::cout << "SomeType(const SomeType&)" << std::endl;
//this->who.swap(other.who);
//this->bio.swap(other.bio);
this->who = other.who;
this->bio = other.bio;
}
SomeType& operator=(SomeType& other) {
if (log) std::cout << "SomeType::operator=()" << std::endl;
this->who.swap(other.who);
this->bio.swap(other.bio);
return *this;
}
~SomeType() {
if (log) std::cout << "~SomeType()" << std::endl;
}
void swap(SomeType& other) {
if (log) std::cout << "Swapping" << std::endl;
this->who.swap(other.who);
this->bio.swap(other.bio);
}
// move semantics
SomeType(SomeType&& other) :
who(std::move(other.who))
, bio(std::move(other.bio)) {
if (log) std::cout << "SomeType(SomeType&&)" << std::endl;
}
SomeType& operator=(SomeType&& other) {
if (log) std::cout << "SomeType::operator=(SomeType&&)" << std::endl;
this->who = std::move(other.who);
this->bio = std::move(other.bio);
return *this;
}
};
int main(int argc, char** argv) {
{
boost::progress_timer time_taken;
std::vector<SomeType> store;
std::cout << "Timing \"slow\" path" << std::endl;
for (std::size_t i = 0; i < SIZE; ++i) {
SomeType some;
some.who = "bruce banner the hulk";
some.bio = "you do not want to see me angry";
//store.push_back(SomeType());
//store.back().swap(some);
store.push_back(std::move(some));
}
}
{
boost::progress_timer time_taken;
std::vector<SomeType> store;
std::cout << "Timing \"fast\" path" << std::endl;
for (std::size_t i = 0; i < SIZE; ++i) {
store.push_back(SomeType());
SomeType& some = store.back();
some.who = "bruce banner the hulk";
some.bio = "you do not want to see me angry";
}
}
return 0;
}
Output:
dev#ubuntu-10:~/Desktop/perf_test$ g++ -Wall -O3 push_back-test.cpp -std=c++0x
dev#ubuntu-10:~/Desktop/perf_test$ ./a.out
Timing "slow" path
3.36 s
Timing "fast" path
3.08 s
If the object is more expensive to copy after "set some stuff" than before, then the copy that happens when you insert the object into the vector will be less expensive if you insert the object before you "set some stuff" than after.
Really, though, since you should expect objects in a vector to be copied occasionally, this is probably not much of an optimization.
If we accept that your colleague's snippet is wise, because ClassTest is expensive to copy, I would prefer:
using std::swap;
std::vector<ClassTest> vecClass1;
ClassTest ct1;
ct1.blah = blah // set some stuff
...
vecClass1.push_back(ClassTest());
swap(ct1, vecClass1.back());
I think it's clearer, and it may well be more exception-safe. The ... code presumably allocates resources and hence could throw an exception (or else what's making the fully-built ClassTest so expensive to copy?). So unless the vector really is local to the function, I don't think it's a good idea for it to be half-built while running that code.
Of course this is even more expensive if ClassTest only has the default swap implementation, but if ClassTest doesn't have an efficient swap, then it has no business being expensive to copy. So this trick perhaps should only be used with classes known to be friendly, rather than unknown template parameter types.
As Gene says, std::move is better anyway, if you have that C++0x feature.
If we're worried about ClassTest being expensive to copy, though, then relocating the vector is a terrifying prospect. So we should also either:
reserve enough space before adding anything,
use a deque instead of a vector.
The second version benefits from moving the temporary. The first version is copying the temporary vector. So the second one is potentially faster. The second version has also potentially smaller peak memory requirements, the first version creates two objects one temporary and one copy of it and only then deletes the temporary. You can improve the first version by explicitly moving the temporary:
std::vector<ClassTest> vecClass1;
ClassTest ct1;
ct1.blah = blah // set some stuff
...
vecClass1.push_back(std::move(ct1));
You should probably ask your collegue to know exactly why, but we can still take a guess. As James pointed out, it might be a tad more efficient if the object is more expensive to copy once constructed.
I see advantages in both versions.
I like your collegue's snippet because: although there are 2 objects in both cases, they only co-exist for a very short period of time in the second version. There is only one object available for editing: this avoids the potential error of editing ct1 after push_back.
I like your personal snippet because: invoking push_back to add a second object potentially invalidates the reference ct2, inducing a risk of undefined behavior. The first snippet does not present this risk.
They are identical (as far as I can see). Maybe he or she does that as an idiomatic custom.