I have a custom, generalized serialization system written in C++ where I've handled intrinsics, std::string and structures containing those. However, for a memory stream class containing a std::vector<byte>, I'd like to make it possible to store and retrieve a std::shared_ptr<T> inside of it (where T is any class that derives from Abstract). Of course, I'd like a solution without using Boost as it would defeat my intent.
As stated on http://en.cppreference.com/w/cpp/memory/shared_ptr :
Constructing a new shared_ptr using the raw underlying pointer owned by another shared_ptr leads to undefined behavior.
The only (hacky) solution I have come up with so far is for the binary memory stream class to keep a small lookup table of std::shared_ptr<Abstract>, keyed by the raw pointer itself, making it fairly trivial to read and write them out while keeping ownership/reference counting reliable. It then becomes possible/useful to serialize the raw pointer.
However, ownership/reference counting is not a concern, as it's guaranteed for the use case. If there is a solution using only the std::vector<byte>, I would consider it a more elegant approach, as it could serve other use cases.
Since your serialization/deserialization happens within the same process (i.e. the same address space), you can store raw memory pointers as binary data in your stream. Consider the idea below, written as a trivial demo.
Unfortunately, std::enable_shared_from_this does not let you increment/decrement the reference counter manually, because it only stores a weak reference and cannot destroy the object internally when the count reaches zero. That is why we have to do manual reference management, specifically for the instances held in the byte stream.
class Abstract : public std::enable_shared_from_this<Abstract> {
public:
Abstract() : _count(0) {}
~Abstract() { cout << "I am destroyed" << endl; }
void incrementStreamRef() {
std::lock_guard<std::mutex> lock(_mutex);
if (!_count) {
_guard = this->shared_from_this();
}
++_count;
};
void decrementStreamRef() {
std::lock_guard<std::mutex> lock(_mutex);
if (_count == 0)
return;
if (_count == 1) {
if (_guard.use_count() == 1) {
// After this call `this` will be destroyed
_guard.reset();
return;
}
_guard.reset();
}
--_count;
};
private:
std::mutex _mutex;
std::shared_ptr<Abstract> _guard;
std::size_t _count;
};
void addAbstractToStream(std::vector<uint8_t>& byteStream, Abstract* abstract) {
abstract->incrementStreamRef();
auto offset = byteStream.size();
try {
// 1 byte for type identification
byteStream.resize(offset + sizeof(abstract) + 1);
byteStream[offset]
= 0xEE; // Means the next bytes are the raw pointer to an Abstract instance
++offset;
// Add the raw pointer to the stream: space for it was preallocated by
// resize() above; the bytes are copied in after the try block
} catch (...) {
abstract->decrementStreamRef();
return;
}
std::memcpy(byteStream.data() + static_cast<std::ptrdiff_t>(offset),
(void*)&abstract,
sizeof(abstract));
}
void removeAbstractFromStream(std::vector<uint8_t>& byteStream, std::size_t offset) {
Abstract* abstract;
std::memcpy((void*)&abstract,
byteStream.data() + static_cast<std::ptrdiff_t>(offset),
sizeof(abstract));
abstract->decrementStreamRef();
}
void tryMe(std::vector<uint8_t>& byteStream) {
// Must not be destroyed when we leave the scope
auto abstract = std::make_shared<Abstract>();
addAbstractToStream(byteStream, abstract.get());
cout << "Scope is about to be left" << endl;
}
int main() {
// Always walk over the stream and use `removeAbstractFromStream`
std::vector<uint8_t> byteStream;
// Use `try` to make sure the byte stream is always cleaned up
// Of course, RAII would be much better
try {
// Do some work with the stream
} catch (...) {
removeAbstractFromStream(byteStream, 1);
throw;
}
tryMe(byteStream);
cout << "Main is about to be left" << endl;
removeAbstractFromStream(byteStream, 1);
cout << "Main is even closer to be left" << endl;
return 0;
}
Of course, more elaborate locking could be fine, or the locking could be dropped entirely if thread safety is not a concern. Please revise the code for corner cases before using it in production.
I have been trying to dive deeper into the limitations of pointers to see how they affect the program behind the scenes. One thing my research has led me to is that variables created through pointers must be deleted in a language like C++, otherwise the data will still be in memory.
My question pertains to accessing the data after a function's lifetime ends. If I create a pointer variable within a function, and then the function comes to a proper close, how would the data be accessed? Would it actually just be garbage taking up space, or is there supposed to be a way to still reference it without having stored the address in another variable?
There's no automatic garbage collection. If you lose the handle (pointer, reference, index, ...) to your resource, your resource will live ad vitam æternam.
If you want your resources to cease to live when their handle goes out of scope, RAII and smart pointers are the tool you need.
If you want your resources to continue to live after their handle goes out of scope, you need to copy the handle and pass it around.
With the standard smart pointers std::unique_ptr and std::shared_ptr, memory is freed when the pointer goes out of scope. Once the scope ends, the object is immediately destroyed and freed, and there is no way to access it anymore, unless you move/copy the pointer out into a bigger scope, where it will be deleted later.
But it is not so difficult to implement a lazy garbage collector. As before, you use smart pointers everywhere, but in a lazy variant: when a pointer goes out of scope, its object is not immediately destroyed and freed, but is instead handed to the lazy garbage collector, which destroys and frees it later in a separate thread. Exactly this lazy behaviour is what I implemented in my code below.
I implemented the following code from scratch, just for fun and as a demo for you; there is no big reason not to use the standard greedy freeing of std::unique_ptr and std::shared_ptr. There is one very important use case, though: std::shared_ptr constructs objects at well-known points in the code, when you call the constructor, so construction time is predictable, but it destroys objects at undefined points in code and time, because shared copies of the pointer exist. Thus you may get long destruction delays at unpredictable points in time, which can harm real-time, high-performance code; destruction itself may also take too long. Lazy deletion moves destruction into a separate thread, where it can proceed at its own pace.
Although the smart pointer is lazily disposed of at scope end, for some nanoseconds (or even microseconds) you may still have access to its undestroyed/unfreed memory; of course this time window is not guaranteed. It just means that the real destruction can happen much later than the scope ends, hence the name lazy garbage collector. You can even tweak this kind of lazy garbage collector so that it really deletes objects, say, one millisecond after their smart pointers have been destroyed.
Real garbage collectors do a similar thing: they free objects much later in time, and usually do it automatically, by finding bytes in memory that look like real heap pointers.
There is a Test() function in my code that shows how my lazy variants of the standard pointers are used. When the code is run, the console output shows something like:
Construct Obj( 592)
Construct Obj( 1264)
LazyDeleter Dispose( 1264)
LazyDeleter Dispose( 592)
Test finished
Destroy ~Obj( 1264)
Destroy ~Obj( 592)
The number in parentheses is the id of the object (the lower bits of its pointer). You can see that disposal and destruction happen in exactly the reverse of construction order. Disposal to the lazy garbage collector happens before the test finishes, while real destruction happens later, in a separate thread, after the test finishes.
Try it online!
#include <deque>
#include <atomic>
#include <mutex>
#include <thread>
#include <array>
#include <memory>
#include <iostream>
#include <iomanip>
using DelObj = void (void *);
void Dispose(void * obj, DelObj * del);
template <typename T>
struct LazyDeleter {
void operator ()(T * ptr) const {
struct SDel { static void Del(void * ptr) { delete (T*)ptr; } };
std::cout << "LazyDeleter Dispose(" << std::setw(5) << uintptr_t(ptr) % (1 << 16) << ")" << std::endl;
Dispose(ptr, &SDel::Del);
}
};
template <typename T>
using lazy_unique_ptr = std::unique_ptr<T, LazyDeleter<T>>;
template <typename T>
std::shared_ptr<T> make_lazy_shared(T * ptr) {
return std::shared_ptr<T>(ptr, LazyDeleter<T>{});
}
void Dispose(void * obj, DelObj * del) {
class AtomicMutex {
public:
auto Locker() { return std::lock_guard<AtomicMutex>(*this); }
void lock() { while (f_.test_and_set(std::memory_order_acquire)) {} }
void unlock() { f_.clear(std::memory_order_release); }
auto & Flag() { return f_; }
private:
std::atomic_flag f_ = ATOMIC_FLAG_INIT;
};
class DisposeThread {
struct Entry {
void * obj = nullptr;
DelObj * del = nullptr;
};
public:
DisposeThread() : thr_([&]{
size_t constexpr block = 32;
while (!finish_.load(std::memory_order_relaxed)) {
while (true) {
std::array<Entry, block> cent{};
size_t cent_cnt = 0;
{
auto lock = mux_.Locker();
if (entries_.empty())
break;
cent_cnt = std::min(block, entries_.size());
std::move(entries_.begin(), entries_.begin() + cent_cnt, cent.data());
entries_.erase(entries_.begin(), entries_.begin() + cent_cnt);
}
for (size_t i = 0; i < cent_cnt; ++i) {
auto & entry = cent[i];
try { (*entry.del)(entry.obj); } catch (...) {}
}
}
std::this_thread::yield();
}
}) {}
~DisposeThread() {
while (!entries_.empty())
std::this_thread::yield();
finish_.store(true, std::memory_order_relaxed);
thr_.join();
}
void Add(void * obj, DelObj * del) {
auto lock = mux_.Locker();
entries_.emplace_back(Entry{obj, del});
}
private:
AtomicMutex mux_{};
std::thread thr_{};
std::deque<Entry> entries_;
std::atomic<bool> finish_ = false;
};
static DisposeThread dt{};
dt.Add(obj, del);
}
void Test() {
struct Obj {
Obj() { std::cout << "Construct Obj(" << std::setw(5) << uintptr_t(this) % (1 << 16) << ")" << std::endl << std::flush; }
~Obj() { std::cout << "Destroy ~Obj(" << std::setw(5) << uintptr_t(this) % (1 << 16) << ")" << std::endl << std::flush; }
};
{
lazy_unique_ptr<Obj> uptr(new Obj());
std::shared_ptr<Obj> sptr = make_lazy_shared(new Obj());
auto sptr2 = sptr;
}
std::cout << "Test finished" << std::endl;
}
int main() {
Test();
}
Currently I am experimenting with dynamic libraries and have some trouble with data management/deletion... how can I 'notify' a pointer that it has become invalid?
What I need: a thread-safe way to delete data from within its library and invalidate the DataContainer's pointer, or a thread-safe workaround.
What I tried:
Using shared/weak pointers: anyone can delete it (the library gets unloaded, but the pointer still exists in another library, which then deletes the object there without knowing how).
Possible solutions:
- Keep a list of DataContainer and set them manually to nullptr on Library Unload.
- Don't use pointers; use an index into the vector and look it up every time the data is needed.
Simple Example:
class Data
{
public:
Data(std::string Str) : SomeStr(Str) {}
std::string SomeStr;
};
struct DataContainer
{
Data* m_Data = nullptr;
};
int main()
{
// This vector is static inside a Dynamic Library so we need to use UniquePtr,
// to be sure it gets deleted inside its Library when unloaded
// if we could use SharedPtr/WeakPtr it would be too easy... but it could get deleted by anyone
// Protected by Mutex
std::vector<std::unique_ptr<Data>> DataHolder;
DataHolder.push_back(std::make_unique<Data>("Example Str"));
// this could maybe inside another Dynamic Library
DataContainer Container;
Container.m_Data = (*DataHolder.begin()).get();
// As example instead of using a Dynamic Library that would get unloaded here
DataHolder.clear();
// Cannot use m_Data here; it got deleted by the DataHolder but Container doesn't know that
std::cout << "Str: " << Container.m_Data->SomeStr << std::endl;
return 0;
}
shared_ptr/weak_ptr is what you need. The module that keeps the ownership has shared_ptrs to the objects, but it allows others to get only weak_ptrs. The other modules (which shouldn't have ownership) have to temporarily get a shared_ptr out of the weak_ptr each time they need the data, and they are obliged to destroy each shared_ptr immediately after they have accessed the data.
If you don't hold this invariant you need some external synchronization between modules like onPointerInvalidated, but this is a much worse design.
As for thread safety, no one can destroy the object while you keep a shared_ptr to it (unless someone does something really malicious like delete shared_ptr_.get()). That implies a contract between the consumer and the owner: the consumer locks the shared_ptr for a short period of time (thus delaying the destruction, if any), while the owner deleting the objects doesn't need to worry about dangling pointers.
I agree, I think the shared/weak pointers should do the trick.
This is how I implemented it:
class Data
{
public:
Data(std::string Str) : SomeStr(Str) {}
std::string SomeStr;
};
struct DataHolder {
std::vector<std::shared_ptr<Data>> data;
std::weak_ptr<Data> get_data(size_t idx) {
return std::weak_ptr<Data>(data[idx]);
}
};
struct DataContainer
{
std::weak_ptr<Data> m_Data;
};
int main()
{
// This vector is static inside a Dynamic Library so we need to use UniquePtr,
// to be sure it gets deleted inside its Library when unloaded
// if we could use SharedPtr/WeakPtr it would be too easy... but it could get deleted by anyone
// Protected by Mutex
DataHolder theDataHolder;
theDataHolder.data.push_back(std::make_shared<Data>("Example Str"));
DataContainer Container;
Container.m_Data = theDataHolder.get_data(0);
// container is passed to a method inside the shared lib
auto t = shared_lib_entry(Container);
std::this_thread::sleep_for(2s);
// As example instead of using a Dynamic Library that would get unloaded here
theDataHolder.data.clear();
wait_for_completion(t);
return 0;
}
// INSIDE the SHARED LIB
std::thread shared_lib_entry(DataContainer &aContainer) {
std::cout << "Entry in the lib: " << private_var << std::endl;
std::thread aThread([&](){
std::cout << "in the thread start" << std::endl;
int count = 5;
while (count-- > 0) {
std::cout << "in the thread ["<< count <<"]" << std::endl;
if (aContainer.m_Data.expired()) {
std::cout << "Someone killed the src " << std::endl;
} else {
auto sp = aContainer.m_Data.lock();
std::cout << "Str: " << sp->SomeStr << std::endl;
}
std::this_thread::sleep_for(1s);
}
});
std::cout << "Thread finished " << private_var << std::endl;
return aThread;
}
I have a Storage class that keeps a list of Things:
#include <iostream>
#include <list>
#include <functional>
#include <cstdlib> // for exit()
class Thing {
private:
int id;
int value = 0;
static int nextId;
public:
Thing() { this->id = Thing::nextId++; };
int getId() const { return this->id; };
int getValue() const { return this->value; };
void add(int n) { this->value += n; };
};
int Thing::nextId = 1;
class Storage {
private:
std::list<std::reference_wrapper<Thing>> list;
public:
void add(Thing& thing) {
this->list.push_back(thing);
}
Thing& findById(int id) const {
for (std::list<std::reference_wrapper<Thing>>::const_iterator it = this->list.begin(); it != this->list.end(); ++it) {
if (it->get().getId() == id) return *it;
}
std::cout << "Not found!!\n";
exit(1);
}
};
I started with a simple std::list<Thing>, but then everything is copied around on insertion and retrieval, and I didn't want this because if I get a copy, altering it no longer affects the original objects. When looking for a solution to that, I found out about std::reference_wrapper in this SO question, but now I have another problem.
Now to the code that uses them:
void temp(Storage& storage) {
storage.findById(2).add(1);
Thing t4; t4.add(50);
storage.add(t4);
std::cout << storage.findById(4).getValue() << "\n";
}
void run() {
Thing t1; t1.add(10);
Thing t2; t2.add(100);
Thing t3; t3.add(1000);
Storage storage;
storage.add(t3);
storage.add(t1);
storage.add(t2);
temp(storage);
t2.add(10000);
std::cout << storage.findById(2).getValue() << "\n";
std::cout << storage.findById(4).getValue() << "\n";
}
My main() simply calls run(). The output I get is:
50
10101
Not found!!
Although I was looking for:
50
10101
50
Question
Looks like the locally declared object t4 ceases to exist when the function returns, which makes sense. I could prevent this by dynamically allocating it, using new, but then I didn't want to manage memory manually...
How can I fix the code without removing the temp() function and without having to manage memory manually?
If I just use a std::list<Thing> as some suggested, surely the problem with t4 and temp will cease to exist, but another problem will arise: the code won't print 10101 anymore, for example. If I keep copying stuff around, I won't be able to alter the state of a stored object.
Who is the owner of the Thing in the Storage?
Your actual problem is ownership. Currently, your Storage does not really contain the Things; instead it is left to the user of the Storage to manage the lifetime of the objects you put inside it. This is very much against the philosophy of std containers. All standard C++ containers own the objects you put in them, and the container manages their lifetime (e.g. you simply call v.resize(v.size()-2) on a vector and the last two elements get destroyed).
Why references?
You already found a way to make the container not own the actual objects (by using a reference_wrapper), but there is no reason to do so. From a class called Storage I would expect it to hold objects, not just references. Moreover, this opens the door to lots of nasty problems, including undefined behaviour. For example here:
void temp(Storage& storage) {
storage.findById(2).add(1);
Thing t4; t4.add(50);
storage.add(t4);
std::cout << storage.findById(4).getValue() << "\n";
}
you store a reference to t4 in the storage. The thing is: t4's lifetime only lasts until the end of that function, so you end up with a dangling reference. You can store such a reference, but it isn't that useful because you are basically not allowed to do anything with it.
Aren't references a cool thing?
Currently you can push t1, modify it, and then observe the changes on the element in Storage. This might be fine if you want to mimic Java, but in C++ we are used to containers making a copy when you push something (there are also methods to create the elements in place, in case you worry about useless temporaries). And yes, of course, if you really want to, you can make a standard container hold references too, but let's make a small detour...
Who collects all that garbage?
Maybe it helps to consider that Java is garbage-collected while C++ has destructors. In Java you are used to references floating around until the garbage collector kicks in. In C++ you have to be keenly aware of the lifetime of your objects. This may sound bad, but actually it turns out to be extremely useful to have full control over the lifetime of objects.
Garbage? What garbage?
In modern C++ you shouldn't worry about forgetting a delete, but rather appreciate the advantages of RAII. Acquiring resources on initialization and knowing when a destructor gets called gives you automatic resource management for basically any kind of resource, something a garbage collector can only dream of (think of files, database connections, etc.).
"How can I fix the code without removing the temp() function and without having to manage memory manually?"
A trick that helped me a lot is this: Whenever I find myself thinking I need to manage a resource manually I stop and ask "Can't someone else do the dirty stuff?". It is really extremely rare that I cannot find a standard container that does exactly what I need out of the box. In your case, just let the std::list do the "dirty" work.
Can't be C++ if there is no template, right?
I would actually suggest you to make Storage a template, along the line of:
template <typename T>
class Storage {
private:
std::list<T> list;
//....
Then
Storage<Thing> thing_storage;
Storage<int> int_storage;
are Storages containing Things and ints, respectively. That way, if you ever feel like experimenting with references or pointers, you could still instantiate a Storage<reference_wrapper<int>>.
Did I miss something?...maybe references?
I won't be able to alter the state of a stored object
Given that the container owns the object you would rather let the user take a reference to the object in the container. For example with a vector that would be
auto t = std::vector<int>(10,0); // 10 element initialized to 0
auto& first_element = t[0]; // reference to first element
first_element = 5; // first_element is an alias for t[0]
std::cout << t[0]; // I don't want to spoil the fun part
To make this work with your Storage you just have to make findById return a reference. As a demo:
struct foo {
private:
int data;
public:
int& get_ref() { return data;}
const int& get_ref() const { return data;}
};
auto x = foo();
x.get_ref() = 12;
TL;DR
How to avoid manual resource management? Let someone else do it for you and call it automatic resource management :P
t4 is a temporary object that is destroyed at exit from temp() and what you store in storage becomes a dangling reference, causing UB.
It is not quite clear what you're trying to achieve, but if you want to keep the Storage class the same as it is, you should make sure that all the references stored in it live at least as long as the storage itself. As you have discovered, this is one of the reasons STL containers keep their own private copies of elements (others, probably less important, being the elimination of an extra indirection and much better locality in some cases).
P.S. And please, can you stop writing those this-> and learn about initialization lists in constructors? >_<
In terms of what your code actually appears to be doing, you've definitely overcomplicated your code, by my estimation. Consider this code, which does all the same things your code does, but with far less boilerplate code and in a way that's far more safe for your uses:
#include<map>
#include<iostream>
int main() {
std::map<int, int> things;
int & t1 = things[1];
int & t2 = things[2];
int & t3 = things[3];
t1 = 10;
t2 = 100;
t3 = 1000;
t2++;
things[4] = 50;
std::cout << things.at(4) << std::endl;
t2 += 10000;
std::cout << things.at(2) << std::endl;
std::cout << things.at(4) << std::endl;
things.at(2) -= 75;
std::cout << things.at(2) << std::endl;
std::cout << t2 << std::endl;
}
//Output:
50
10101
50
10026
10026
Note that a few interesting things are happening here:
Because t2 is a reference, and insertion into the map doesn't invalidate references, t2 can be modified, and those modifications will be reflected in the map itself, and vice versa.
things owns all the values that were inserted into it, and they will be cleaned up automatically thanks to RAII and the built-in behavior of std::map, following the broader C++ design principles it obeys. There's no worry about objects not being cleaned up.
If you need to preserve the behavior where the id incrementing is handled automatically, independently from the end-programmer, we could consider this code instead:
#include<map>
#include<iostream>
int & insert(std::map<int, int> & things, int value) {
static int id = 1;
int & ret = things[id++] = value;
return ret;
}
int main() {
std::map<int, int> things;
int & t1 = insert(things, 10);
int & t2 = insert(things, 100);
int & t3 = insert(things, 1000);
t2++;
insert(things, 50);
std::cout << things.at(4) << std::endl;
t2 += 10000;
std::cout << things.at(2) << std::endl;
std::cout << things.at(4) << std::endl;
things.at(2) -= 75;
std::cout << things.at(2) << std::endl;
std::cout << t2 << std::endl;
}
//Output:
50
10101
50
10026
10026
These code snippets should give you a decent sense of how the language works, and what principles, possibly unfamiliar in the code I've written, that you need to learn about. My general recommendation is to find a good C++ resource for learning the basics of the language, and learn from that. Some good resources can be found here.
One last thing: if the use of Thing is critical to your code, because you need more data saved in the map, consider this instead:
#include<map>
#include<iostream>
#include<string>
//Only difference between struct and class is struct sets everything public by default
struct Thing {
int value;
double rate;
std::string name;
Thing() : Thing(0,0,"") {}
Thing(int value, double rate, std::string name) : value(value), rate(rate), name(std::move(name)) {}
};
int main() {
std::map<int, Thing> things;
Thing & t1 = things[1];
t1.value = 10;
t1.rate = 5.7;
t1.name = "First Object";
Thing & t2 = things[2];
t2.value = 15;
t2.rate = 17.99999;
t2.name = "Second Object";
t2.value++;
std::cout << things.at(2).value << std::endl;
t1.rate *= things.at(2).rate;
std::cout << things.at(1).rate << std::endl;
std::cout << t1.name << "," << things.at(2).name << std::endl;
things.at(1).rate -= 17;
std::cout << t1.rate << std::endl;
}
Based on what François Andrieux and Eljay have said (and what I would have said, had I got there first), here is the way I would do it, if you want to mutate objects you have already added to a list. All that reference_wrapper stuff is just a fancy way of passing pointers around. It will end in tears.
OK. here's the code (now edited as per OP's request):
#include <iostream>
#include <list>
#include <memory>
#include <cstdlib> // for exit()
class Thing {
private:
int id;
int value = 0;
static int nextId;
public:
Thing() { this->id = Thing::nextId++; };
int getId() const { return this->id; };
int getValue() const { return this->value; };
void add(int n) { this->value += n; };
};
int Thing::nextId = 1;
class Storage {
private:
std::list<std::shared_ptr<Thing>> list;
public:
void add(const std::shared_ptr<Thing>& thing) {
this->list.push_back(thing);
}
std::shared_ptr<Thing> findById(int id) const {
for (std::list<std::shared_ptr<Thing>>::const_iterator it = this->list.begin(); it != this->list.end(); ++it) {
if (it->get()->getId() == id) return *it;
}
std::cout << "Not found!!\n";
exit(1);
}
};
void add_another(Storage& storage) {
storage.findById(2)->add(1);
std::shared_ptr<Thing> t4 = std::make_shared<Thing> (); t4->add(50);
storage.add(t4);
std::cout << storage.findById(4)->getValue() << "\n";
}
int main() {
std::shared_ptr<Thing> t1 = std::make_shared<Thing> (); t1->add(10);
std::shared_ptr<Thing> t2 = std::make_shared<Thing> (); t2->add(100);
std::shared_ptr<Thing> t3 = std::make_shared<Thing> (); t3->add(1000);
Storage storage;
storage.add(t3);
storage.add(t1);
storage.add(t2);
add_another(storage);
t2->add(10000);
std::cout << storage.findById(2)->getValue() << "\n";
std::cout << storage.findById(4)->getValue() << "\n";
return 0;
}
Output is now:
50
10101
50
as desired. Run it on Wandbox.
Note that what you are doing here, in effect, is reference counting your Things. The Things themselves are never copied and will go away when the last shared_ptr goes out of scope. Only the shared_ptrs are copied, and they are designed to be copied because that's their job. Doing things this way is almost as efficient as passing references (or wrapped references) around and far safer. When starting out, it's easy to forget that a reference is just a pointer in disguise.
Given that your Storage class does not own the Thing objects, and every Thing object already carries a unique id, why not just store Thing* in the list?
class Storage {
private:
std::list<Thing*> list;
public:
void add(Thing& thing) {
this->list.push_back(&thing);
}
Thing* findById(int id) const {
for (auto thing : this->list) {
if (thing->getId() == id) return thing;
}
std::cout << "Not found!!\n";
return nullptr;
}
};
EDIT: Note that Storage::findById now returns Thing* which allows it to fail gracefully by returning nullptr (rather than exit(1)).
I've been thinking about how to implement the various exception safety guarantees, especially the strong guarantee, i.e. data is rolled back to its original state when an exception occurs.
Consider the following, wonderfully contrived examples (C++11 code). Suppose there is a simple data structure storing some value
struct Data
{
int value = 321;
};
and some function modify() operating on that value
void modify(Data& data, int newValue, bool throwExc = false)
{
data.value = newValue;
if(throwExc)
{
// some exception occurs, sentry will roll-back stuff
throw std::exception();
}
}
(one can see how contrived this is). Suppose we wanted to offer the strong exception-safety guarantee for modify(). In case of an exception, the value of Data::value is obviously not rolled back to its original value. One could naively try the whole function and set things back manually in an appropriate catch block, which is enormously tedious and doesn't scale at all.
Another approach is to use some scoped, RAII helper - sort of like a sentry which knows what to temporarily save and restore in case of an error:
struct FakeSentry
{
FakeSentry(Data& data) : data_(data), value_(data_.value)
{
}
~FakeSentry()
{
if(!accepted_)
{
// roll-back if accept() wasn't called
data_.value = value_;
}
}
void accept()
{
accepted_ = true;
}
Data& data_ ;
int value_;
bool accepted_ = false;
};
Its application is simple and only requires calling accept() when modify() succeeds:
void modify(Data& data, int newValue, bool throwExc = false)
{
FakeSentry sentry(data);
data.value = newValue;
if(throwExc)
{
// some exception occurs, sentry will roll-back stuff
throw std::exception();
}
// prevent rollback
sentry.accept();
}
This gets the job done but doesn't scale well either. There would need to be a sentry for each distinct user-defined type, knowing all the internals of said type.
My question now is: What other patterns, idioms or preferred courses of action come to mind when trying to implement strongly exception safe code?
In general this is called the ScopeGuard idiom. It is not always possible to use a temporary variable and a swap to commit (though it is easy when applicable); sometimes you need to modify existing structures.
Andrei Alexandrescu and Petru Marginean discuss it in detail in the following paper: "Generic: Change the Way You Write Exception-Safe Code — Forever".
There is the Boost.ScopeExit library, which lets you write guard code without coding auxiliary classes. Example from the documentation:
void world::add_person(person const& a_person) {
bool commit = false;
persons_.push_back(a_person); // (1) direct action
// Following block is executed when the enclosing scope exits.
BOOST_SCOPE_EXIT(&commit, &persons_) {
if(!commit) persons_.pop_back(); // (2) rollback action
} BOOST_SCOPE_EXIT_END
// ... // (3) other operations
commit = true; // (4) disable rollback actions
}
The D programming language has a special construct for this purpose: scope(failure)
Transaction abc()
{
Foo f;
Bar b;
f = dofoo();
scope(failure) dofoo_undo(f);
b = dobar();
return Transaction(f, b);
}
Andrei Alexandrescu shows advantages of that language construct in his talk: "Three Unlikely Successful Features of D"
I have made a platform-dependent implementation of the scope(failure) feature which works on the MSVC, GCC, Clang and Intel compilers. It is in the stack_unwinding library. In C++11 it achieves syntax which is very close to the D language. Here is an online DEMO:
int main()
{
using namespace std;
{
cout << "success case:" << endl;
scope(exit)
{
cout << "exit" << endl;
};
scope(success)
{
cout << "success" << endl;
};
scope(failure)
{
cout << "failure" << endl;
};
}
cout << string(16,'_') << endl;
try
{
cout << "failure case:" << endl;
scope(exit)
{
cout << "exit" << endl;
};
scope(success)
{
cout << "success" << endl;
};
scope(failure)
{
cout << "failure" << endl;
};
throw 1;
}
catch(int){}
}
Output is:
success case:
success
exit
________________
failure case:
failure
exit
The usual approach is not to roll back in case of an exception, but to commit in case of no exception. That means, do the critical stuff first in a way that does not necessarily alter program state, and then commit with a series of non-throwing actions.
Your example would then be done as follows:
void modify(Data& data, int newValue, bool throwExc = false)
{
//first try the critical part
if(throwExc)
{
// an exception here needs no rollback, since no state has changed yet
throw std::exception();
}
//then non-throwing commit
data.value = newValue;
}
Of course RAII plays a major role in exception safety, but it's not the only solution.
Another example for "try-and-commit" is the copy-swap-idiom:
X& operator=(X const& other) {
X tmp(other); //copy-construct, might throw
tmp.swap(*this); //swap is a no-throw operation
  return *this;
}
As you can see, this sometimes comes at the cost of additional actions (e.g. if X's copy ctor allocates memory), but that's the price you sometimes have to pay for exception safety.
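For completeness, here is a fuller sketch of the copy-and-swap idiom; the class X and its string member are hypothetical, chosen only to make the snippet self-contained and compilable:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Copy-and-swap: do all throwing work on a temporary, then commit
// with a no-throw swap.
class X {
public:
    explicit X(std::string s = "") : data_(std::move(s)) {}
    // no-throw swap: exchanges guts without allocating
    void swap(X& other) noexcept { std::swap(data_, other.data_); }
    X& operator=(X const& other) {
        X tmp(other);    // copy-construct, might throw; *this is untouched
        tmp.swap(*this); // no-throw commit
        return *this;
    }
    const std::string& data() const { return data_; }
private:
    std::string data_;
};
```

If the copy construction throws, *this is never touched, which gives the strong exception guarantee.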
I found this question when faced with the case described at the end.
If you want to ensure the commit-or-rollback semantics without using copy-and-swap I would recommend providing proxies for all objects and using the proxies consistently.
The idea would be to hide the implementation details and limit the operations on the data to a sub-set that can be rolled back efficiently.
So the code using the data-structure would be something like this:
void modify(Data&data) {
CoRProxy proxy(data);
// Only modify data through proxy - DO NOT USE data
... foo(proxy);
...
proxy.commit(); // If we don't reach this point data will be rolled back
}
struct Data {
int value;
MyBigDataStructure value2; // Expensive to copy
};
struct CoRProxy {
int& value;
const MyBigDataStructure& value2; // Read-only access
void commit() {m_commit=true;}
  CoRProxy(Data& d) : value(d.value), value2(d.value2),
    m_commit(false), m_origValue(d.value) {}
~CoRProxy() {if (!m_commit) std::swap(m_origValue,value);}
private:
bool m_commit;
int m_origValue;
};
The main point is that the proxy restricts the interface to data to operations that the proxy can roll back, and (optionally) provides read-only access to the rest of the data. If we really want to ensure that there is no direct access to data we can send the proxy to a new function (or use a lambda).
A similar use-case is using a vector and rolling back push_back in case of failure.
template <class T> struct CoRVectorPushBack {
void push_back(const T&t) {m_value.push_back(t);}
void commit() {m_commit=true;}
  CoRVectorPushBack(std::vector<T>& data) :
    m_value(data), m_origSize(data.size()), m_commit(false) {}
  ~CoRVectorPushBack() { if (!m_commit) m_value.resize(m_origSize); }
private:
std::vector<T>&m_value;
size_t m_origSize;
bool m_commit;
};
The downside of this is the need for making a separate class for each operation.
The upside is that the code using the proxies is straightforward and safe (we could even add if (m_commit) throw std::logic_error("push_back after commit"); in push_back).
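A usage sketch of that proxy, with a hypothetical fill() function standing in for real client code, shows the commit-or-rollback semantics in action:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Self-contained copy of the push_back proxy from above.
template <class T> struct CoRVectorPushBack {
    void push_back(const T& t) { m_value.push_back(t); }
    void commit() { m_commit = true; }
    CoRVectorPushBack(std::vector<T>& data)
        : m_value(data), m_origSize(data.size()), m_commit(false) {}
    ~CoRVectorPushBack() { if (!m_commit) m_value.resize(m_origSize); }
private:
    std::vector<T>& m_value;
    std::size_t m_origSize;
    bool m_commit;
};

void fill(std::vector<int>& v, bool fail) {
    CoRVectorPushBack<int> proxy(v);
    proxy.push_back(1);
    proxy.push_back(2);
    if (fail) throw std::runtime_error("boom"); // proxy rolls back both pushes
    proxy.commit();                             // keep the new elements
}
```

On the failure path the destructor shrinks the vector back to its original size, so the caller never observes a half-finished batch of pushes.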
My program will create and delete a lot of objects (from a REST API). These objects will be referenced from multiple places. I'd like to have a "memory cache" and manage objects lifetime with reference counting so they can be released when they aren't used anymore.
All the objects inherit from a base class Ressource.
The Cache is mostly a std::map<_key_, std::shared_ptr<Ressource> >
Then I'm puzzled: how can the Cache know when a Ressource's ref count is decremented, i.e. when the std::shared_ptr destructor or operator= is called?
1/ I don't want to iterate over the std::map and check each use_count().
2/ Can I reuse std::shared_ptr and implement a custom hook?
class RessourcePtr : public std::shared_ptr<Ressource>
...
3/ Should I implement my own ref count class? ex. https://stackoverflow.com/a/4910158/1058117
Thanks!
"make shared_ptr not use delete" shows how you can provide a custom delete function for a shared pointer.
You could also use intrusive pointers if you want custom functions for reference add and release.
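The custom-deleter idea can serve directly as the "hook" asked about: the deleter runs exactly when the last shared_ptr dies, so it can evict the cache entry at that moment. Below is a single-threaded sketch; Ressource and Cache are minimal stand-ins for the classes in the question:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct Ressource { };

// Not thread-safe: the deleter mutates the map without locking.
class Cache {
public:
    std::shared_ptr<Ressource> get(const std::string& key) {
        auto it = cache_.find(key);
        if (it != cache_.end())
            if (auto sp = it->second.lock()) return sp; // still alive, reuse
        std::shared_ptr<Ressource> sp(new Ressource,
            [this, key](Ressource* p) {
                cache_.erase(key); // hook: ref count hit zero
                delete p;
            });
        cache_[key] = sp;
        return sp;
    }
    std::size_t size() const { return cache_.size(); }
private:
    std::map<std::string, std::weak_ptr<Ressource>> cache_;
};
```

The map holds only weak_ptrs, so the cache itself never keeps a Ressource alive; the custom deleter removes the dead entry so the map does not fill up with expired slots.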
You could use a map<Key, weak_ptr<Resource> > for your dictionary.
It would work approximately like this:
map<Key, weak_ptr<Resource> > _cache;
shared_ptr<Resource> Get(const Key& key)
{
auto& wp = _cache[key];
shared_ptr<Resource> sp; // need to be outside of the "if" scope to avoid
// releasing the resource
if (wp.expired()) {
sp = Load(key); // actually creates the resource
wp = sp;
}
return wp.lock();
}
When all the shared_ptrs returned by Get have been destroyed, the object will be freed. The drawback is that if you use an object and then immediately destroy the shared pointer, you are not really using a cache, as suggested by @pmr in his comment.
EDIT: this solution is not thread safe, as you are probably aware; you'd need to lock accesses to the map object.
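A locked variant might look like the sketch below; Resource, Key, and Load() are hypothetical stand-ins for the question's types:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct Resource { std::string key; };
using Key = std::string;

// Hypothetical factory that actually creates the resource.
std::shared_ptr<Resource> Load(const Key& key) {
    return std::make_shared<Resource>(Resource{key});
}

class ThreadSafeCache {
public:
    std::shared_ptr<Resource> Get(const Key& key) {
        std::lock_guard<std::mutex> guard(mutex_); // serialize map access
        auto& wp = cache_[key];
        std::shared_ptr<Resource> sp = wp.lock();  // empty if expired
        if (!sp) {
            sp = Load(key);
            wp = sp;
        }
        return sp;
    }
private:
    std::mutex mutex_;
    std::map<Key, std::weak_ptr<Resource>> cache_;
};
```

Note that lock() both tests for expiry and takes a strong reference in one step, which avoids the race between expired() and a later lock().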
The problem is that in your scenario the pool is going to keep every reference alive. Here is a solution that removes resources with a reference count of one from the pool. The remaining question is when to prune the pool; this solution prunes on every call to get, so "release-and-acquire-again" scenarios stay fast.
#include <memory>
#include <map>
#include <string>
#include <iostream>
struct resource {
};
class pool {
public:
std::shared_ptr<resource> get(const std::string& x)
{
auto it = cache_.find(x);
std::shared_ptr<resource> ret;
if(it == end(cache_))
ret = cache_[x] = std::make_shared<resource>();
else {
ret = it->second;
}
prune();
return ret;
}
std::size_t prune()
{
std::size_t count = 0;
for(auto it = begin(cache_); it != end(cache_);)
{
if(it->second.use_count() == 1) {
cache_.erase(it++);
++count;
} else {
++it;
}
}
return count;
}
std::size_t size() const { return cache_.size(); }
private:
std::map<std::string, std::shared_ptr<resource>> cache_;
};
int main()
{
pool c;
{
auto fb = c.get("foobar");
auto fb2 = c.get("foobar");
std::cout << fb.use_count() << std::endl;
std::cout << "pool size: " << c.size() << std::endl;
}
auto fb3 = c.get("bar");
std::cout << fb3.use_count() << std::endl;
std::cout << "pool size: " << c.size() << std::endl;
return 0;
}
You do not want a cache; you want a pool, specifically an object pool. Your main problem is not how to implement a ref count: shared_ptr already does that for you. When a resource is no longer needed, you just remove it from the cache. Your main problem will be memory fragmentation due to constant allocation/deletion, and slowness due to contention in the global memory allocator. Look at a thread-specific memory pool implementation for an answer.