C++ design question (need cheap smart pointer) - c++

I have a huge tree where the keys inside nodes are indices into a big hash_map v,
where v[key] is a (big) record associated with that key (including how many nodes in the tree have this key). Right now, key is an integer. So each node
has the overhead of storing pointers to its children plus an integer.
We can remove a key from a node in the tree.
We can't store the actual record in the tree node (because that would be a memory hog).
When a key is removed from a node, we need to look at v, update the count and remove the element
(and compact the vector).
This cries out for a smart pointer implementation, where we have a shared_ptr spread around the tree.
Once the last node that refers to key k is removed, the object is destroyed.
However, I am leery of the size requirements for shared_ptr. I need a cheap reference-counted
smart pointer. I don't care about concurrent access.

Here's a simple reference-counting smart pointer I picked off the web a few years ago and patched up a little:
/// A simple non-intrusive reference-counted pointer.
/// Behaves like a normal pointer to T, providing
/// operators * and ->.
/// Multiple pointers can point to the same data
/// safely - allocated memory will be deleted when
/// all pointers to the data go out of scope.
/// Suitable for STL containers.
///
template <typename T> class counted_ptr
{
public:
explicit counted_ptr(T* p = 0)
: ref(0)
{
if (p)
ref = new ref_t(p);
}
~counted_ptr()
{
delete_ref();
}
counted_ptr(const counted_ptr& other)
{
copy_ref(other.ref);
}
counted_ptr& operator=(const counted_ptr& other)
{
if (this != &other)
{
delete_ref();
copy_ref(other.ref);
}
return *this;
}
T& operator*() const
{
return *(ref->p);
}
T* operator->() const
{
return ref->p;
}
T* get_ptr() const
{
return ref ? ref->p : 0;
}
template <typename To, typename From>
friend counted_ptr<To> up_cast(counted_ptr<From>& from);
private: // types & members
struct ref_t
{
ref_t(T* p_ = 0, unsigned count_ = 1)
: p(p_), count(count_)
{
}
T* p;
unsigned count;
};
ref_t* ref;
private: // methods
void copy_ref(ref_t* ref_)
{
ref = ref_;
if (ref)
ref->count += 1;
}
void delete_ref()
{
if (ref)
{
ref->count -= 1;
if (ref->count == 0)
{
delete ref->p;
delete ref;
}
ref = 0;
}
}
};
Storage requirements are modest: each smart pointer is a single pointer to a shared ref_t block, and that block holds only the real pointer and the reference count.

Why not just extend your tree implementation to keep track of counts for the keys stored within it? All you need then is another hashmap (or an additional field within each record of your existing hashmap) to keep track of the counts, and some added logic in your tree's add/remove functions to update the counts appropriately.
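A minimal sketch of that bookkeeping, assuming the count is kept as an extra field in each record. The names Record, RecordTable, add_key_ref and release_key_ref are hypothetical and only illustrate the idea; the tree's add/remove functions would call them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical record type: the payload plus the number of tree nodes using it.
struct Record {
    std::string payload;          // stands in for the "big" data
    std::size_t node_count = 0;   // how many tree nodes hold this key
};

using RecordTable = std::unordered_map<int, Record>;

// Call when a tree node starts referring to `key`.
inline void add_key_ref(RecordTable& table, int key) {
    ++table[key].node_count;  // operator[] creates the record on first use
}

// Call when a key is removed from a tree node.
// Returns true if this was the last reference and the record was erased.
inline bool release_key_ref(RecordTable& table, int key) {
    auto it = table.find(key);
    if (it == table.end()) return false;  // unknown key: nothing to do
    if (it->second.node_count > 1) {
        --it->second.node_count;          // other nodes still use the record
        return false;
    }
    table.erase(it);                      // last user gone: drop the record
    return true;
}
```

Each tree node then stores only the integer key, so the per-node overhead stays the same as in the question.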

Related

Why can't `std::priority_queue::top()` return a non-const reference?

I need to maintain a priority queue Q of large objects (of type T). Since these objects are expensive to copy, I would like to be able to retrieve a writable object with auto h = std::move(Q.top()). But I can't do this since std::priority_queue<std::unique_ptr<T>>::top() returns only a const reference. Why? (And is there a simple workaround?)
You can store the large objects as unique_ptr<T> in the priority queue. The thing to note is that queue.top() returns a const unique_ptr<T>&, which means that the T itself isn't const. So you can do this:
T obj(std::move(*queue.top()));
queue.pop();
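As a fuller sketch of the two lines above: the types Job and JobLess and the helper take_top are hypothetical, and the example assumes the large type is movable and that the comparator only reads a field that a move leaves intact.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "large" type with a priority and a movable payload.
struct Job {
    int priority;
    std::string payload;
};

// Orders the queue by the pointed-to priority, not by pointer value.
struct JobLess {
    bool operator()(const std::unique_ptr<Job>& a,
                    const std::unique_ptr<Job>& b) const {
        return a->priority < b->priority;
    }
};

using JobQueue =
    std::priority_queue<std::unique_ptr<Job>,
                        std::vector<std::unique_ptr<Job>>, JobLess>;

// Moves the highest-priority Job out of the queue, then pops its slot.
inline Job take_top(JobQueue& q) {
    // top() is a const unique_ptr<Job>&, but the Job it points to is not const.
    Job out(std::move(*q.top()));
    q.pop();  // priority is an int, so the moved-from Job still compares correctly
    return out;
}
```

The queue must not be empty when take_top is called.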
Edit: Since your T doesn't have a move constructor, I'd just bite the bullet a little bit and use a std::shared_ptr<T>:
std::priority_queue<std::shared_ptr<T>, ...> queue;
// fill queue
// No need for anything special.
std::shared_ptr<T> ptr = queue.top();
queue.pop();
You can wrap your large object in an additional structure that has a field used by a custom compare function. This member should not be affected by move operations; for example, it should be some plain data type like int:
struct BigObject {
std::unique_ptr<int> data;
int forCmp;
};
struct Cmp {
bool operator()(const BigObject& lhs, const BigObject& rhs) {
return lhs.forCmp < rhs.forCmp;
}
};
After move(queue.top()) the inner order of the queue cannot be broken: the moved-from instance of BigObject still has a valid value of forCmp, which is what the comparator uses.
Then inherit from priority_queue. By doing this you get access to c, the underlying container, and can add a front method:
template<class T, class Cmp>
struct Wrapper : std::priority_queue<T,std::vector<T>,Cmp> {
T& front() {
return this->c.front();
}
};
Usage:
Wrapper<BigObject,Cmp> q;
BigObject bo;
bo.forCmp = 12;
q.push(std::move(bo));
BigObject i = std::move(q.front());
You can inherit from priority_queue and write T pop() to hide void pop()
template <typename T>
T fixed_priority_queue<T>::pop() {
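// c and comp are members of a dependent base class, so they need this->
std::pop_heap(this->c.begin(), this->c.end(), this->comp);
T value = std::move(this->c.back());
this->c.pop_back();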
return value;
}
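The snippet above shows only the member definition. A self-contained sketch of the whole class might look like this; the Compare template parameter and its default are assumptions, since the original does not show the class declaration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Sketch: a priority_queue whose pop() returns the removed element by value.
template <typename T, typename Compare = std::less<T>>
class fixed_priority_queue
    : public std::priority_queue<T, std::vector<T>, Compare> {
public:
    // Hides the base class's void pop(). Must not be called on an empty queue.
    T pop() {
        // c (the container) and comp (the comparator) are protected members
        // of std::priority_queue; inside a template they are reached via this->.
        std::pop_heap(this->c.begin(), this->c.end(), this->comp);
        T value = std::move(this->c.back());  // largest element is now at the back
        this->c.pop_back();
        return value;
    }
};
```

Because the element is moved out of the underlying vector rather than read through top(), this also works for move-only element types.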

Iterator for a list implemented using unique_ptr

I am creating a data structure that uses unique_ptr. I now want to define different iterators over this data structure; however, the nodes of my data structure are part of the data itself. Because of this I want the iterators to return the actual nodes and not only the values contained within.
Here is what I got so far (much simplified example):
#include <algorithm>
#include <iostream>
#include <memory>
using namespace std;
template <typename T> struct node {
node(T val) : val(val), next(nullptr) {}
node(T val, unique_ptr<node<T>> &n) : val(val), next(move(n)) {}
T val;
unique_ptr<node<T>> next;
template <bool Const = true> struct iter {
using reference =
typename std::conditional<Const, const node<T> *, node<T> *>::type;
iter() : nptr(nullptr) {}
iter(node<T> *n) : nptr(n) {}
reference operator*() { return nptr; }
iter &operator++() {
nptr = nptr->next.get();
return *this;
}
friend bool operator==(const iter &lhs, const iter &rhs) {
return lhs.nptr == rhs.nptr;
}
friend bool operator!=(const iter &lhs, const iter &rhs) {
return lhs.nptr != rhs.nptr;
}
node<T> *nptr;
};
iter<> begin() const { return iter<>(this); }
iter<> end() const { return iter<>(); }
iter<false> begin() { return iter<false>(this); }
iter<false> end() { return iter<false>(); }
};
template <typename T> void pretty_print(const unique_ptr<node<T>> &l) {
auto it = l->begin();
while (it != l->end()) {
auto elem = *it;
cout << elem->val << endl;
++it;
}
}
int main() {
auto a = make_unique<node<int>>(4);
auto b = make_unique<node<int>>(3, a);
auto c = make_unique<node<int>>(2, b);
auto d = make_unique<node<int>>(1, c);
for (auto *elem : *d) {
elem->val = elem->val - 1;
}
pretty_print(d);
return 0;
}
Is it considered bad practice to expose the raw pointers to the elements of the data structure in this way? Will this work in a more complex example, especially in regard to const-correctness?
This is largely opinion, but I'd say it's a bad idea. unique_ptrs should be unique; the rare exceptions should be if you need to pass a raw pointer to some other function that isn't properly templated (and you know for a fact it doesn't hold on to the pointer).
Otherwise, you're in a situation where reasonable uses, e.g. initializing to a std::vector<node<int>*> using your iterator, violate the assumptions baked into unique_ptr. In general, you want your APIs to behave predictably with limited developer headaches, and your proposal adds the headache of "You can iterate it as long as you don't store anything to anything with a lifetime beyond my custom structure's lifetime".
Better options are:
Returning references to your actual unique_ptrs (so people are able to work with them without violating the uniqueness contract; make them const if they shouldn't be mutated); they'd have to take ownership or explicitly use "borrowed" unmanaged pointers at their own risk to store the results, but iteration would work fine and they can't accidentally violate uniqueness guarantees
Store shared_ptrs internally, and hand out new shared_ptrs during iteration, so ownership is automatically shared and lifetime extends as long as a single shared pointer remains
Importantly, neither of these two options allows someone to accidentally "do the wrong thing". The caller can do bad things, but they have to personally, explicitly bypass the smart pointer protection mechanisms to do so, and that's on their head.
Of course, the third option is:
Return a reference to the value pointed to, not the pointer; if the value is stored it should be copy-constructed
It's possible for users to get this last one wrong (by storing the reference long term, taking the address of it to get a new pointer violating uniqueness guarantees, etc.), but it follows existing C++ conventions (as Ryan points out in the comments) for containers like std::vector, so it's not a new concern; C++ developers generally know not to do terrible things with references acquired from iteration. It's arguably the best option, since it maps well to the standard patterns, making it easier for developers by fitting into existing mental models.
I would use reference instead of pointer in your iterator
using reference =
typename std::conditional<Const, const node<T>&, node<T>&>::type;
So your iterator only has to be dereferenced once.
(I would keep a pointer for the member, though.)
Pointers in a (public) interface introduce doubt about ownership.
The usage syntax would be:
for (auto& elem : *d) {
elem.val = elem.val - 1;
}
or
while (it != l->end()) {
const auto& elem = *it;
std::cout << elem.val << std::endl;
++it;
}
Your iterator indeed is invalidated when its corresponding node is released which is the common case.
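Putting the reference-returning suggestion together, a compact sketch of the list could look like the following. It is a reworked version of the question's node, not the asker's code: the node_type alias, the by-value unique_ptr constructor parameter and the sum_values helper are assumptions made for the example.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

template <typename T>
struct node {
    explicit node(T v) : val(std::move(v)) {}
    node(T v, std::unique_ptr<node> n) : val(std::move(v)), next(std::move(n)) {}

    T val;
    std::unique_ptr<node> next;

    template <bool Const>
    struct iter {
        // const node for the const iterator, plain node otherwise
        using node_type = typename std::conditional<Const, const node, node>::type;
        using reference = node_type&;

        explicit iter(node_type* n = nullptr) : nptr(n) {}
        reference operator*() const { return *nptr; }  // a reference, not a pointer
        iter& operator++() { nptr = nptr->next.get(); return *this; }
        friend bool operator==(const iter& a, const iter& b) { return a.nptr == b.nptr; }
        friend bool operator!=(const iter& a, const iter& b) { return a.nptr != b.nptr; }

    private:
        node_type* nptr;  // non-owning; the unique_ptrs keep ownership
    };

    iter<false> begin() { return iter<false>(this); }
    iter<false> end() { return iter<false>(); }
    iter<true> begin() const { return iter<true>(this); }
    iter<true> end() const { return iter<true>(); }
};

// Walks a const list, which selects the const iterator.
template <typename T>
T sum_values(const node<T>& head) {
    T total{};
    for (const auto& n : head) total += n.val;
    return total;
}
```

With the pointer type made const inside the const iterator, a const list can no longer hand out mutable nodes, which addresses the const-correctness part of the question.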

Underlying design of boost shared_ptr

I am trying to understand the underlying design of the boost shared_ptr class. I want to "port" it to Fortran (don't ask). One thing I understand is that the reference count is held by a shared_count class. This prompts a question. I haven't used C++ in a long time, and I have never used boost.
Suppose I allocate a single instance of a class X, then pass it to two different shared_ptr instances. From what I understand, neither shared_ptr instance knows anything about the other, so both refer to the same X instance while each keeps a refcount of 1. If one shared_ptr goes out of scope while the other doesn't, the X object will be deleted (as that refcount drops to zero) and the remaining shared_ptr will hold a dangling pointer. To keep the refcount shared, you have to create a shared_ptr from another shared_ptr.
Am I right? If not, how can boost keep track of which shared_ptrs are referencing an object whose class knows nothing about being referenced through shared_ptrs?
Basically you're right. Your example will result in a dangling pointer (note that there are some exceptions if you use boost::enable_shared_from_this as a base class).
Explanation
Problem
boost::shared_ptr and std::shared_ptr share the same idea: create a smart pointer with a reference count from a raw pointer. However, they also share a problem that all such smart pointers have: if you hand the same raw pointer to a second smart pointer that isn't associated with the first, you'll end up with dangling pointers and multiple calls of delete:
int * ptr = new int;
{
std::shared_ptr<int> shared1(ptr); // initialise a new ref_count = 1
{
std::shared_ptr<int> shared2(ptr); // initialise a new ref_count = 1
} // first call of delete, since shared2.use_count() == 0
} // second call of delete, since shared1.use_count() == 0. ooops
"Solution"
After you have created your first smart pointer S from a raw pointer p to an object O, you should only copy-construct further smart pointers from S, not from p, as long as O isn't derived from std::enable_shared_from_this. boost has a rough equivalent of this, but mixing raw pointers and smart pointers is still a bad idea. Even better: don't use raw pointers at all if you work with smart pointers:
std::shared_ptr<int> ptr(new int);
{
std::shared_ptr<int> shared1(ptr); // ptr.use_count() == 2
{
std::shared_ptr<int> shared2(ptr); // ptr.use_count() == 3
} // ptr.use_count() == 2
} // ptr.use_count() == 1
Even better, don't allocate the memory yourself but use std::make_shared or boost::make_shared:
std::shared_ptr<int> ptr = std::make_shared<int>();
{
std::shared_ptr<int> shared1(ptr); // ptr.use_count() == 2
{
std::shared_ptr<int> shared2(ptr); // ptr.use_count() == 3
} // ptr.use_count() == 2
} // ptr.use_count() == 1
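For the std::enable_shared_from_this exception mentioned above, a small sketch may help. Widget, self and owners_after_recovery are hypothetical names; the object must already be owned by a shared_ptr before shared_from_this() is called.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class that can hand out shared_ptrs to itself.
struct Widget : std::enable_shared_from_this<Widget> {
    std::shared_ptr<Widget> self() { return shared_from_this(); }
};

// Returns the use count seen after a second owner is obtained via a raw pointer.
inline long owners_after_recovery() {
    std::shared_ptr<Widget> first = std::make_shared<Widget>();
    Widget* raw = first.get();                     // a raw pointer escapes
    std::shared_ptr<Widget> second = raw->self();  // joins the existing count
    return first.use_count();                      // both owners share one count
}
```

Here the second shared_ptr is derived from the object itself rather than constructed from the raw pointer, so no second reference count is created.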
Possible implementation
The following implementation is very crude compared to the std::shared_ptr, as it doesn't support std::weak_ptr and std::enable_shared_from_this. However, it should give you an overview how to handle a shared pointer:
#include <cstddef> // size_t
#include <new> // std::bad_alloc
//!\brief Base class for reference counter
class reference_base{
reference_base(const reference_base&); // not copyable
reference_base& operator=(const reference_base &){return *this;}// not assignable
protected:
size_t ref_count; //!< reference counter
virtual void dispose() = 0; //!< pure virtual
public:
//! initialize with a single reference count
reference_base() : ref_count(1){}
//! returns the current count of references
size_t use_count() const{
return ref_count;
}
//! increases the current count of references
void increase(){
ref_count++;
}
//! decreases the current count of references and dispose if the counter drops to zero
void decrease(){
if(--ref_count == 0)
dispose();
}
};
//! \brief Specialized version for pointer
template <class T>
class reference_base_ptr : public reference_base{
typedef T* pointer_type;
protected:
//! uses delete to deallocate memory
virtual void dispose(){
delete ptr;
ptr = 0;
}
public:
reference_base_ptr(T * ptr) : ptr(ptr){}
pointer_type ptr;
};
//! \brief Specialized version for arrays
template <class T>
class reference_base_range : public reference_base{
typedef T* pointer_type;
protected:
virtual void dispose(){
delete[] ptr;
ptr = 0;
}
public:
reference_base_range(T * ptr) : ptr(ptr){}
pointer_type ptr;
};
/***********************************************************/
//! base class for shared memory
template <class T, class reference_base_type>
class shared_memory{
public:
typedef T element_type;
//! Standard constructor, points to null
shared_memory() : reference_counter(new reference_base_type(0)){}
//! Constructs the shared_memory and creates a new reference_base
template<class Y> shared_memory(Y * ptr){
try{
reference_counter = new reference_base_type(ptr);
}catch(std::bad_alloc &e){
delete ptr;
throw;
}
}
//! Copies the shared_memory and increases the reference count
shared_memory(const shared_memory & o) throw() : reference_counter(o.reference_counter){
o.reference_counter->increase();
}
//! Copies the shared_memory of another pointer type and increases the reference count.
//! Needs the same reference_base_type
template<class Y>
shared_memory(const shared_memory<Y,reference_base_type> & o) throw() : reference_counter(o.reference_counter){
reference_counter->increase();
}
//! Destroys the shared_memory object and deletes the reference_counter if this was the last
//! reference.
~shared_memory(){
reference_counter->decrease();
if(reference_counter->use_count() == 0)
delete reference_counter;
}
//! Returns the number of references
size_t use_count() const{
return reference_counter->use_count();
}
//! Returns a pointer to the referred memory
T * get() const{
return reference_counter->ptr;
}
//! Checks whether this object is unique
bool unique() const{
return use_count() == 1;
}
//! Checks whether this object is valid
operator bool() const{
return get() != 0;
}
//! Checks whether this object doesn't reference anything
bool empty() const{
return get() == 0;
}
//! Assignment operator for derived classes
template<class Y>
shared_memory& operator=(const shared_memory<Y,reference_base_type> & o){
shared_memory tmp(o); // converting copy, so tmp has this object's type and can be swapped
swap(tmp);
return *this;
}
//! Assignment operator
shared_memory& operator=(const shared_memory & o){
shared_memory tmp(o);
swap(tmp);
return *this;
}
/** resets the ptr to NULL. If this was the last shared_memory object
* owning the referenced object, the object gets deleted.
* \sa ~shared_memory
*/
void reset(){
shared_memory tmp;
swap(tmp);
}
/** releases the old object and takes a new one
*/
template <class Y>
void reset(Y * ptr){
shared_memory tmp(ptr);
swap(tmp);
}
/** swaps the owned objects of two shared_memory objects.
*/
void swap(shared_memory & r){
reference_base_type * tmp = reference_counter;
reference_counter = r.reference_counter;
r.reference_counter = tmp;
}
protected:
reference_base_type * reference_counter; //!< Actually reference counter and raw pointer
};
/***********************************************************/
//! ptr (single object) specialization
template <class T>
class shared_ptr : public shared_memory<T,reference_base_ptr<T> >{
typedef reference_base_ptr<T> reference_counter_type;
typedef shared_memory<T,reference_counter_type> super;
typedef T element_type;
public:
shared_ptr(){}
//! Takes ownership of ptr. The base class constructor allocates the reference
//! counter and deletes ptr if that allocation fails, so nothing leaks here.
template<class Y> shared_ptr(Y * ptr) : super(ptr){}
element_type & operator*() const{
return *(super::reference_counter->ptr);
}
element_type * operator->() const{
return super::reference_counter->ptr;
}
};
/***********************************************************/
//! array (range) specialization
template <class T>
class shared_array : public shared_memory<T,reference_base_range<T> >{
typedef reference_base_range<T> reference_counter_type;
typedef shared_memory<T,reference_counter_type> super;
typedef T element_type;
public:
shared_array(){}
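template<class Y> shared_array(Y * ptr){
reference_counter_type * fresh = 0;
try{
fresh = new reference_counter_type(ptr);
}catch(std::bad_alloc &){
delete[] ptr; // prevent memory leak
throw;
}
delete super::reference_counter; // drop the empty counter made by the base constructor
super::reference_counter = fresh;
}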
element_type & operator[](int i) const{
return *(super::reference_counter->ptr + i);
}
};
See also:
std::shared_ptr
boost::shared_ptr, especially the section Best Practices
That's how the current implementation of boost::shared_ptr<X> works, but others can be thought of which do not share this disadvantage. E.g. a static std::unordered_map<X*, int> std::shared_ptr<X>::share_count can be used to keep track of the number of shared_ptr<X> pointers to each X. However, the downside of this is a far larger overhead than a simple share count.
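A rough sketch of that alternative, to make the trade-off concrete. registry_ptr is a hypothetical class, not part of boost or the standard library; it is single-threaded and pays a hash lookup on every copy and destruction.

```cpp
#include <cassert>
#include <unordered_map>

// Sketch: the count lives in a per-type table keyed by object address, so two
// pointers built from the same raw pointer share one count.
template <typename X>
class registry_ptr {
public:
    explicit registry_ptr(X* p = nullptr) : p_(p) { acquire(); }
    registry_ptr(const registry_ptr& o) : p_(o.p_) { acquire(); }
    registry_ptr& operator=(const registry_ptr& o) {
        if (p_ != o.p_) {  // same target: the count is already right
            release();
            p_ = o.p_;
            acquire();
        }
        return *this;
    }
    ~registry_ptr() { release(); }

    X& operator*() const { return *p_; }
    X* operator->() const { return p_; }
    int use_count() const { return p_ ? counts().at(p_) : 0; }

private:
    // Function-local static avoids static-initialization-order problems.
    static std::unordered_map<X*, int>& counts() {
        static std::unordered_map<X*, int> table;
        return table;
    }
    void acquire() { if (p_) ++counts()[p_]; }
    void release() {
        if (!p_) return;
        auto it = counts().find(p_);
        if (--it->second == 0) {
            counts().erase(it);  // remove the entry before destroying the object
            delete p_;
        }
        p_ = nullptr;
    }
    X* p_;
};
```

Each registry_ptr is only one pointer wide, but the table entry and lookup cost illustrate the "far larger overhead" mentioned above.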

How to create operator-> in iterator without a container?

template <class Enum>
class EnumIterator {
public:
const Enum* operator-> () const {
return &(Enum::OfInt(i)); // warning: taking address of temporary
}
const Enum operator* () const {
return Enum::OfInt(i); // There is no problem with this one!
}
private:
int i;
};
I get this warning above. Currently I'm using this hack:
template <class Enum>
class EnumIterator {
public:
const Enum* operator-> () {
tmp = Enum::OfInt(i);
return &tmp;
}
private:
int i;
Enum tmp;
};
But this is ugly because iterator serves as a missing container.
What is the proper way to iterate over range of values?
Update:
The iterator is specialized to a particular set of objects which support a named static constructor OfInt (code snippet updated).
Please do not nit-pick about the code I pasted, but just ask for clarification. I tried to extract a simple piece.
If you want to know, T will be a strong enum type (essentially an int packed into a class). There will be typedef EnumIterator < EnumX > Iterator; inside class EnumX.
Update 2:
consts added to indicate that members of strong enum class that will be accessed through -> do not change the returned temporary enum.
Updated the code with operator* which gives no problem.
Enum* operator-> () {
tmp = Enum::OfInt(i);
return &tmp;
}
The problem with this isn't that it's ugly, but that it's not safe. What happens, for example, in code like the following:
void f(EnumIterator it)
{
g(*it, *it);
}
Now g() ends up with two pointers, both of which point to the same internal temporary that was supposed to be an implementation detail of your iterator. If g() writes through one pointer, the other value changes, too. Ouch.
Your problem is, that this function is supposed to return a pointer, but you have no object to point to. No matter what, you will have to fix this.
I see two possibilities:
Since this thing seems to wrap an enum, and enumeration types have no members, that operator-> is useless anyway (it won't be instantiated unless called, and it cannot be called as this would result in a compile-time error) and can safely be omitted.
Store an object of the right type (something like Enum::enum_type) inside the iterator, and cast it to/from int only if you want to perform integer-like operations (e.g., increment) on it.
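The second possibility can be sketched as follows. Color stands in for the asker's strong enum; ToInt, IsWarm and count_warm are hypothetical names added for the example, and only OfInt comes from the question.

```cpp
#include <cassert>

// Hypothetical strong enum with the named constructor from the question.
class Color {
public:
    static Color OfInt(int raw) { return Color(raw); }
    int ToInt() const { return raw_; }
    bool IsWarm() const { return raw_ < 2; }  // a member to reach through ->
private:
    explicit Color(int raw) : raw_(raw) {}
    int raw_;
};

template <class Enum>
class EnumIterator {
public:
    explicit EnumIterator(int i) : value_(Enum::OfInt(i)) {}

    const Enum& operator*() const { return value_; }
    const Enum* operator->() const { return &value_; }  // points at a real member

    EnumIterator& operator++() {
        value_ = Enum::OfInt(value_.ToInt() + 1);  // integer work happens only here
        return *this;
    }
    friend bool operator==(const EnumIterator& a, const EnumIterator& b) {
        return a.value_.ToInt() == b.value_.ToInt();
    }
    friend bool operator!=(const EnumIterator& a, const EnumIterator& b) {
        return !(a == b);
    }
private:
    Enum value_;  // the current value, stored instead of an int index
};

// Counts the warm colors in [0, limit) using -> on the iterator.
inline int count_warm(int limit) {
    int n = 0;
    for (EnumIterator<Color> it(0), end(limit); it != end; ++it)
        if (it->IsWarm()) ++n;
    return n;
}
```

The pointer returned by operator-> stays valid only until the iterator is incremented or destroyed, which is the usual contract for iterators that generate their values.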
There are many kinds of iterators.
On a vector for example, iterators are usually plain pointers:
template <class T>
class Iterator
{
public:
T* operator->() { return m_pointer; }
private:
T* m_pointer;
};
But this works because a vector is just an array, in fact.
On a doubly-linked list, it would be different, the list would be composed of nodes.
template <class T>
struct Node
{
Node* m_prev;
Node* m_next;
T m_value;
};
template <class T>
class Iterator
{
public:
T* operator->() { return &m_node->m_value; } // address of the value stored in the node
private:
Node<T>* m_node;
};
Usually, you want your iterator to be as light as possible, because iterators are passed around by value, so a pointer into the underlying container makes sense.
You might want to add extra debugging capabilities:
possibility to invalidate the iterator
range checking possibility
container checking (ie, checking when comparing 2 iterators that they refer to the same container to begin with)
But those are niceties, and to begin with, this is a bit more complicated.
Note also Boost.Iterator which helps with the boiler-plate code.
EDIT: (update 1 and 2 grouped)
In your case, it's fine if your iterator is just an int, you don't need more. In fact, for your strong enum you don't even need an iterator, you just need operator++ and operator-- :)
The point of having a reference to the container is usually to implement those ++ and -- operators. But from your element, just having an int (assuming it's large enough), and a way to get to the previous and next values is sufficient.
It would be easier though, if you had a static vector then you could simply reuse a vector iterator.
An iterator iterates on a specific container. The implementation depends on what kind of container it is. The pointer you return should point to a member of that container. You don't need to copy it, but you do need to keep track of what container you're iterating on, and where you're at (e.g. index for a vector) presumably initialized in the iterator's constructor. Or just use the STL.
What does OfInt return? It appears to be returning the wrong type in this case. It should be returning a T*; instead it seems to be returning a T by value, which you are then taking the address of. This may produce incorrect behavior, since it will lose any update made through ->.
As there is no container I settled on merging iterator into my strong Enum.
I init raw int to -1 to support empty enums (limit == 0) and be able to use regular for loop with TryInc.
Here is the code:
template <uint limit>
class Enum {
public:
static const uint kLimit = limit;
Enum () : raw (-1) {
}
bool TryInc () {
if (raw+1 < kLimit) {
raw += 1;
return true;
}
return false;
}
uint GetRaw() const {
return raw;
}
void SetRaw (uint raw) {
this->raw = raw;
}
static Enum OfRaw (uint raw) {
return Enum (raw);
}
bool operator == (const Enum& other) const {
return this->raw == other.raw;
}
bool operator != (const Enum& other) const {
return this->raw != other.raw;
}
protected:
explicit Enum (uint raw) : raw (raw) {
}
private:
uint raw;
};
The usage:
class Color : public Enum <10> {
public:
static const Color red;
// constructors should be automatically forwarded ...
Color () : Enum<10> () {
}
private:
Color (uint raw) : Enum<10> (raw) {
}
};
const Color Color::red = Color(0);
int main() {
Color red = Color::red;
for (Color c; c.TryInc();) {
std::cout << c.GetRaw() << std::endl;
}
}

Is there C++ lazy pointer?

I need a shared_ptr like object, but which automatically creates a real object when I try to access its members.
For example, I have:
class Box
{
public:
unsigned int width;
unsigned int height;
Box(): width(50), height(100){}
};
std::vector< lazy<Box> > boxes;
boxes.resize(100);
// At this point boxes contains no real Box objects.
// But when I try to access box number 50, for example,
// it will be created.
std::cout << boxes[49].width;
// now vector contains one real box and 99 lazy boxes.
Is there an existing implementation, or should I write my own?
It's very little effort to roll your own.
template<typename T>
class lazy {
public:
lazy() : child(0) {}
~lazy() { delete child; }
T &operator*() {
if (!child) child = new T;
return *child;
}
// might dereference NULL pointer if unset...
// but if this is const, what else can be done?
const T &operator*() const { return *child; }
T *operator->() { return &**this; }
const T *operator->() const { return &**this; }
private:
T *child;
};
// ...
cout << boxes[49]->width;
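One caveat with the class above: it owns a raw pointer but has no copy constructor or assignment operator, so copying a lazy whose object already exists would delete that object twice. A sketch of the same idea built on std::unique_ptr avoids that by being move-only. The created() member is a hypothetical addition for inspection, and const access is left out here.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Sketch: the owned object is held in a unique_ptr, so the wrapper is
// move-only and cannot be double-deleted through an accidental copy.
template <typename T>
class lazy {
public:
    // Creates the object on first access.
    T& operator*() {
        if (!child) child.reset(new T());
        return *child;
    }
    T* operator->() { return &**this; }

    // True once the object has been created.
    bool created() const { return static_cast<bool>(child); }

private:
    std::unique_ptr<T> child;
};

// The Box type from the question.
struct Box {
    unsigned int width;
    unsigned int height;
    Box() : width(50), height(100) {}
};
```

A std::vector<lazy<Box>> of 100 elements then holds 100 empty wrappers until an element is accessed through * or ->.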
Using boost::optional, you can have such a thing:
// 100 lazy BigStuffs
std::vector< boost::optional<BigStuff> > v(100);
v[49] = some_big_stuff;
This will construct 100 empty optionals and assign one real some_big_stuff to v[49]. boost::optional uses no heap memory of its own; it uses placement-new to create the object in a buffer embedded in the optional itself. I would create a wrapper around boost::optional like this:
template<typename T>
struct LazyPtr {
T& operator*() { if(!opt) opt = T(); return *opt; }
T const& operator*() const { return *opt; }
T* operator->() { if(!opt) opt = T(); return &*opt; }
T const* operator->() const { return &*opt; }
private:
boost::optional<T> opt;
};
This now uses boost::optional to do the work. It ought to support in-place construction like this (example on op*):
T& operator*() { if(!opt) opt = boost::in_place(); return *opt; }
That would not require any copying. However, the current boost manual does not include that assignment operator overload. The source does, however. I'm not sure whether this is just a defect in the manual or whether its documentation is intentionally left out. So I would use the safer way: a copy assignment using T().
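If C++17 is available, the same wrapper can be sketched with std::optional, whose emplace() member constructs the object in place without a copy. This swaps boost::optional for std::optional; the ensure and created members and the Pinned test type are hypothetical names for this example.

```cpp
#include <cassert>
#include <optional>

// Sketch using C++17 std::optional instead of boost::optional.
template <typename T>
struct LazyPtr {
    T& operator*() { return ensure(); }
    T* operator->() { return &ensure(); }
    bool created() const { return opt.has_value(); }
private:
    // emplace() constructs T directly inside the optional's own storage.
    T& ensure() {
        if (!opt) opt.emplace();
        return *opt;
    }
    std::optional<T> opt;
};

// Hypothetical type that cannot be copied, to show that no copy is needed.
struct Pinned {
    Pinned() : value(7) {}
    Pinned(const Pinned&) = delete;
    Pinned& operator=(const Pinned&) = delete;
    int value;
};
```

Because nothing is copied or moved, this also works for types like Pinned that the T() assignment approach could not handle.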
I've never heard of such a thing, but then again there are lots of things I've never heard of. How would the "lazy pointer" put useful data into the instances of the underlying class?
Are you sure that a sparse matrix isn't what you're really looking for?
So far as I know, there's no existing implementation of this sort of thing. It wouldn't be hard to create one though.