Deleting an object via std::weak_ptr - c++

I have a std::vector with thousands of objects, stored as shared_ptr. Since the objects have many attributes that can be used for searching, I maintain multiple indexes into this std::vector using std::map and std::multimap of weak_ptr.
std::vector<std::shared_ptr<Object>> Objects;
std::map<int,std::weak_ptr<Object>> IndexByEmployeeId;
std::multimap<std::string,std::weak_ptr<Object>> IndexByName;
Since the map and multimap are balanced binary trees, searching and modifying are very fast. However, I am a bit foxed about deletion. I want to delete an object after looking it up via one of the indexes. Locking the weak_ptr gives me a shared_ptr, but that doesn't let me destroy the object in the vector. Is there any way to delete the original object in the vector?

It seems from what you've posted that your data structures are inappropriate for what you want to do.
shared_ptr should only be used to express shared ownership. However, your posted code and your desire to delete the objects pointed to indicate that Objects is in fact the sole owner of its objects.
Your vector<shared_ptr<object>> appears to be used only as a data storage container (the desired functionality of searching by id or name is implemented elsewhere), but removing elements from a vector is typically expensive, so you may be better off using another type of container.
If there are no other shared_ptr to your objects, then your design is poor. In this case you shouldn't use any smart pointers, but simply a container<object> and maps into that container. For example, something like this (not tested, not even compiled):
struct data_t { std::string name; /* more data */ };
using objt_map = std::map<std::size_t,data_t>;
using object = objt_map::value_type; // pair<const size_t,data_t>
using obj_it = objt_map::iterator;
using name_map = std::multimap<std::string, obj_it>;
objt_map Objects;
name_map NameMap;
std::forward_list<obj_it> find_by_name(std::string const&name) const
{
auto range = NameMap.equal_range(name);
std::forward_list<obj_it> result;
for(auto it=range.first; it!=range.second; ++it)
result.push_front(it->second);
return result;
}
std::forward_list<obj_it> find_by_id(std::size_t id) const
{
auto it = Objects.find(id);
std::forward_list<obj_it> result;
if(it != Objects.end())
result.push_front(it);
return result;
}
void insert_object(std::size_t id, data_t const&data)
{
auto it = Objects.find(id);
if(it != Objects.end())
throw std::runtime_error("id '"+std::to_string(id)+"' already known");
Objects[id] = data;
NameMap.emplace(data.name, Objects.find(id));
}
void delete_object(obj_it it)
{
if(it==Objects.end())
throw std::runtime_error("attempt to delete invalid object");
auto range = NameMap.equal_range(it->second.name);
for(auto i=range.first; i!=range.second; ++i)
if(i->second==it) {
NameMap.erase(i);
break;
}
Objects.erase(it);
}
Note that iterators into std::map remain valid upon insertion and deletion (of other elements), so the iterators returned by the finders are not invalidated by later insertions and deletions. I use std::forward_list<obj_it> as return type to allow returning none, one, or several objects.
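To make this concrete, here is a minimal, standalone illustration of the same principle, separate from the sketch above: an owning std::map plus a name index of map iterators, with deletion done through the index.
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

int main()
{
    struct data_t { std::string name; };
    using objt_map = std::map<std::size_t, data_t>;

    objt_map Objects;                                        // owns the objects
    std::multimap<std::string, objt_map::iterator> NameMap;  // index by name

    auto insert = [&](std::size_t id, std::string name) {
        auto it = Objects.emplace(id, data_t{std::move(name)}).first;
        NameMap.emplace(it->second.name, it);
    };
    insert(1, "Alice");
    insert(2, "Bob");
    insert(3, "Alice");

    // delete every object named "Alice" via the secondary index
    auto range = NameMap.equal_range("Alice");
    for (auto i = range.first; i != range.second; ) {
        Objects.erase(i->second);  // destroys the owned object
        i = NameMap.erase(i);      // drops the index entry, returns the next one
    }
    std::cout << Objects.size() << " object(s) left\n";      // prints: 1 object(s) left
}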

This can be a use case where std::set is the appropriate choice over std::vector. std::set guarantees lookup, insertion and removal in logarithmic time. So you can look up an object via one of your index maps and then delete it from each container with O(log N) cost.
I would suggest this approach if insertion/removal operations represent the performance bottleneck in your application.
By the way, consider carefully the actual need for shared_ptr, because shared_ptr comes with a certain performance overhead as opposed to unique_ptr for example. Your owner container could use unique_ptr and the various maps could simply use raw pointers.
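To illustrate that suggestion, here is a minimal sketch using the question's member names (hypothetical; note that erasing from the middle of the vector is still linear, as discussed above).
#include <algorithm>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Object { int employeeId; std::string name; };

std::vector<std::unique_ptr<Object>> Objects;      // sole owner
std::map<int, Object*> IndexByEmployeeId;          // non-owning observers
std::multimap<std::string, Object*> IndexByName;   // non-owning observers

void erase_by_employee_id(int id)
{
    auto idx = IndexByEmployeeId.find(id);
    if (idx == IndexByEmployeeId.end()) return;
    Object* p = idx->second;

    // drop the index entries first
    IndexByEmployeeId.erase(idx);
    auto range = IndexByName.equal_range(p->name);
    for (auto i = range.first; i != range.second; ++i)
        if (i->second == p) { IndexByName.erase(i); break; }

    // finally destroy the object by erasing its unique_ptr from the owner
    Objects.erase(std::remove_if(Objects.begin(), Objects.end(),
                      [p](const std::unique_ptr<Object>& u) { return u.get() == p; }),
                  Objects.end());
}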

So, here is another option, based on importing the objects by moving std::unique_ptr<>. Unfortunately, unique_ptrs are awkward keys for a std::set (you cannot construct another unique_ptr to the same object just to search for it), unless you have C++14, where set::find() can take an argument of a type other than the key (see below).
For a C++11 approach, one must therefore use std::map to store the unique_ptrs, which requires duplicating the id and name: once in data_t and once as keys in the maps. Here is a sketch.
struct data_t {
const std::size_t id; // changing these would
const std::string name; // lead to confusion
/* more data */
};
using data_ptr = std::unique_ptr<data_t>;
using data_map = std::map<std::size_t,data_ptr>;
using obj_it = data_map::iterator;
using name_map = std::multimap<std::string,obj_it>;
data_map DataSet;
name_map NameMap;
std::vector<data_t*> find_by_name(std::string const&name) const
{
auto range = NameMap.equal_range(name);
std::vector<data_t*> result;
result.reserve(std::distance(range.first,range.second));
for(auto it=range.first; it!=range.second; ++it)
result.push_back(it->second->second.get());
return result;
}
data_t* find_by_id(std::size_t id) const
{
auto it = DataSet.find(id);
return it == DataSet.end()? nullptr : it->second.get();
}
// transfer ownership here
void insert_object(data_ptr&&ptr)
{
const auto id = ptr->id;
if(DataSet.count(id))
throw std::runtime_error("id '"+std::to_string(id)+"' already known");
auto itb = DataSet.emplace(id,std::move(ptr));
if(!itb.second)
throw std::runtime_error("couldn't insert id "+std::to_string(id));
NameMap.emplace(itb.first->second->name,itb.first);
}
// remove object given an observing pointer; invalidates ptr
void delete_object(data_t*ptr)
{
if(ptr==nullptr)
return; // issue warning or throw ?
auto it = DataSet.find(ptr->id);
if(it==DataSet.end())
throw std::runtime_error("attempt to delete an unknown object");
auto range = NameMap.equal_range(it->second->name);
for(auto i=range.first; i!=range.second; ++i)
if(i->second==it) {
NameMap.erase(i);
break;
}
DataSet.erase(it);
}
Here is a sketch for a C++14 solution, which avoids the duplication of the id and name data in the maps, but requires/assumes that data_t::id and data_t::name are invariable.
struct data_t {
const std::size_t id; // used as key in set & multiset:
const std::string name; // must never be changed
/* more data */
};
using data_ptr = std::unique_ptr<data_t>;
struct compare_id {
using is_transparent = std::size_t;
bool operator()(data_ptr const&l, data_ptr const&r) const
{ return l->id < r->id; }
bool operator()(data_ptr const&l, std::size_t r) const
{ return l->id < r; }
bool operator()(std::size_t l, data_ptr const&r) const
{ return l < r->id; }
};
using data_set = std::set<data_ptr,compare_id>;
using data_it = data_set::const_iterator;
struct compare_name {
using is_transparent = std::string;
bool operator()(data_it l, data_it r) const
{ return (*l)->name < (*r)->name; }
bool operator()(data_it l, std::string const&r) const
{ return (*l)->name < r; }
bool operator()(std::string const&l, data_it r) const
{ return l < (*r)->name; }
};
using name_set = std::multiset<data_it,compare_name>;
data_set DataSet;
name_set NameSet;
std::vector<data_t*> find_by_name(std::string const&name) const
{
auto range = NameSet.equal_range(name);
std::vector<data_t*> result;
result.reserve(std::distance(range.first,range.second));
for(auto it=range.first; it!=range.second; ++it)
result.push_back((*it)->get());
return result;
}
data_t* find_by_id(std::size_t id) const
{
auto it = DataSet.find(id);
return it == DataSet.end()? nullptr : it->get();
}
// transfer ownership here
void insert_object(data_ptr&&ptr)
{
const auto id = ptr->id;
if(DataSet.count(id))
throw std::runtime_error("id '"+std::to_string(id)+"' already known");
auto itb = DataSet.emplace(std::move(ptr));
if(!itb.second)
throw std::runtime_error("couldn't insert id "+std::to_string(id));
NameSet.emplace(itb.first);
}
// remove object given an observing pointer; invalidates ptr
void delete_object(data_t*ptr)
{
if(ptr==nullptr)
return; // issue warning or throw ?
auto it = DataSet.find(ptr->id);
if(it==DataSet.end())
throw std::runtime_error("attempt to delete an unknown object");
auto range = NameSet.equal_range(ptr->name);
for(auto i=range.first; i!=range.second; ++i)
if((*i)==it) {
NameSet.erase(i);
break;
}
DataSet.erase(it);
}
There may well be some bugs in here, in particular errors with dereferencing the various iterator and pointer types (though once it compiles these should be okay).

How to pick proper sorted container?

I'm developing a real-time game client in C++ and I'm wondering how to pick the right container.
The server sends new game objects (monsters, players), changes to them, and their removal. All of them have to be stored in a single container called World, identified by a unique int ID. So the most common operations are getObjByID() to change something, plus push_back() and remove(). Besides that, the client has to dynamically sort that World by object fields like Distance or Type to pick an object for the client's needs. That sorting can happen very often, like every 10 ms, and object values can change dynamically with incoming server info: a player moves and the distances to all other objects change.
My first intention was to use std::vector without alloc/free, reinitializing deleted objects instead, but reading around the internet made me think about std::map. My main doubt about map is that its values cannot be sorted.
Is there a performant way to sort and filter std::vector or std::map without copying elements?
Something like C# LINQ:
var mons = world.Where(o=> o.isMonster()).OrderBy(o=> o.Distance);
foreach(var mon in mons){
//do smth
}
I recommend a different approach for two key reasons:
A single data structure is unlikely to satisfy your needs. Using an interlocked set of structures with one main index and multiple indices for specific query types will serve you better
Updating every entry when a single object moves is pretty wasteful. There is an entire set of spatial data structures that is designed to deal with looking up positions and finding objects in the vicinity. For my example, I'm using the R-Tree in Boost
Let's start with some basic type definitions. I assume 2D coordinates and use simple integers for object and type IDs. Adapt as necessary:
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point.hpp>
#include <boost/geometry/geometries/box.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <iterator>
// using std::back_inserter
#include <unordered_map>
#include <utility>
// using std::swap, std::pair
#include <vector>
namespace game {
using ObjectID = int;
using TypeID = int;
namespace bg = boost::geometry;
namespace bgi = boost::geometry::index;
using Point = bg::model::d2::point_xy<float, bg::cs::cartesian>;
using PointEntry = std::pair<Point, ObjectID>;
using RTree = bgi::rtree<PointEntry, bgi::quadratic<16> >;
You want to query specific types, e.g. only monsters. So we need to keep track of objects per type and their positions. The way we set up the R-Tree, mapping a Point to an ObjectID, even allows us to iterate over all objects of a specific type by just using the RTree.
class TypeState
{
public:
RTree positions;
void add(ObjectID id, Point position)
{ positions.insert(std::make_pair(position, id)); }
void erase(ObjectID id, Point position)
{ positions.remove(std::make_pair(position, id)); }
void move(ObjectID id, Point from, Point to)
{
positions.remove(std::make_pair(from, id));
positions.insert(std::make_pair(to, id));
}
RTree::const_iterator begin() const noexcept
{ return positions.begin(); }
RTree::const_iterator end() const noexcept
{ return positions.end(); }
};
Next, we define the state per object. This needs to be linked to the type so that deleting the object will remove it from the RTree. Since I plan to keep all types in an unordered_map and these guarantee that pointers are not invalidated when elements are added or removed, we can simply use that.
using TypeMap = std::unordered_map<TypeID, TypeState>;
using TypePointer = TypeMap::pointer;
class ObjectState
{
TypePointer type;
ObjectID id;
Point position;
public:
ObjectState() noexcept
: type(), id()
{}
ObjectState(TypePointer type, ObjectID id, Point position)
: type(type),
id(id),
position(position)
{ type->second.add(id, position); }
ObjectState(ObjectState&& o) noexcept
: type(o.type), id(o.id), position(o.position)
{ o.type = nullptr; }
ObjectState(const ObjectState&) = delete;
~ObjectState()
{
if(type)
type->second.erase(id, position);
}
void swap(ObjectState& o) noexcept
{
using std::swap;
swap(type, o.type);
swap(id, o.id);
swap(position, o.position);
}
ObjectState& operator=(ObjectState&& o) noexcept
{
ObjectState tmp = std::move(o);
swap(tmp);
return *this;
}
ObjectState& operator=(const ObjectState&) = delete;
TypeID get_type() const noexcept
{ return type->first; }
ObjectID get_id() const noexcept
{ return id; }
Point get_position() const noexcept
{ return position; }
/**
* Changes position
*
* Do not call this directly! Call WorldState::move
*/
void move(Point to)
{
type->second.move(id, position, to);
position = to;
}
};
Finally, we can put it all together. Since we may also want to query all objects regardless of type, we add a second R-tree for just that purpose.
This is also the place where we define our spatial queries. There are a lot of possibilities, such as K nearest neighbours, or all points within a range; see Predicates in boost::geometry::index. There are also iterative queries that don't need temporary storage, but I haven't used those, for simplicity. Be careful about modifying data structures while queries are running.
class WorldState
{
using ObjectMap = std::unordered_map<ObjectID, ObjectState>;
TypeMap types;
ObjectMap objects;
RTree positions;
/*
* Warning: TypeMap must come before ObjectMap because ObjectState
* borrows pointers to TypeMap entries. Therefore destructor order matters
*/
public:
void add(TypeID type, ObjectID object, Point pos)
{
TypeMap::iterator typeentry = types.emplace(std::piecewise_construct,
std::forward_as_tuple(type),
std::forward_as_tuple()).first;
objects.emplace(std::piecewise_construct,
std::forward_as_tuple(object),
std::forward_as_tuple(&(*typeentry), object, pos));
positions.insert(std::make_pair(pos, object));
}
void move(ObjectID object, Point newpos)
{
ObjectState& objectstate = objects.at(object);
positions.remove(std::make_pair(objectstate.get_position(), object));
positions.insert(std::make_pair(newpos, object));
objectstate.move(newpos);
}
void erase(ObjectID object)
{
ObjectMap::iterator found = objects.find(object);
positions.remove(std::make_pair(found->second.get_position(), object));
objects.erase(found);
}
/**
* Calls functor for all objects
*
* Do not add or remove objects during the query!
*
* \param fun functor called with (ObjectID, const ObjectState&)
*/
template<class Functor>
void for_all_objects(Functor fun) const
{
for(ObjectMap::const_reference entry: objects)
fun(entry.first, entry.second);
}
/**
* Calls functor for all objects of given type
*
* \see for_all_objects
*/
template<class Functor>
void for_all_of_type(TypeID type, Functor fun) const
{
TypeMap::const_iterator foundtype = types.find(type);
if(foundtype == types.cend())
return;
for(const PointEntry& entry: foundtype->second)
fun(entry.second, objects.find(entry.second)->second);
}
/**
* Calls functor for the K nearest objects around the given object
*
* The object passed to the functor can be manipulated, removed, or other
* objects inserted during the functor call. But do not erase other
* objects!
*
* \param fun functor called with (ObjectID, ObjectState&)
*/
template<class Functor>
void for_k_around_object(
unsigned count, ObjectID object, Functor fun)
{
Point pos = objects.at(object).get_position();
std::vector<PointEntry> result_n;
positions.query(bgi::nearest(pos, count + 1),
std::back_inserter(result_n));
for(const PointEntry& entry: result_n) {
ObjectID found = entry.second;
if(entry.second != object) // exclude itself
fun(found, objects.find(found)->second);
}
}
/**
* K nearest objects of specific type around the given object
*
* \see for_k_around_object
*/
template<class Functor>
void for_k_of_type_around_object(
unsigned count, TypeID type, ObjectID object, Functor fun)
{
TypeMap::const_iterator foundtype = types.find(type);
if(foundtype == types.cend())
return;
const ObjectState& objectstate = objects.at(object);
if(objectstate.get_type() == type)
count += 1; // self will be returned by query
Point pos = objectstate.get_position();
std::vector<PointEntry> result_n;
foundtype->second.positions.query(
bgi::nearest(pos, count), std::back_inserter(result_n));
for(const PointEntry& entry: result_n) {
ObjectID found = entry.second;
if(entry.second != object) // exclude itself
fun(found, objects.find(found)->second);
}
}
};
} /* namespace game */
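For completeness, a short hypothetical usage sketch of the interface above (the IDs and coordinates are invented, and it assumes the code above compiles as posted):
int main()
{
    game::WorldState world;
    const game::TypeID Monster = 1, Player = 2;

    world.add(Player,  100, game::Point(0.f, 0.f));
    world.add(Monster, 200, game::Point(3.f, 4.f));
    world.add(Monster, 201, game::Point(10.f, 10.f));

    world.move(200, game::Point(2.f, 2.f));     // server reports a move

    // "the monster nearest to the player", without sorting the whole world
    world.for_k_of_type_around_object(1, Monster, 100,
        [](game::ObjectID id, game::ObjectState& state) {
            (void)id; (void)state;              // act on the nearest monster
        });

    world.erase(201);                           // server removed an object
}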

C++ cannot convert Student* to DYNVECTOR<Student,24>::Node* in initialization

I'm trying to create an iterator for my dynamic vector class that I've implemented.
The thing is, I keep getting this error when I try to initialize some Student class for testing
purposes in some assignment for the iterator. I cannot figure out why I'm having this error; I've wasted a lot of hours on it and it's impossible to figure out why this error occurs.
Here is the code
some edit
Here is the main function that I'm trying to use that takes the iterator from my class
some edit
Just in the initialization of the DYNVECTOR class in main, my code fails; I keep getting
the error:
error: cannot convert 'Student*' to 'DYNVECTOR <Student, 24>::Node*' in initialization
Iter(T *N ) : _pointer(N) { }
EDIT: please guys focus on this part:
inputIterator begin() { return inputIterator(pa);}
This is what is causing the error; the functionality of the push_back function and other functions is still in progress, but that is not what is causing this error.
The Problem
inputIterator begin() { return inputIterator(pa);}
is calling inputIterator's constructor
Iter(T *N ) : _pointer(N) { }
with a pointer to a T, a T *. Iter takes T * happily, but it tries to store that T * into _pointer and _pointer is defined as
Node *_pointer;
which is NOT a T *. The initialization fails because the types don't match.
The Naive Solution
Make the types match. This means you have to pass in a Node *. Bad news: DYNVECTOR doesn't have any Node *s to give it. Naive solution fails.
The Proper Solution
Throw out Node. Node is useful if you have a linked list. You don't have a linked list. Kill it. Make it dead. Clean up the mess.
class DYNVECTOR
{
// no nodes
class Iter // directly uses T pointers
{
public:
Iter(T *N) :
_pointer(N) // types match now
{
}
T& operator*() const
{
return *_pointer; // simpler without Node, no?
}
T* operator->() const
{
return _pointer; // simple
}
Iter& operator++()
{
_pointer++; // dead simple
return *this;
}
Iter operator++(int)
{
Iter tmp = *this;
_pointer++; // yawn-city
return tmp;
}
bool operator==(Iter const &rhs) const
{
return _pointer == rhs._pointer; // unchanged
}
bool operator!=(Iter const &rhs) const
{
return _pointer != rhs._pointer; // unchanged
}
private:
T *_pointer; // T *, not Node *
};
private:
size_t someCap, length; //, initCap; don't see the point of initCap
T *pa; // unchanged
public:
typedef Iter inputIterator;
DYNVECTOR():
someCap(Capacity), // Still not sure what Capacity is for, so I used
// it here instead of magic number 24
length(0),
pa(new T[someCap])
{
// used initializer list instead.
}
inputIterator begin()
{
return inputIterator(pa); // unchanged
}
inputIterator end()
{
return inputIterator(&pa[length]); // iterator to one past the end.
// just like std::vector
}
template<class Iter>
DYNVECTOR(const Iter &begin, const Iter &end): // far more versatile if const references
DYNVECTOR() // delegate basic set-up to default constructor
{
for (Iter pointer = begin; pointer != end; pointer++) // loop unchanged
{
push_back(*pointer);
}
}
// make uncopyable (for now anyway) See Rule of Three
// linked below for why
DYNVECTOR(const DYNVECTOR & ) = delete;
DYNVECTOR& operator=(const DYNVECTOR & ) = delete;
~DYNVECTOR() // for my own testing. left as example
{
delete[] pa; // clean up allocated storage
}
void push_back(const T & newb) // for my own testing. left as example
{
if (length == someCap) // need more space
{
int newCap = someCap * 2; // double the size
// you might need to do something different like
// int newCap = someCap + Capacity;
// There's no way for me to know.
// The rest should be right though.
T* newArr = new T[newCap]; // allocate bigger array
for (size_t index = 0; index < length; index++)
{ // copy old array into new
newArr[index] = pa[index];
}
delete[] pa; // discard old array
pa = newArr; // use new array
someCap = newCap; // update capacity
}
pa[length] = newb; // insert new item
length++; // update counter
}
};
Documentation on the Rule of Three and friends. You cannot write complex and efficient C++ unless you understand these rules. Learn them or consign yourself to being a hack.
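As an aside, if you later decide DYNVECTOR should be copyable after all, here is a sketch of what honouring the Rule of Three could look like with the members above, in place of the two deleted members (hypothetical; std::swap needs #include <utility>):
// Copy constructor: deep-copy the elements into freshly allocated storage.
DYNVECTOR(const DYNVECTOR &other) :
    someCap(other.someCap),
    length(other.length),
    pa(new T[other.someCap])
{
    for (size_t index = 0; index < length; index++)
        pa[index] = other.pa[index];
}

// Copy assignment via copy-and-swap; 'other' destroys the old array on exit.
DYNVECTOR &operator=(DYNVECTOR other)
{
    std::swap(someCap, other.someCap);
    std::swap(length, other.length);
    std::swap(pa, other.pa);
    return *this;
}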
First of all, when you deal with objects, it is bad practice to do:
DYNVECTOR<Student, 24> students;
It should be:
DYNVECTOR<Student*, 24> students;
Secondly, you never created a constructor for your DYNVECTOR; how do you expect the object to be created?

looking for a small associative array where keys are automatically generated and memory consumption is linear

Like the title says, I am looking for an associative array (like a map) with linear memory consumption (like a std::vector) where the keys are automatically generated for new entries. N < 128.
For example, we would use this for an observer pattern, where you can register callbacks for a value-change event. In return, you get an id (integer). With this id you can later unregister your callback.
Pseudo code:
/// Registers a callback and returns an associated id to it.
int register_callback(std::function callback);
/// Returns true if callback was unregistered for given id.
bool unregister_callback(int id);
Since this should be used inside embedded devices with memory restrictions, I wouldn't use a map as the container (see here: http://jsteemann.github.io/blog/2016/06/14/how-much-memory-does-an-stl-container-use/, a map uses ~5 times more memory than a vector).
I have some ideas on how to implement this myself, but I wonder whether a name/concept for this already exists?
This would be my idea:
template<typename T>
class custom_map { // Totally unsure with naming
// coll_ is sorted.
std::vector<std::pair<uint8_t, T>> coll_;
public:
uint8_t add(T) {
// Find unused id by iterating over coll_,
// If ((prev id + 1) != (next id)), free id found.
// Insert into coll new pair and sort.
}
bool remove(uint8_t id) {
// Remove element in coll with associated id
// Binary search would be faster but only for bigger sizes, so linear search?
}
// Iterator concept begin/end... to iterate over collection.
};
You don't need to hold a separate id in your object. You can use the vector index as the id. That would be much faster.
template<typename T>
class custom_map { // Totally unsure with naming
std::vector<T*> coll_;
public:
uint8_t add(T*obj) {
// Find unused id by iterating over coll_,
for (int i = 0; i < coll_.size(); ++i) {
if (coll_[i] == nullptr) {
coll_[i] = obj;
return i;
}
}
coll_.push_back(obj);
return coll_.size() - 1;
}
bool remove(uint8_t id) {
if (id >= coll_.size() || coll_[id] == nullptr)
return false;
coll_[id] = nullptr;
return true;
}
// Iterator concept begin/end... to iterate over collection.
};
You're possibly overcomplicating the problem; you can simply use the index into the vector as your key:
#include <algorithm>
#include <vector>
#include <cstdint>
#include <functional>
#include <iostream>
template<typename T>
class custom_map {
std::vector<T> coll_;
public:
size_t add(const T& t) {
for (auto it = coll_.begin(); it != coll_.end(); ++it)
{
if (!(*it))
{
*it = t;
return it - coll_.begin();
}
}
coll_.push_back(t);
return coll_.size() - 1;
}
bool remove(size_t id) {
if (id >= coll_.size() || !coll_[id]) {
return false;
}
coll_[id] = {};
// remove empty elements from the end of the list
if (id == coll_.size()-1) {
auto toRemove = std::count_if(coll_.rbegin(), coll_.rend(), [](const T& t) { return !t; });
coll_.resize(coll_.size() - toRemove);
}
return true;
}
};
int main()
{
custom_map<std::function<void()>> map;
auto i = map.add([](){std::cout << "1\n"; });
map.remove(i);
}
This will only work with types that default-initialise to a value convertible to false (like std::function); for other types you can just wrap the type in std::optional (C++17, #include <optional>), for example (the map will actually work with int too, but 0 being treated as the empty value might not be what you want):
int main()
{
custom_map<std::optional<int>> map;
auto i = map.add(10);
map.remove(i);
}

C++ random access iterators for containers with elements loaded on demand

I'm currently working on a small project which requires loading messages from a file. The messages are stored sequentially in the file and the files can become huge, so loading the entire file content into memory is impractical.
Therefore we decided to implement a FileReader class that is capable of moving to specific elements in the file quickly and loading them on request. It is commonly used along the following lines:
SpecificMessage m;
FileReader fr;
fr.open("file.bin");
fr.moveTo(120); // Move to Message #120
fr.read(&m); // Try deserializing as SpecificMessage
The FileReader per se works great. Therefore we thought about adding STL-compliant iterator support as well: a random access iterator that provides read-only references to specific messages, used in the following way:
for (auto iter = fr.begin<SpecificMessage>(); iter != fr.end<SpecificMessage>(); ++iter) {
// ...
}
Remark: the above assumes that the file only contains messages of type SpecificMessage. We've been using boost::iterator_facade to simplify the implementation.
Now my question boils down to: how to implement the iterator correctly? Since FileReader does not actually hold a sequence of messages internally, but loads them on request.
What we've tried so far:
Storing the message as an iterator member
This approach stores the message in the iterator instance, which works great for simple use cases but fails for more complex ones. E.g. std::reverse_iterator has a dereference operation that looks like this
reference operator*() const
{ // return designated value
_RanIt _Tmp = current;
return (*--_Tmp);
}
This breaks our approach as a reference to a message from a temporary iterator is returned.
Making the reference type equal the value type
#DDrmmr in the comments suggested making the reference type equal the value type, so that a copy of the internally stored object is returned. However, I think this is not valid for the reverse iterator which implements the -> operator as
pointer operator->() const {
return (&**this);
}
which dereferences itself, calls operator* (which then returns a copy of a temporary), and finally returns the address of this temporary.
Storing the message externally
Alternatively I thought about storing the message externally:
SpecificMessage m;
auto iter = fr.begin<SpecificMessage>(&m);
// ...
which also seems to be flawed for
auto iter2 = iter + 2
which will have both iter2 and iter point to the same content.
As I hinted in my other answer, you could consider using memory mapped files. In the comment you asked:
As far as memory mapped files is concerned, this seems not what I want to have, as how would you provide an iterator over SpecificMessages for them?
Well, if your SpecificMessage is a POD type, you could just iterate over the raw memory directly. If not, you could have a deserialization helper (as you already have) and use Boost transform_iterator to do the deserialization on demand.
Note that we can make the memory mapped file managed, effectively meaning that you can just use it as a regular heap, and you can store all standard containers. This includes node-based containers (map<>, e.g.), dynamic-size containers (e.g. vector<>) in addition to the fixed-size containers (array<>) - and any combinations of those.
Here's a demo that takes a simple SpecificMessage that contains a string, and (de)serializes it directly into shared memory:
using blob_t = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;
The part that interests you would be the consuming part:
bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;
// for fun, let's reverse the blobs
for (It first(table->rbegin()), last(table->rend()); first < last; first+=13)
std::cout << "blob: '" << first->contents << "'\n";
// any kind of random access is okay, though:
auto random = rand() % table->size();
SpecificMessage msg;
load(table->at(random), msg);
std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
So this prints every 13th message, in reverse order, followed by a random blob.
Full Demo
The sample online uses the lines of the sources as "messages".
Live On Coliru
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <iostream>
#include <boost/iterator/transform_iterator.hpp>
#include <boost/range/iterator_range.hpp>
static char const* DBASE_FNAME = "database.map";
namespace bip = boost::interprocess;
namespace shm {
using segment_manager = bip::managed_mapped_file::segment_manager;
template <typename T> using allocator = boost::container::scoped_allocator_adaptor<bip::allocator<T, segment_manager> >;
template <typename T> using vector = bip::vector<T, allocator<T> >;
}
using blob_t = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;
struct SpecificMessage {
// for demonstration purposes, just a string; could be anything serialized
std::string contents;
// trivial save/load serialization code:
template <typename Blob>
friend bool save(Blob& blob, SpecificMessage const& msg) {
blob.assign(msg.contents.begin(), msg.contents.end());
return true;
}
template <typename Blob>
friend bool load(Blob const& blob, SpecificMessage& msg) {
msg.contents.assign(blob.begin(), blob.end());
return true;
}
};
template <typename Message> struct LazyLoader {
using type = Message;
Message operator()(blob_t const& blob) const {
Message result;
if (!load(blob, result)) throw std::bad_cast(); // TODO custom excepion
return result;
}
};
///////
// for demo, create some database contents
void create_database_file() {
bip::file_mapping::remove(DBASE_FNAME);
bip::managed_mapped_file mmf(bip::open_or_create, DBASE_FNAME, 1ul<<20); // Even sparse file size is limited on Coliru
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
std::ifstream ifs("main.cpp");
std::string line;
while (std::getline(ifs, line)) {
table->emplace_back();
save(table->back(), SpecificMessage { line });
}
std::cout << "Created blob table consisting of " << table->size() << " blobs\n";
}
///////
void display_random_messages() {
bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;
// for fun, let's reverse the blobs
for (It first(table->rbegin()), last(table->rend()); first < last; first+=13)
std::cout << "blob: '" << first->contents << "'\n";
// any kind of random access is okay, though:
auto random = rand() % table->size();
SpecificMessage msg;
load(table->at(random), msg);
std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
}
int main()
{
#ifndef CONSUMER_ONLY
create_database_file();
#endif
srand(time(NULL));
display_random_messages();
}
You are having issues because your iterator does not conform to the forward iterator requirements. Specifically:
*i must be an lvalue reference to value_type or const value_type ([forward.iterators]/1.3)
*i cannot be a reference to an object stored in the iterator itself, due to the requirement that two iterators are equal if and only if they are bound to the same object ([forward.iterators]/6)
Yes, these requirements are a huge pain in the butt, and yes, that means that things like std::vector<bool>::iterator are not random access iterators even though some standard library implementations incorrectly claim that they are.
EDIT: The following suggested solution is horribly broken, in that dereferencing a temporary iterator returns a reference to an object that may not live until the reference is used. For example, after auto& foo = *(i + 1); the object referenced by foo may have been released. The implementation of reverse_iterator referenced in the OP will cause the same problem.
I'd suggest that you split your design into two classes: FileCache that holds the file resources and a cache of loaded messages, and FileCache::iterator that holds a message number and lazily retrieves it from the FileCache when dereferenced. The implementation could be something as simple as storing a container of weak_ptr<Message> in FileCache and a shared_ptr<Message> in the iterator: Simple demo
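For reference, a minimal sketch of what that weak_ptr/shared_ptr split could look like, with the caveat from the EDIT above that a reference obtained from a temporary iterator can still dangle; Message and load_message() are placeholders and the remaining iterator boilerplate is omitted:
#include <cstddef>
#include <memory>
#include <vector>

template <class Message>
class FileCache {
    std::vector<std::weak_ptr<Message>> slots_;             // one slot per message

    std::shared_ptr<Message> load_message(std::size_t n);   // moveTo + read

public:
    explicit FileCache(std::size_t count) : slots_(count) {}

    std::size_t size() const { return slots_.size(); }

    // Returns a shared_ptr kept alive by whoever holds it (e.g. an iterator);
    // the message is reloaded lazily once nobody holds it any more.
    std::shared_ptr<Message> get(std::size_t n)
    {
        std::shared_ptr<Message> m = slots_[n].lock();
        if (!m) {
            m = load_message(n);
            slots_[n] = m;
        }
        return m;
    }

    class iterator {
        FileCache*               cache_;
        std::size_t              index_;
        std::shared_ptr<Message> current_;   // keeps "its" message alive
    public:
        iterator(FileCache* c, std::size_t i)
            : cache_(c), index_(i),
              current_(i < c->size() ? c->get(i) : nullptr) {}
        const Message& operator*() const { return *current_; }
        iterator& operator++()
        {
            ++index_;
            current_ = index_ < cache_->size() ? cache_->get(index_) : nullptr;
            return *this;
        }
        // ... --, +=, ==, etc. as usual for a random access iterator
    };

    iterator begin() { return iterator(this, 0); }
    iterator end()   { return iterator(this, size()); }
};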
I have to admit I may not fully understand the trouble you have with holding the current MESSAGE as a member of Iter. I would associate each iterator with the FileReader it should read from and implement it as a lightweight encapsulation of a read index for FileReader::(read|moveTo). The most important method to override is boost::iterator_facade<...>::advance(...), which modifies the current index and tries to pull a new MESSAGE from the FileReader. If this fails, it flags the iterator as invalid and dereferencing will fail.
template<class MESSAGE,int STEP>
class message_iterator;
template<class MESSAGE>
class FileReader {
public:
typedef message_iterator<MESSAGE, 1> const_iterator;
typedef message_iterator<MESSAGE,-1> const_reverse_iterator;
FileReader();
bool open(const std::string & rName);
bool moveTo(int n);
bool read(MESSAGE &m);
// get the total count of messages in the file
// helps us to find end() and rbegin()
int getMessageCount();
const_iterator begin() {
return const_iterator(this,0);
}
const_iterator end() {
return const_iterator(this,getMessageCount());
}
const_reverse_iterator rbegin() {
return const_reverse_iterator(this,getMessageCount()-1);
}
const_reverse_iterator rend() {
return const_reverse_iterator(this,-1);
}
};
// declaration of message_iterator moving over MESSAGE
// STEP is used to specify STEP size and direction (e.g -1 == reverse)
template<class MESSAGE,int STEP=1>
class message_iterator
: public boost::iterator_facade<
message_iterator<MESSAGE,STEP>
, const MESSAGE
, boost::random_access_traversal_tag
>
{
typedef boost::iterator_facade<
message_iterator<MESSAGE,STEP>
, const MESSAGE
, boost::random_access_traversal_tag
> super;
public:
// constructor associates an iterator with its FileReader and a given position
explicit message_iterator(FileReader<MESSAGE> * p=NULL,int n=0): _filereader(p),_idx(n),_valid(false) {
advance(0);
}
bool equal(const message_iterator & i) const {
return i._filereader == _filereader && i._idx == _idx;
}
void increment() {
advance(+1);
}
void decrement() {
advance(-1);
}
// Override with the central functionality. Move to a given relative
// position and check whether the position can be read. If move/read
// fails we flag the iterator as invalid.
void advance(int n) {
_idx += n*STEP;
if(_filereader!=NULL) {
if( _filereader->moveTo( _idx ) && _filereader->read(_m)) {
_valid = true;
return;
}
}
_valid = false;
}
// Return a ref to the currently cached MESSAGE. Throw
// an exception if positioning at this location in advance(...) failed.
typename super::reference dereference() const {
if(!_valid) {
throw std::runtime_error("access to invalid pos");
}
return _m;
}
private:
FileReader<MESSAGE> * _filereader;
int _idx;
bool _valid;
MESSAGE _m;
};
Boost PropertyMap
You could avoid writing the bulk of the code using Boost PropertyMap:
Live On Coliru
#include <boost/property_map/property_map.hpp>
#include <boost/property_map/function_property_map.hpp>
using namespace boost;
struct SpecificMessage {
// add some data
int index; // just for demo
};
template <typename Message>
struct MyLazyReader {
typedef Message type;
std::string fname;
MyLazyReader(std::string fname) : fname(fname) {}
Message operator()(size_t index) const {
Message m;
// FileReader fr;
// fr.open(fname);
// fr.moveTo(index); // Move to Message
// fr.read(&m); // Try deserializing as SpecificMessage
m.index = index; // just for demo
return m;
}
};
#include <iostream>
int main() {
auto lazy_access = make_function_property_map<size_t>(MyLazyReader<SpecificMessage>("file.bin"));
for (int i=0; i<10; ++i)
std::cout << lazy_access[rand()%256].index << "\n";
}
Sample output is
103
198
105
115
81
255
74
236
41
205
Using Memory Mapped Files
You could store a map of index -> BLOB objects in a shared vector<array<byte, N>>, flat_map<size_t, std::vector<uint8_t> > or similar.
So, now you only have to deserialize from myshared_map[index].data() (begin() and end() in case the BLOB size varies)

c++ container allowing you to sort items by when they were last accessed?

Does such a thing exist? Or could anyone please recommend how I could implement such a container?
Basically I have a std::map which uses a 64-bit integer as its key and a custom datatype as the contained item.
I need to be able to periodically remove items that haven't been accessed in a while, in the most optimal way. Does anyone have any suggestions for this?
Cheers
Use a priority queue that places the least-recently-used (LRU) item at the head of the queue. When an item is accessed, remove it and re-insert it against the current timestamp. When you want to expire items, simply remove them from the head of the queue.
I should point out that you can't use the standard priority_queue, since that doesn't support random removal. You'll have to use the heap functions in conjunction with a vector.
I should also point out that updating an item on access will be expensive (O(N) to find the element to remove).
EDIT: Please disregard this answer. On rethinking, it isn't the best way to do it. (Also, see comments.)
Here is a sketch of how it might be done, using a list to store the most recently accessed items in order. The list is updated in constant time, so there is no significant overhead above the map access (unlike some other answers which require a linear search on each access). I've kept the interface very basic, and haven't tested it very thoroughly.
template <typename KEY, typename VALUE>
class Container
{
public:
void Set(const KEY& key, const VALUE& value)
{
typename Map::iterator it = map.find(key);
if (it == map.end())
{
list.push_front(it);
it = map.insert(std::make_pair(key, std::make_pair(value, list.begin()))).first;
list.front() = it;
}
else
{
it->second.first = value;
Accessed(it);
}
}
const VALUE* Get(const KEY& key)
{
typename Map::iterator it = map.find(key);
if (it == map.end())
return 0;
Accessed(it);
return &it->second.first;
}
void Expire(std::size_t new_size)
{
while (list.size() > new_size)
{
map.erase(list.back());
list.pop_back();
}
}
private:
// Needed to resolve the semicircular dependency on nested iterator types.
struct MapIterator;
typedef std::list<MapIterator> List;
typedef std::map<KEY, std::pair<VALUE, typename List::iterator> > Map;
struct MapIterator : Map::iterator
{
MapIterator(const typename Map::iterator& it) : Map::iterator(it) {}
};
void Accessed(typename Map::iterator it)
{
list.erase(it->second.second);
list.push_front(it);
it->second.second = list.begin();
}
Map map;
List list;
};
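A brief usage sketch of the Container above (assuming the sketch compiles as posted, plus <map> and <list> includes): keep the two most recently accessed entries and expire the rest.
#include <iostream>
#include <string>

int main()
{
    Container<int, std::string> cache;
    cache.Set(1, "one");
    cache.Set(2, "two");
    cache.Set(3, "three");

    cache.Get(1);    // touch key 1, making it the most recently used
    cache.Expire(2); // keep only the 2 most recently used entries

    std::cout << (cache.Get(1) != 0) << '\n';  // 1: kept
    std::cout << (cache.Get(2) != 0) << '\n';  // 0: expired (least recently used)
    std::cout << (cache.Get(3) != 0) << '\n';  // 1: kept
}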
One idea: maintain a std::deque which gets an iterator into your map element pushed to the front whenever accessing the map. You can then easily look at the deque to tell which elements have been used most recently.
Some C++ sketch (devoid of error checking; the point is to demonstrate that the deque is updated when accessing the map, and you can later on trim the map).
class MyMap {
typedef std::map<int64_t, void *> Map;
Map m_map;
std::deque<Map::iterator> m_recentlyUsedItems;
public:
void *getItem( int64_t key ) {
Map::iterator it = m_map.find( key );
if ( it == m_map.end() ) {
return 0;
}
m_recentlyUsedItems.push_front( it );
return it->second;
}
void removeAllButMostRecentlyUsedItems( int n ) {
std::deque<Map::iterator>::iterator it = m_recentlyUsedItems.begin();
std::advance( it, n );
std::deque<Map::iterator>::iterator it2 = it;
for ( ; it2 != m_recentlyUsedItems.end(); ++it2 ) {
m_map.erase( *it2 );
}
m_recentlyUsedItems.erase( it, m_recentlyUsedItems.end() );
}
};
I'm going to propose a singularly different idea.
The problem with "optimal" is that it's difficult to pin down what it means. In particular: do you wish to slow down the retrieve operation in order to get a faster cleanup? Typical cleanups are usually done during "down time" when speed isn't that important; on the other hand you might want snappy retrieves (for access in loops etc.).
So I would propose that, before you try fancy constructs, you simply store the last access time alongside your item in the map. Cleanup then simply consists of checking each item in the map and removing those you no longer want.
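A minimal sketch of that idea follows; Item stands in for the custom datatype from the question, the timestamp is refreshed on every access, and cleanup is an O(N) sweep run during down time.
#include <cstdint>
#include <ctime>
#include <iterator>
#include <map>

struct Item { /* your custom datatype */ };
struct Entry { Item item; std::time_t lastAccess; };

std::map<std::int64_t, Entry> items;

Item* get(std::int64_t key)
{
    auto it = items.find(key);
    if (it == items.end()) return nullptr;
    it->second.lastAccess = std::time(nullptr);   // cheap bookkeeping on access
    return &it->second.item;
}

void cleanup(std::time_t maxAge)
{
    const std::time_t now = std::time(nullptr);
    for (auto it = items.begin(); it != items.end(); )
        it = (now - it->second.lastAccess > maxAge) ? items.erase(it)
                                                    : std::next(it);
}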
If you just need to know which elements were accessed so that you can delete them, then you could probably use a multi-index map and store the last access time as an alternative key.
If you want to use that idea to increase performance, you could implement your own container. The easiest approach would be a data structure known as a self-organising (move-to-front) list. Essentially, every access operation makes the accessed element the new head of the list. That way, elements that are accessed frequently reside close to the beginning, resulting in better search times.
Of course, there are variations. Move-to-front lists aren't very efficient and there are many other similar data structures that are actually better.
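For illustration, here is a minimal sketch of such a move-to-front list built on std::list::splice, which relocates the accessed element to the head in O(1):
#include <list>

template <class T>
class mtf_list {
    std::list<T> items_;
public:
    void push(const T& value) { items_.push_back(value); }

    // Linear search; on a hit, splice the element to the front so that
    // frequently accessed elements cluster near the head.
    T* find(const T& value)
    {
        for (auto it = items_.begin(); it != items_.end(); ++it)
            if (*it == value) {
                items_.splice(items_.begin(), items_, it);  // O(1) relocation
                return &items_.front();
            }
        return nullptr;
    }
};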
I have implemented a similar sounding type I called DynamicCache. Basically it stores the data in a list sorted by the creation date. This could easily be changed to the last accessed date. The purpose of my cache is to cache database items that don't change very often. It caches items for 5 minutes then removes them to be read again when they are next accessed.
The cache lookup uses a map that stores the key and an iterator to the list. The lookup uses the map to find the data in the list, then before the item is returned removes all the old items from the end of the list. If an item is not in the cache a factory is called to provided the data.
This approach must use a list to store the data, as the iterators in the map must always remain valid; if it used a deque, the iterators could be invalidated after an insert. The list uses a struct to store the data, the key, the time it was created (not last accessed), and finally whether the data exists.
struct Record
{
KeyT key;
DataT data;
time_t createTime;
bool exists;
};
If your data is static and you want to preserve the most recently accessed then you could add an access time member to the struct, and move the item to the top of the list each time it is accessed.
Here is my code; it looks a little complicated, but that is mainly caused by the template parameters and a reader-writer lock.
#include "BWThread/BWReadersWriterLock.h"
#include <list>
#include <map>
#include <ctime>
#include <memory>
#include <boost/scoped_ptr.hpp>
/**
* This is a Generic Cache implementation.
*
* To implement a cache using this class create a new class
* derived from CacheFactory and implement the lookup method.
* If the datasource supports updating implement update and
* remove method.
*
* EG
* typedef NameCache DynamicCache<int, BWString>;
* NameCacheFactory : NameCache::Factory
* {
* public:
* virtual bool lookup(int, BWString *);
* };
*
* NameCache cache(new NameCacheFactory, "<cache name>" );
*
* --------------------------------------------------------
* Implementation note:
* This class uses a list as an efficient way to remove stale items from
* the cache. The map stores a key and an iterators to the data in the list.
* The list and the map are updated together.
*/
template <class KeyT, class DataT>
class CacheFactory
{
public:
virtual ~CacheFactory() {}
// Lookup the data for from the data source.
// Return true if the data is found.
virtual bool lookup(const KeyT & key, DataT * data) = 0;
// Update or insert the data in the data source.
// Return true if the data can be updated.
// Returning false means the cache is not updated either.
virtual bool update(const KeyT & key, const DataT & data) { return false; }
// Remove the data in the data source.
// Return true if the data can be deleted weather it exists or not.
// Returning false means the cache is not updated either.
virtual bool remove(const KeyT & key) { return false; }
};
template <class KeyT, class DataT>
class DynamicCache
{
public:
typedef CacheFactory<KeyT, DataT> Factory;
DynamicCache(Factory * f, const std::string & name, time_t t = (5 * 60)) :
factory(f), timeout(t), lastClean(std::time(0)), lock(name + " DynamicCache") {}
/*
* Lookup a key in the cache, the cached version is returned if it is
* present and the value is not old. If the value is old or is not
* present then use the factory to create it and insert the value in the
* cache for future lookups. If the factory cannot create it cache this
* fact too so we will ignore future lookups. Afterwards any entries in
* the cache longer than timeout are removed.
*
* This is the main method and entry point for the cache. All locking is
* performed inside the child methods.
*/
bool lookup(const KeyT & key, DataT * data, time_t now = std::time(0))
{
bool found = false;
FindStatus status = find(key, data, now);
switch(status & EntryStatus) {
case Found:
found = true;
break;
case Create:
found = build(key, data, now);
break;
}
if (status & CleanRequired) {
cleanOldEntries(now);
}
return found;
}
bool update(const KeyT & key, const DataT & data, time_t now = std::time(0))
{
if (factory->update(key, data))
{
Record record;
record.key = key;
record.createTime = now;
record.data = data;
record.exists = true;
BWReadersWriterLock::WriteLockGuard guard(lock, __FILE__, __LINE__);
updateEntry(key, record);
return true;
}
return false;
}
bool remove(const KeyT & key, time_t now = std::time(0))
{
if (factory->remove(key))
{
Record record;
record.key = key;
record.createTime = now;
record.exists = false;
BWReadersWriterLock::WriteLockGuard guard(lock, __FILE__, __LINE__);
updateEntry(key, record);
return true;
}
return false;
}
/**
* Return the size of the cache (only really useful for unit testing).
*/
size_t size() const
{
BWReadersWriterLock::ReadLockGuard guard(lock, __FILE__, __LINE__);
return map.size();
}
Factory * getFactory()
{
return factory.get();
}
private:
// Cache record
struct Record
{
KeyT key;
DataT data;
time_t createTime;
bool exists;
};
// Find and Clean status
// CleanRequired is part of this so that searching the cache and finding
// stale items in the cache can be atomic (use a single read lock).
enum FindStatus {
None,
Found,
Create, //Add
NotExist,
EntryStatus=Found|Create|NotExist,
CleanRequired = 8
};
typedef std::list<Record> List;
typedef typename List::iterator Iterator;
typedef std::map<KeyT, Iterator> Map;
//
// The following methods all use and require explicit locking.
//
FindStatus find(const KeyT & key, DataT * data, time_t now)
{
BWReadersWriterLock::ReadLockGuard guard(lock, __FILE__, __LINE__);
Iterator itr = getEntry(key);
if (isValid(itr) && !isOld(itr, now)) {
if (itr->exists) {
*data = itr->data;
return FindStatus(Found | cleanRequired(now));
}
else {
return FindStatus(NotExist | cleanRequired(now));
}
}
return FindStatus(Create | cleanRequired(now));
}
bool build(const KeyT & key, DataT * data, time_t now)
{
Record record;
record.key = key;
record.createTime = now;
record.exists = factory->lookup(key, &record.data);
BWReadersWriterLock::WriteLockGuard guard(lock, __FILE__, __LINE__);
if (record.exists) {
*data = record.data;
}
updateEntry(key, record);
return record.exists;
}
void cleanOldEntries(time_t now)
{
BWReadersWriterLock::WriteLockGuard guard(lock, __FILE__, __LINE__);
lastClean = now;
time_t old = now - timeout;
typename List::reverse_iterator itr = list.rbegin();
while(!list.empty() && list.back().createTime < old) {
removeEntry(getEntry(list.back().key));
}
}
//
// The following methods don't use locking but require the calling
// method to already have aquired a lock.
//
Iterator getEntry(const KeyT & key)
{
typename Map::const_iterator itr = map.find(key);
if (itr != map.end()) {
return map.find(key)->second;
}
return list.end();
}
bool updateEntry(const KeyT key, const Record & record)
{
Iterator itr = getEntry(key);
if (isValid(itr)) {
removeEntry(itr);
}
insertEntry(record);
return record.exists;
}
bool isValid(Iterator itr) const
{
typename List::const_iterator constItr(itr);
return constItr != list.end();
}
bool isOld(Iterator itr, time_t now) const
{
// isOld or time_t has wrapped
return ((itr->createTime + timeout) < now) || (now < itr->createTime);
}
Iterator insertEntry(const Record & record)
{
list.push_front(record);
Iterator itr = list.begin();
map.insert(typename Map::value_type(record.key, itr));
return itr;
}
void removeEntry(Iterator itr)
{
map.erase(itr->key);
list.erase(itr);
}
FindStatus cleanRequired(time_t now) const
{
return (lastClean + timeout) < now ? CleanRequired : None;
}
List list;
Map map;
time_t timeout;
time_t lastClean;
boost::scoped_ptr<CacheFactory<KeyT, DataT> > factory;
mutable BWReadersWriterLock lock;
};
You can also use linked_hash_map from the MCT library. In fact, its documentation contains a recipe for this use case.