I wonder whether it is possible to have a map that would work like a boost circular buffer. Meaning it would have a limited size, and when it reached that size it would start overwriting the first inserted elements. I also want to be able to search through such a buffer and find or create elements with [name]. Is it possible to create such a thing, and how would I do it?
What you want is an LRU (least recently used) Map, or LRA (least recently added) Map depending on your needs.
Implementations already exist.
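In case you end up rolling your own, here is a minimal LRU sketch (the class name and interface are mine, not a standard or boost component): a std::list keeps the entries in usage order and a std::map indexes them by key.
#include <cstddef>
#include <list>
#include <map>
#include <utility>

template <typename K, typename V>
class LruMap {
public:
    explicit LruMap(std::size_t capacity) : capacity_(capacity) {}  // assumes capacity >= 1

    V& operator[](const K& key) {
        typename Index::iterator it = index_.find(key);
        if (it != index_.end()) {
            // move the accessed entry to the front (most recently used)
            items_.splice(items_.begin(), items_, it->second);
            return items_.front().second;
        }
        // not present: evict the least recently used entry if we are at capacity
        if (items_.size() >= capacity_) {
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.push_front(std::make_pair(key, V()));
        index_[key] = items_.begin();
        return items_.front().second;
    }

private:
    typedef std::list<std::pair<K, V> > Items;
    typedef std::map<K, typename Items::iterator> Index;

    std::size_t capacity_;
    Items items_;   // entries in usage order, most recently used at the front
    Index index_;   // key -> position in items_
};
On access an existing entry is spliced to the front; when a new key arrives at capacity, the back (least recently used) entry is evicted. For least-recently-added behaviour you would simply skip the splice on access.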
Well, I don't think that structure is present out of the box in boost (it may exist elsewhere, though), so you would have to create it. I wouldn't recommend using operator[](), though, at least as it is implemented in std::map, because it can make it difficult to track elements added to the map (for example, calling operator[]() with an absent key inserts an empty value into the map); go instead for more explicit get and put operations for retrieving and adding elements.
As for the easiest implementation, I would use an actual map as the storage and a deque to record the order in which keys were added (not tested):
#include <cstddef>
#include <deque>
#include <map>
#include <utility>

template <typename K, typename V>
struct BoundedSpaceMap
{
    typedef std::map<K, V> map_t;
    typedef std::deque<K> deque_t;
    // ...
    typedef typename map_t::value_type value_type;
    // Reuse map's iterators
    typedef typename map_t::iterator iterator;
    // ...
    iterator begin() { return map_.begin(); }
    iterator end()   { return map_.end(); }
    // put
    void put( K k, V v )
    {
        map_.insert(std::make_pair(k, v));
        deque_.push_back(k);
        _ensure();  // ensure the size bound, removing the oldest element if needed
    }
    // ...
private:
    static const std::size_t LIMIT = 3;  // example bound; make it a constructor or template parameter
    map_t map_;
    deque_t deque_;
    void _ensure()
    {
        if (deque_.size() > LIMIT)
        {
            map_.erase(deque_.front());
            deque_.pop_front();
        }
    }
};
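A quick usage sketch, assuming the example LIMIT of 3 from above and the end() accessor shown next to begin():
#include <iostream>
#include <string>

void example()
{
    BoundedSpaceMap<std::string, int> m;
    m.put("a", 1);
    m.put("b", 2);
    m.put("c", 3);
    m.put("d", 4);  // the bound is exceeded, so "a" (the oldest insertion) is evicted
    for (BoundedSpaceMap<std::string, int>::iterator it = m.begin(); it != m.end(); ++it)
        std::cout << it->first << " = " << it->second << '\n';  // prints b, c and d
}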
Well not really a "circular buffer" since that doesn't make much sense for a map, but we can use a simple array without any additional linked lists or anything.
This is called closed hashing - the wiki article summarizes it quite nicely. Double hashing is the most often used as it avoids clustering (which leads to worse performance), but has its own problems (locality).
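For illustration, the probe sequence in double hashing looks roughly like this (a sketch, not any particular library's implementation):
#include <cstddef>

// i-th slot probed for a key with hash values h1 and h2:
//   slot(i) = (h1 + i * h2) % table_size
// h2 must be non-zero (and ideally co-prime with table_size) so every slot is reachable.
std::size_t probe_slot(std::size_t h1, std::size_t h2, std::size_t i, std::size_t table_size)
{
    return (h1 + i * h2) % table_size;
}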
Edit: Since you want a specific implementation, I don't think boost has one, but this or this were mentioned in another SO post about closed hashing.
Suppose I own a list of edges saved inside a vector like:
typedef struct edge
{
int v;
size_t start;
size_t end;
}e;
typedef vector<list<e>> adj_list;
adj_list tree;
I have to do logic on this tree object, but the logic is too complicated to do in place (I'm constrained to not recurse). I need an extra data structure to handle each node. As a simple example, let's consider incrementing each edge's v value:
list<e> aux;
aux.insert(aux.begin(), tree[0].begin(), tree[0].end());
while (!aux.empty())
{
e& now = aux.front();
aux.pop_front();
now.v++;
aux.insert(aux.begin(), tree[now.v].begin(), tree[now.v].end());
}
The problem in doing this is that the changes made to the now variable do not reflect back into tree. I need a list (it can be any list-like container (vector, linked list, queue, stack) that has an empty() boolean, as in Dijkstra's algorithm) to handle my edge objects in tree. Is there an elegant way to do this? Can I use a list of iterators? I'm specifically asking for an "elegant" approach in hopes that it does not involve pointers.
As discussed in the comments, the solution is to store iterators instead of copies, e.g.:
list<list<e>::iterator> aux;
for (list<e>::iterator it = tree[0].begin(); it != tree[0].end(); ++it)
    aux.push_back(it);
while (!aux.empty())
{
    e& now = *(aux.front());
    aux.pop_front();
    now.v++;
    // collect iterators to the next node's edges and splice them to the front,
    // mirroring the front insertion of the original code
    list<list<e>::iterator> next;
    for (list<e>::iterator it = tree[now.v].begin(); it != tree[now.v].end(); ++it)
        next.push_back(it);
    aux.splice(aux.begin(), next);
}
This works only if you can guarantee that nothing invalidates the stored iterators, which certain operations on tree could do.
As pointed out by n. 'pronouns' m., iterators can be considered as "generalized pointers", so many problems that regular pointers have also apply to iterators.
Another (slightly safer) approach would be to store std::shared_ptrs in the inner lists of tree; then you can simply store another std::shared_ptr to the same object in aux, which makes sure that the object cannot be accidentally deleted while it is still being referenced.
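A sketch of that shared_ptr variant, reusing the edge type and the traversal logic of the question's example (the function name is mine):
#include <cstddef>
#include <list>
#include <memory>
#include <vector>

struct e { int v; std::size_t start; std::size_t end; };

// tree owns the edges through shared_ptrs; aux holds extra shared_ptrs to the same objects
typedef std::vector<std::list<std::shared_ptr<e> > > adj_list;

void increment_all(adj_list& tree)
{
    std::list<std::shared_ptr<e> > aux(tree[0].begin(), tree[0].end());
    while (!aux.empty())
    {
        std::shared_ptr<e> now = aux.front();
        aux.pop_front();
        now->v++;  // modifies the edge object that tree also points to
        aux.insert(aux.begin(), tree[now->v].begin(), tree[now->v].end());
    }
}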
I read through some posts and "wikis" but still cannot decide what approach is suitable for my problem.
I create a class called Sample which contains a certain number of compounds (lets say this is another class Nuclide) at a certain relative quantity (double).
Thus, something like (pseudo):
class Sample {
map<Nuclide, double>;
}
If I had the nuclides Ba-133, Co-60 and Cs-137 in the sample, I would have to use exactly those names in code to access them in the map. However, the only thing I need to do is iterate through the map to perform calculations (which nuclides they are is of no interest), so I will use a for loop. I want to iterate without paying any attention to the key names, so I would need to use an iterator for the map, am I right?
An alternative would be a vector<pair<Nuclide, double> >
class Sample {
vector<pair<Nuclide, double> >;
}
or simply two independent vectors
class Sample {
vector<Nuclide>;
vector<double>;
}
while in the last option the link between a nuclide and its quantity would be "meta-information", given by the position in the respective vector only.
Due to my lack of profound experience, I'd ask kindly for suggestions of what approach to choose. I want to have the iteration through all available compounds to be fast and easy and at the same time keep the logical structure of the corresponding keys and values.
PS.: It's possible that the number of compounds in a sample is very low (1 to 5)!
PPS.: Could the last option be modified by some const statements to prevent changes and thus keep the correct order?
If iteration needs to be fast, you don't want std::map<...>: its iteration is a tree-walk which quickly gets bad. std::map<...> is really only reasonable if you have many mutations to the sequence and you need the sequence ordered by the key. If you have mutations but you don't care about the order std::unordered_map<...> is generally a better alternative. Both kinds of maps assume you are looking things up by key, though. From your description I don't really see that to be the case.
std::vector<...> is fast to iterate. It isn't ideal for look-ups, though. If you keep it ordered you can use std::lower_bound() to do a std::map<...>-like look-up (i.e., the complexity is also O(log n)), but the effort of keeping it sorted may make that option too expensive. However, it is an ideal container for keeping a bunch of objects together which are iterated over.
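For illustration, a sorted std::vector<std::pair<Nuclide, double> > could be searched like this (a sketch assuming the question's Nuclide class provides operator<; the function name is mine, and 0.0 stands in for a "not found" result):
#include <algorithm>
#include <utility>
#include <vector>

struct CompareFirst {
    bool operator()(const std::pair<Nuclide, double>& element, const Nuclide& key) const {
        return element.first < key;
    }
};

double quantity_of(const std::vector<std::pair<Nuclide, double> >& sorted, const Nuclide& key)
{
    std::vector<std::pair<Nuclide, double> >::const_iterator it =
        std::lower_bound(sorted.begin(), sorted.end(), key, CompareFirst());
    // lower_bound finds the first element not less than key; check it actually matches
    return (it != sorted.end() && !(it->first < key)) ? it->second : 0.0;
}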
Whether you want one std::vector<std::pair<...>> or rather two std::vector<...>s depends on how the elements are accessed: if both parts of an element are bound to be accessed together, you want a std::vector<std::pair<...>>, as that keeps the data which is accessed together close in memory. On the other hand, if you normally access only one of the two components, using two separate std::vector<...>s will make the iteration faster, as more elements fit into a cache line, especially if they are reasonably small like doubles.
In any case, I'd recommend not exposing the internal structure to the outside world and rather providing an interface which lets you change the underlying representation later. That is, to achieve maximum flexibility you don't want to bake the representation into all your code. For example, if you use accessor function objects (property maps in terms of the BGL, or projections in terms of Eric Niebler's range proposal) to access the elements based on an iterator, rather than accessing the elements directly, you can change the internal layout without having to touch any of the algorithms (you'll need to recompile the code, though):
// version using std::vector<std::pair<Nuclide, double> >
// - it would just use std::vector<std::pair<Nuclide, double> >::iterator as iterator
auto nuclide_projection = [](Sample::key& key) -> Nuclide& {
    return key.first;
};
auto value_projection = [](Sample::key& key) -> double& {
    return key.second;
};

// version using two std::vectors:
// - it would use an iterator over indices, yielding a std::size_t for *it
struct nuclide_projector {
    std::vector<Nuclide>& nuclides;
    Nuclide& operator()(std::size_t index) const { return nuclides[index]; }
};
struct value_projector {
    std::vector<double>& values;
    double& operator()(std::size_t index) const { return values[index]; }
};
// the projectors must be bound to the sample's actual vectors when constructed,
// e.g. (with hypothetical vector names):
// nuclide_projector nuclide_projection = { sample_nuclides };
// value_projector   value_projection   = { sample_values };
With one of these pairs of projections in place, an algorithm simply running over the elements and printing them could look like this:
template <typename Iterator>
void print(std::ostream& out, Iterator begin, Iterator end) {
for (; begin != end; ++begin) {
out << "nuclide=" << nuclide_projection(*begin) << ' '
<< "value=" << value_projection(*begin) << '\n';
}
}
Both representations are entirely different, but the algorithm accessing them is independent of either. This also makes it easy to try different representations: only the representation and the glue to the algorithms accessing it need to be changed.
Suppose you have a std::vector<std::map<std::string, T> >. You know that all the maps have the same keys. They might have been initialized with
typedef std::map<std::string, int> MapType;
std::vector<MapType> v;
const int n = 1000000;
v.reserve(n);
for (int i=0;i<n;i++)
{
std::map<std::string, int> m;
m["abc"] = rand();
m["efg"] = rand();
m["hij"] = rand();
v.push_back(m);
}
Given a key (e.g. "efg"), I would like to extract all values of the maps for the given key (which definitely exists in every map).
Is it possible to speed up the following code?
std::vector<int> efgValues;
efgValues.reserve(v.size());
BOOST_FOREACH(MapType const& m, v)
{
efgValues.push_back(m.find("efg")->second);
}
Note that the values are not necessarily int. As profiling confirms that most time is spent in the find function, I was thinking about whether there is a (GCC and MSVC compliant C++03) way to avoid locating the element in the map based on the key for every single map again, because the structure of all the maps is equal.
If no, would it be possible with boost::unordered_map (which is 15% slower on my machine with the code above)? Would it be possible to cache the hash value of the string?
P.S.: I know that having a std::map<std::string, std::vector<T> > would solve my problem. However, I cannot change the data structure (which is actually more complex than what I showed here).
You can cache and play back the sequence of comparison results using a stateful comparator. But that's just nasty; the solution is to adjust the data structure. There's no "cannot": adding a stateful comparator is already changing the data structure, and that requirement rules out almost anything.
Another possibility is to create a linked list across the objects of type T so you can get from each map to the next without another lookup. If you might be starting at any of the maps (please, just refactor the structure) then a circular or doubly-linked list will do the trick.
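A sketch of that idea, keeping the std::vector<std::map<...> > layout but changing the mapped type so each value links to the corresponding value in the next map (all names here are illustrative, and this is exactly the kind of structural change the question tries to avoid):
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Linked {
    int value;
    Linked* next;   // the value stored under the same key in the next map (NULL in the last map)
};

typedef std::map<std::string, Linked> MapType;

// Build the chain once, after the vector is fully populated: growing the vector
// afterwards would copy the maps and leave these pointers dangling.
void link_key(std::vector<MapType>& v, const std::string& key)
{
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        v[i][key].next = &v[i + 1][key];
    if (!v.empty())
        v.back()[key].next = NULL;
}

// One map lookup in total, then pointer chasing instead of one find() per map.
void collect(std::vector<MapType>& v, const std::string& key, std::vector<int>& out)
{
    if (v.empty()) return;
    for (Linked* p = &v.front()[key]; p != NULL; p = p->next)
        out.push_back(p->value);
}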
"As profiling confirms that most time is spent in the find function"
Keeping the tree data structures and optimizing the comparison can only speed up the comparison. Unless the time is spent in operator<(std::string const&, std::string const&), you need to change the way the data is linked together.
The underlying data structure I am using is:
struct Cell { char c; Cell* next; };
map<int, Cell>
In effect the data structure maps an int to a linked list. The map (in this case implemented as a hashmap) ensures that finding a value in the list runs in constant time. The linked list ensures that insertion and deletion also run in constant time. At each processing iteration I am doing something like:
Cell *cellPointer1 = new Cell;
//Process cells, build linked list
Once the list is built I put the Cell elements in the map. The structure was working just fine, and after my program finishes I deallocate the memory for each Cell in the list:
delete cellPointer1;
But at the end of my program I have a memory leak!!
To test for memory leaks I use:
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>
_CrtDumpMemoryLeaks();
I'm thinking that somewhere along the way the fact that I am putting the Cells in the map does not allow me to deallocate the memory correctly. Does anyone have any ideas on how to solve this problem?
We'll need to see your code for insertion and deletion to be sure about it.
What I'd see as a memleak-free insert / remove code would be:
( NOTE: I'm assuming you don't store the Cells that you allocate in the map )
//
// insert
//
std::map<int, Cell> _map;

Cell a;          // no new here!
a.next = NULL;   // make sure the list is terminated even if the loop never runs
Cell *iter = &a;
while( condition )
{
    Cell *b = new Cell();
    iter->next = b;
    iter = b;
}
_map[id] = a;    // will 'copy' a into the container slot of the map

//
// cleanup:
//
std::map<int, Cell>::iterator i = _map.begin();
while( i != _map.end() )
{
    Cell &a = i->second;
    Cell *iter = a.next;   // list of cells associated to 'a'
    while( iter != NULL )
    {
        Cell *to_delete = iter;
        iter = iter->next;
        delete to_delete;
    }
    _map.erase(i++);       // erase invalidates i, so advance it first; no need to 'delete'
}
Edit: there was a comment indicating that I might not have understood the problem completely. If you insert ALL the cells you allocate in the map, then the faulty thing is that your map contains Cell, not Cell*.
If you define your map as std::map<int, Cell *>, your problem would be solved under 2 conditions:
you insert all the Cells that you allocate in the map
the integer (the key) associated to each cell is unique (important!!)
Now the deletion is simply a matter of:
std::map<int, Cell*>::iterator i = _map.begin();
while( i != _map.end() )
{
Cell *c = i->second;
if ( c != NULL ) delete c;
}
_map.clear();
I've built almost the exact same hybrid data structure you are after (list/map with the same algorithmic complexity if I were to use unordered_map instead) and have been using it from time to time for almost a decade though it's a kind of bulky structure (something I'd use with convenience in mind more than efficiency).
It's worth noting that this is quite different from just using std::unordered_map directly. For a start, it preserves the original order in which one inserts elements. Insertion, removal, and searches are guaranteed to happen in logarithmic time (or constant time depending on whether key searching is involved and whether you use a hash table or BST), iterators do not get invalidated on insertion/removal (the main requirement I needed which made me favor std::map over std::unordered_map), etc.
The way I did it was like this:
// I use this as the iterator for my container with
// the list being the main 'focal point' while I
// treat the map as a secondary structure to accelerate
// key searches.
typedef typename std::list<Value>::iterator iterator;
// Values are stored in the list.
std::list<Value> data;
// Keys and iterators into the list are stored in a map.
std::map<Key, iterator> accelerator;
If you do it like this, it becomes quite easy. push_back is a matter of pushing back to the list and adding the last list iterator to the map. Removal by iterator is a matter of removing the key that iterator refers to from the map before removing the element from the list. Finding a key is a matter of searching the map and returning the associated value, which happens to be the list iterator. Removal by key is just finding the key and then doing iterator removal, and so on, roughly as sketched below.
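A rough sketch of those operations, reusing the data and accelerator members shown above (the class name and exact method set are my own, not from a library):
#include <list>
#include <map>

template <typename Key, typename Value>
class OrderedMap {
public:
    typedef typename std::list<Value>::iterator iterator;

    iterator push_back(const Key& key, const Value& value) {
        data.push_back(value);
        iterator it = --data.end();
        accelerator[key] = it;          // remember where this key's value lives in the list
        return it;
    }
    iterator find(const Key& key) {
        typename std::map<Key, iterator>::iterator m = accelerator.find(key);
        return m == accelerator.end() ? data.end() : m->second;
    }
    void erase(const Key& key) {
        typename std::map<Key, iterator>::iterator m = accelerator.find(key);
        if (m != accelerator.end()) {
            data.erase(m->second);      // list erase only invalidates that one iterator
            accelerator.erase(m);
        }
    }
    iterator begin() { return data.begin(); }
    iterator end()   { return data.end(); }

private:
    // Values are stored in the list, in insertion order.
    std::list<Value> data;
    // Keys and iterators into the list are stored in a map.
    std::map<Key, iterator> accelerator;
};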
If you want to improve all methods to constant time, then you can use std::unordered_map instead of std::map as I did here (though that comes with some caveats).
Taking an approach like this should simplify things considerably over an intrusive list-based solution where you're manually having to free memory.
Is there a reason why you are not using built-in containers like, say, STL?
Anyhow, you don't show the code where the allocation takes place, nor the map definition (is this coming from a library?).
Are you sure you deallocate all of the previously allocated Cells, starting from the last one and going backwards up to the first?
You could do this using the STL (remove next from Cell):
std::unordered_map<int,std::list<Cell>>
Or if cell only contains a char
std::unordered_map<int,std::string>
If your compiler doesn't support std::unordered_map then try boost::unordered_map.
If you really want to use intrusive data structures, have a look at Boost Intrusive.
As others have pointed out, it may be hard to see what you're doing wrong without seeing your code.
Someone should mention, however, that you're not helping yourself by overlaying two container types here.
If you're using a hash_map, you already have constant insertion and deletion time; see the related "Hash : How does it work internally?" post. The only exception to the O(1) lookup time is if your implementation decides to resize the container, in which case you have added overhead regardless of your linked list addition. Having two addressing schemes is only going to make things slower (not to mention buggier).
Sorry if this doesn't point you to the memory leak, but I'm sure a lot of memory leaks / bugs come from not using stl / boost containers to their full potential. Look into that first.
You need to be very careful with what you are doing, because values in a C++ map need to be copyable, and with your structure holding raw pointers you must handle the copy semantics properly.
You would be far better off using std::list where you won't need to worry about your copy semantics.
If you can't change that then at least std::map<int, Cell*> will be a bit more manageable, although you would have to manage the pointers in your map because std::map will not manage them for you.
You could of course use std::map<int, shared_ptr<Cell> >, probably easiest for you for now.
If you also use shared_ptr within your Cell object itself, you will need to beware of circular references, and since Cell will know it is being managed by shared_ptr, you could derive it from enable_shared_from_this.
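For instance, a minimal sketch with std::shared_ptr (use boost::shared_ptr in C++03), where clearing the map releases the whole chain as long as there are no cycles:
#include <map>
#include <memory>

struct Cell {
    char c;
    std::shared_ptr<Cell> next;   // the chain owns its tail; beware of cycles
};

int main()
{
    std::map<int, std::shared_ptr<Cell> > cells;

    std::shared_ptr<Cell> head(new Cell());
    head->c = 'a';
    head->next.reset(new Cell());
    head->next->c = 'b';

    cells[0] = head;

    cells.clear();   // no manual delete: every Cell is released here
    return 0;
}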
My final point will be that list is very rarely the correct collection type to use. It is the correct one to use sometimes, especially when you have an LRU cache situation and you want to move accessed elements to the end of the list fast. However that is the minority case and it probably doesn't apply here. Think of an alternative collection you really want. map< int, set<char> > perhaps? or map< int, vector< char > > ?
Your list has a lot of overhead just to store a few chars.
can anyone recommend a nice and tidy way to achieve this:
float CalculateGoodness(const Thing& thing);
void SortThings(std::vector<Thing>& things)
{
// sort 'things' on value returned from CalculateGoodness, without calling CalculateGoodness more than 'things.size()' times
}
Clearly I could use std::sort with a comparison function that calls CalculateGoodness, but then that will get called several times per Thing as it is compared to other elements, which is no good if CalculateGoodness is expensive. I could create another std::vector just to store the ratings and std::sort that, and rearrange things in the same way, but I can't see a tidy way of doing that. Any ideas?
Edit: Apologies, I should have said without modifying Thing, else it's a fairly easy problem to solve :)
I can think of a simple transformation (well two) to get what you want. You could use std::transform with suitable predicates.
std::vector<Thing> to std::vector< std::pair<Result,Thing> >
sort the second vector (works because a pair is sorted by its first member)
reverse transformation
Tadaam :)
EDIT: Minimizing the number of copies
std::vector<Thing> to std::vector< std::pair<Result,Thing*> >
sort the second vector
transform back into a secondary vector (local)
swap the original and local vectors
This way you would only copy each Thing once. Notably, remember that sort itself performs copies, so the pointer indirection could be worth it.
And because I am feeling generous:
typedef std::pair<float, Thing*> cached_type;
typedef std::vector<cached_type> cached_vector;
struct Compute: std::unary_function< Thing, cached_type >
{
cached_type operator()(Thing& t) const
{
return cached_type(CalculateGoodness(t), &t);
}
};
struct Back: std::unary_function< cached_type, Thing >
{
Thing operator()(cached_type t) const { return *t.second; }
};
void SortThings(std::vector<Thing>& things)
{
// Reserve to only allocate once
cached_vector cache; cache.reserve(things.size());
// Compute Goodness once and for all
std::transform(things.begin(), things.end(),
std::back_inserter(cache), Compute());
// Sort
std::sort(cache.begin(), cache.end());
// We have references inside `things` so we can't modify it
// while dereferencing...
std::vector<Thing> local; local.reserve(things.size());
// Back transformation
std::transform(cache.begin(), cache.end(),
std::back_inserter(local), Back());
// Put result in `things`
swap(things, local);
}
Provided with the usual caveat emptor: off the top of my head, may kill kittens...
You can call CalculateGoodness once for each element before sorting and have it update an internal member variable; then you can sort based on that member variable.
Another possibility if you can't modify your type, is storing some kind of std::map for your objects and their previously calculated values. Your sort function would use that map which acts as a cache.
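A sketch of that cache approach, assuming the Thing and CalculateGoodness from the question. Pointers into things are sorted (the pointed-to elements never move until the end), so the cache, keyed by address, stays valid throughout the sort:
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

struct CachedGoodnessLess {
    std::map<const Thing*, float>* cache;

    float goodness(const Thing* t) const {
        std::map<const Thing*, float>::iterator it = cache->find(t);
        if (it == cache->end())
            it = cache->insert(std::make_pair(t, CalculateGoodness(*t))).first;  // computed once per Thing
        return it->second;
    }
    bool operator()(const Thing* a, const Thing* b) const {
        return goodness(a) < goodness(b);
    }
};

void SortThings(std::vector<Thing>& things)
{
    std::map<const Thing*, float> cache;

    std::vector<const Thing*> order;
    order.reserve(things.size());
    for (std::size_t i = 0; i < things.size(); ++i)
        order.push_back(&things[i]);

    CachedGoodnessLess less = { &cache };
    std::sort(order.begin(), order.end(), less);   // CalculateGoodness runs at most once per Thing

    // rebuild the vector in sorted order
    std::vector<Thing> sorted;
    sorted.reserve(things.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        sorted.push_back(*order[i]);
    things.swap(sorted);
}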
I've upvoted Brian's answer because it clearly best answers what you're looking for. But another solution you should consider is just write it the easy way. Processors are getting more powerful every day. Make it correct and move on. You can profile it later to see if CalculateGoodness really is the bottleneck.
I'd create pairs of ratings and things, calling CalculateGoodness once per thing, and sort that on the rating. If applicable you could also move this into a map from rating to thing.
The other option would be to cache CalculateGoodness in the Thing itself, either as a simple field or by making CalculateGoodness a method of Thing (making sure the cache is mutable so const Things still work), roughly as sketched below.
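A sketch of that caching field (note it requires modifying Thing, which the question's edit rules out, but it is the approach this answer describes; the member and function names are mine):
#include <algorithm>
#include <cstddef>
#include <vector>

class Thing;
float CalculateGoodness(const Thing& thing);   // declared as in the question

class Thing {
public:
    Thing() : goodness_(0.0f), goodness_cached_(false) {}

    float Goodness() const {
        if (!goodness_cached_) {
            goodness_ = CalculateGoodness(*this);  // the expensive call happens at most once
            goodness_cached_ = true;
        }
        return goodness_;
    }

private:
    mutable float goodness_;        // mutable so caching works on const Things
    mutable bool goodness_cached_;
};

bool CompareByGoodness(const Thing& a, const Thing& b)
{
    return a.Goodness() < b.Goodness();
}

void SortThings(std::vector<Thing>& things)
{
    // warm the cache so the expensive call happens exactly once per element
    for (std::size_t i = 0; i < things.size(); ++i)
        things[i].Goodness();
    std::sort(things.begin(), things.end(), CompareByGoodness);
}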
Perhaps a tidy way of doing the separate-vector approach is to actually create a vector< pair<float, Thing*> >, where the second element points to the Thing object with the corresponding float value. If you sort this vector by the float values, you can iterate over it and read the Thing objects in the correct order, possibly placing them into another vector or list so they end up stored in order.