C++ LRU cache - need suggestions on how to improve speed - c++

The task is to implement an O(1) Least Recently Used Cache
Here is the question on leetcode
https://leetcode.com/problems/lru-cache/
Here is my solution, while it is O(1) it is not the fastest implementationcould you give some feedback and maybe ideas on how can I optimize this ? Thank you !
#include<unordered_map>
#include<list>
class LRUCache {
// umap<key,<value,listiterator>>
// store the key,value, position in list(iterator) where push_back occurred
private:
unordered_map<int,pair<int,list<int>::iterator>> umap;
list<int> klist;
int cap = -1;
public:
LRUCache(int capacity):cap(capacity){
}
int get(int key) {
// if the key exists in the unordered map
if(umap.count(key)){
// remove it from the old position
klist.erase(umap[key].second);
klist.push_back(key);
list<int>::iterator key_loc = klist.end();
umap[key].second = --key_loc;
return umap[key].first;
}
return -1;
}
void put(int key, int value) {
// if key already exists delete it from the the umap and klist
if(umap.count(key)){
klist.erase(umap[key].second);
umap.erase(key);
}
// if the unordered map is at max capacity
if(umap.size() == cap){
umap.erase(klist.front());
klist.pop_front();
}
// finally update klist and umap
klist.push_back(key);
list<int>::iterator key_loc = klist.end();
umap[key].first = value;
umap[key].second = --key_loc;
return;
}
};
/**
* Your LRUCache object will be instantiated and called as such:
* LRUCache* obj = new LRUCache(capacity);
* int param_1 = obj->get(key);
* obj->put(key,value);
*/

Here's some optimizations that might help:
Take this segment of code from the get function:
if(umap.count(key)){
// remove it from the old position
klist.erase(umap[key].second);
The above will lookup key in the map twice. Once for the count method to see if it exists. Another to invoke the [] operator to fetch its value. Save a few cycles by doing this:
auto itor = umap.find(key);
if (itor != umap.end()) {
// remove it from the old position
klist.erase(itor->second);
In the put function, you do this:
if(umap.count(key)){
klist.erase(umap[key].second);
umap.erase(key);
}
Same thing as get, you can avoid the redundant search through umap. Additionally, there's no reason to invoke umap.erase only to add that same key back into the map a few lines later.
Further, this is also inefficient
umap[key].first = value;
umap[key].second = --key_loc;
Similar to above, redundantly looking up key twice in the map. In the first assignment statement, the key is not in the map, so it default constructs a new value pair thing. The second assignment is doing another lookup in the map.
Let's restructure your put function as follows:
void put(int key, int value) {
auto itor = umap.find(key);
bool reinsert = (itor != umap.end());
// if key already exists delete it from the klist only
if (reinsert) {
klist.erase(umap[key].second);
}
else {
// if the unordered map is at max capacity
if (umap.size() == cap) {
umap.erase(klist.front());
klist.pop_front();
}
}
// finally update klist and umap
klist.push_back(key);
list<int>::iterator key_loc = klist.end();
auto endOfList = --key_loc;
if (reinsert) {
itor->second.first = value;
itor->second.second = endOfList;
}
else {
const pair<int, list<int>::iterator> itempair = { value, endOfList };
umap.emplace(key, itempair);
}
}
That's as far as you can probably go by using std::list. The downside of the list type is that there's no way to move an existing node from the middle to the front (or back) without first removing it and then adding it back. That's a couple of unneeded memory allocations to update the list. Possible alternative is that you just use your own double-linked list type and manually fixup the prev/next pointer yourself.

Here is my solution, while it is O(1) it is not the fastest implementation
could you give some feedback and maybe ideas on how can I optimize this ? Thank you !
Gonna take on selbie's point here:
Every instance of if(umap.count(key)) will search for the key and using umap[key] is the equivalent for the search. You can avoid the double search by assigning an iterator which points to the key by a single std::unordered_map::find() operation.
selbie already gave the code for int get()'s search, here's the one for void put()'s one:
auto it = umap.find(key);
if (it != umap.end())
{
klist.erase(it ->second);
umap.erase(key);
}
Sidecase:
Not applicable for your code as of now due to lack of input and output work, but in case you use std::cin and std::cout, you can disable the synchronization between C and C++ streams, and untie cin from cout as an optimization: (they are tied together by default)
// If your using cin/cout or I/O
ios::sync_with_stdio(false);
cin.tie(nullptr);
cout.tie(nullptr);

Related

How do I implement linear probing in C++?

I'm new to Hash Maps and I have an assignment due tomorrow. I implemented everything and it all worked out fine, except for when I get a collision. I cant quite understand the idea of linear probing, I did try to implement it based on what I understood, but the program stopped working for table size < 157, for some reason.
void hashEntry(string key, string value, entry HashTable[], int p)
{
key_de = key;
val_en = value;
for (int i = 0; i < sizeof(HashTable); i++)
{
HashTable[Hash(key, p) + i].key_de = value;
}
}
I thought that by adding a number each time to the hash function, 2 buckets would never get the same Hash index. But that didn't work.
A hash table with linear probing requires you
Initiate a linear search starting at the hashed-to location for an empty slot in which to store your key+value.
If the slot encountered is empty, store your key+value; you're done.
Otherwise, if they keys match, replace the value; you're done.
Otherwise, move to the next slot, hunting for any empty or key-matching slot, at which point (2) or (3) transpires.
To prevent overrun, the loop doing all of this wraps modulo the table size.
If you run all the way back to the original hashed-to location and still have no empty slot or matching-key overwrite, your table is completely populated (100% load) and you cannot insert more key+value pairs.
That's it. In practice it looks something like this:
bool hashEntry(string key, string value, entry HashTable[], int p)
{
bool inserted = false;
int hval = Hash(key, p);
for (int i = 0; !inserted && i < p; i++)
{
if (HashTable[(hval + i) % p].key_de.empty())
{
HashTable[(hval + i) % p].key_de = key;
}
if (HashTable[(hval + i) % p].key_de == key)
{
HashTable[(hval + i) % p].val_en = value;
inserted = true;
}
}
return inserted;
}
Note that expanding the table in a linear-probing hash algorithm is tedious. I suspect that will be forthcoming in your studies. Eventually you need to track how many slots are taken so when the table exceeds a specified load factor (say, 80%), you expand the table, rehashing all entries on the new p size, which will change where they all end up residing.
Anyway, hope it makes sense.

How do I properly pass iterator by reference?

I have a game where I check collision between bullets and enemies which I store as 2 vector containers. People say if you're gonna erase an element in the for loop you better use iterators and so I did. But I have a problem now with passing the iterator to a function. The thing is I don't necessarily need to erase the element so it has to be a bit more complex.
This is the way I check collision. "CircularCollision" works fine, no mistakes there.
void ResolveColision(Weapon &weap, Map &map)
{
std::vector<Bullet> bullets = weap.GetBullets();
if (!bullets.empty())
{
for (std::vector<Bullet>::iterator i = bullets.begin(); i != bullets.end(); ++i)
{
std::vector<Enemy> enemies = map.GetEnemies();
if (!enemies.empty())
{
for (std::vector<Enemy>::iterator j = enemies.begin(); j != enemies.end(); ++j)
{
if (CircularCollision((*i), (*j)))
{
weap.DeleteByIndex(i);
map.TakeDamageByIndex(j, weap.GetDamage());
std::cout << "HIT!\n";
}
}
}
}
}
}
Here's the method which is supposed to decrease the health of an enemy:
void Map::TakeDamageByIndex(std::vector<Enemy>::iterator &itr, int damage)
{
(*itr).SetHealth((*itr).GetHealth() - damage);
}
Here's the method which deletes the bullet:
void Weapon::DeleteByIndex(std::vector<Bullet>::iterator &itr)
{
destroySprite((*itr).GetSprite());
bullets.erase(itr);
}
I'm sure it looks horrible and it shouldn't work but I have no idea how to do it properly. Please help!
Also, both methods work properly when the for loops operate with indexes (e.g. bullets[i]), in that case the problem is with "Vector subscript out of range" error.
In DeleteByIndex(), change this:
bullets.erase(itr);
To this:
itr = bullets.erase(itr);
std::vector::erase() returns an iterator to the next remaining element after the element that was erased. That next element is where your outer loop needs to continue from on its next iteration.
As such, you need to change your outer loop from a for to a while instead, or else you will skip elements (in fact, your original code suffers from that problem when you were still using indexes):
void ResolveColision(Weapon &weap, Map &map)
{
std::vector<Bullet> bullets = weap.GetBullets();
std::vector<Bullet>::iterator bullerItr = bullets.begin();
while (bullerItr != bullets.end())
{
std::vector<Enemy> enemies = map.GetEnemies();
bool wasAnyHit = false;
for (std::vector<Enemy>::iterator enemyItr = enemies.begin(); enemyItr != enemies.end(); ++enemyItr)
{
if (CircularCollision(*bulletItr, *enemyItr))
{
wasAnyHit = true;
weap.DeleteByIndex(bulletItr);
map.TakeDamageByIndex(enemyItr, weap.GetDamage());
std::cout << "HIT!\n";
break;
}
}
if (!wasAnyHit)
++bulletItr;
}
}
That being said, I would suggest replacing the inner loop with std::find_if() instead. And renaming DeleteByIndex() and TakeDamageByIndex() since they don't take an index anymore. In fact, I would not pass an iterator to TakeDamage...() at all, pass the actual Enemy object instead. Or better, move TakeDamage() into Enemy itself.
Try something more like this:
void ResolveColision(Weapon &weap, Map &map)
{
auto bullets = weap.GetBullets();
auto bulletItr = bullets.begin();
while (bulletItr != bullets.end())
{
auto enemies = map.GetEnemies();
auto &bullet = *bulletItr;
auto enemyHit = std::find_if(enemies.begin(), enemies.end(),
[&](Enemy &enemy){ return CircularCollision(bullet, enemy); }
);
if (enemyHit != enemies.end())
{
weap.DeleteBulletByIterator(bulletItr);
enemyHit->TakeDamage(weap.GetDamage());
std::cout << "HIT!\n";
}
else
++bulletItr;
}
}
void Enemy::TakeDamage(int damage)
{
SetHealth(GetHealth() - damage);
}
void Weapon::DeleteBulletByIterator(std::vector<Bullet>::iterator &itr)
{
destroySprite(itr->GetSprite());
itr = bullets.erase(itr);
}
A few other comments in addition to Remy Lebeau’s answer.
It’s as efficient to pass a STL iterator by value as by reference, so the only reason you would need to pass one by reference is: when you intend to change the index and you want that change to be visible in the caller’s scope. (For example, a UTF-8 parser needs to consume anywhere from one to four bytes.) Since this code doesn’t need to do that, you’re better off just passing the iterator by value.
In general, if you aren’t modifying the variable you pass by reference, you should pass by const reference instead. In the case of Enemy::TakeDamage(), the only thing you do with the iterator is dereference it, so you might as well just pass in an Enemy& and call it with *i as the parameter.
The algorithm is not very efficient: if you delete a lot of items near the start of the list, you would need to move all remaining items of the array multiple times. This runs in O(N²) time. A std::list, although it has a high overhead compared to std::vector, can delete elements in constant time, and might be more efficient if you have a lot of insertions and deletions that are not at the end. You might also consider moving only the objects that survive to a new list and then destroying the old one. At least this way, you only need to copy once, and your pass runs in O(N) time.
If your containers store smart pointers to the objects, you only have to move the pointers to a new location, not the entire object. This will not make up for the overhead of lots of heap allocations if your objects are small, but could save you a lot of bandwidth if they are large. The objects will still be automatically deleted when the last reference to them is cleared.
You could do something like this:
void delByIndex(vector<int>::iterator &i, vector<int>& a)
{
a.erase(i);
}
int main()
{
vector<int> a {1,5,6,2,7,8,3};
vector<int> b {1,2,3,1};
for(auto i=a.begin();i!=a.end();)
{
bool flag = false;
for(auto j=b.begin();j!=b.end();j++)
{
if(*i==*j)
{
flag = true;
}
}
if(flag)
{
delByIndex(i, a);
}
else
i++;
}
for(auto i:a)
cout << i << " ";
return 0;
}
Be careful when using erase as it will change the size of the vector and also invalidates the vector iterator.

A* and N-Puzzle optimization

I am writing a solver for the N-Puzzle (see http://en.wikipedia.org/wiki/Fifteen_puzzle)
Right now I am using a unordered_map to store hash values of the puzzle board,
and manhattan distance as the heuristic for the algorithm, which is a plain DFS.
so I have
auto pred = [](Node * lhs, Node * rhs){ return lhs->manhattanCost_ < rhs->manhattanCost_; };
std::multiset<Node *, decltype(pred)> frontier(pred);
std::vector<Node *> explored; // holds nodes we have already explored
std::tr1::unordered_set<unsigned> frontierHashTable;
std::tr1::unordered_set<unsigned> exploredHashTable;
This works great for n = 2 and 3.
However, its really hit and miss for n=4 and above. (stl unable to allocate memory for a new node)
I also suspect that I am getting hash collisions in the unordered_set
unsigned makeHash(const Node & pNode)
{
unsigned int b = 378551;
unsigned int a = 63689;
unsigned int hash = 0;
for(std::size_t i = 0; i < pNode.data_.size(); i++)
{
hash = hash * a + pNode.data_[i];
a = a * b;
}
return hash;
}
16! = 2 × 10^13 (possible arrangements)
2^32 = 4 x 10^9 (possible hash values in a 32 bit hash)
My question is how can I optimize my code to solve for n=4 and n=5?
I know from here
http://kociemba.org/fifteen/fifteensolver.html
http://www.ic-net.or.jp/home/takaken/e/15pz/index.html
that n=4 is possible in less than a second on average.
edit:
The algorithm itself is here:
bool NPuzzle::aStarSearch()
{
auto pred = [](Node * lhs, Node * rhs){ return lhs->manhattanCost_ < rhs->manhattanCost_; };
std::multiset<Node *, decltype(pred)> frontier(pred);
std::vector<Node *> explored; // holds nodes we have already explored
std::tr1::unordered_set<unsigned> frontierHashTable;
std::tr1::unordered_set<unsigned> exploredHashTable;
// if we are in the solved position in the first place, return true
if(initial_ == target_)
{
current_ = initial_;
return true;
}
frontier.insert(new Node(initial_)); // we are going to delete everything from the frontier later..
for(;;)
{
if(frontier.empty())
{
std::cout << "depth first search " << "cant solve!" << std::endl;
return false;
}
// remove a node from the frontier, and place it into the explored set
Node * pLeaf = *frontier.begin();
frontier.erase(frontier.begin());
explored.push_back(pLeaf);
// do the same for the hash table
unsigned hashValue = makeHash(*pLeaf);
frontierHashTable.erase(hashValue);
exploredHashTable.insert(hashValue);
std::vector<Node *> children = pLeaf->genChildren();
for( auto it = children.begin(); it != children.end(); ++it)
{
unsigned childHash = makeHash(**it);
if(inFrontierOrExplored(frontierHashTable, exploredHashTable, childHash))
{
delete *it;
}
else
{
if(**it == target_)
{
explored.push_back(*it);
current_ = **it;
// delete everything else in children
for( auto it2 = ++it; it2 != children.end(); ++it2)
delete * it2;
// delete everything in the frontier
for( auto it = frontier.begin(); it != frontier.end(); ++it)
delete *it;
// delete everything in explored
explored_.swap(explored);
for( auto it = explored.begin(); it != explored.end(); ++it)
delete *it;
return true;
}
else
{
frontier.insert(*it);
frontierHashTable.insert(childHash);
}
}
}
}
}
Since this is homework I will suggest some strategies you might try.
First, try using valgrind or a similar tool to check for memory leaks. You may have some memory leaks if you don't delete everything you new.
Second, calculate a bound on the number of nodes that should be explored. Keep track of the number of nodes you do explore. If you pass the bound, you might not be detecting cycles properly.
Third, try the algorithm with depth first search instead of A*. Its memory requirements should be linear in the depth of the tree and it should just be a matter of changing the sort ordering (pred). If DFS works, your A* search may be exploring too many nodes or your memory structures might be too inefficient. If DFS doesn't work, again it might be a problem with cycles.
Fourth, try more compact memory structures. For example, std::multiset does what you want but std::priority_queue with a std::deque may take up less memory. There are other changes you could try and see if they improve things.
First i would recommend cantor expansion, which you can use as the hashing method. It's 1-to-1, i.e. the 16! possible arrangements would be hashed into 0 ~ 16! - 1.
And then i would implement map by my self, as you may know, std is not efficient enough for computation. map is actually a Binary Search Tree, i would recommend Size Balanced Tree, or you can use AVL tree.
And just for record, directly use bool hash[] & big prime may also receive good result.
Then the most important thing - the A* function, like what's in the first of your link, you may try variety of A* function and find the best one.
You are only using the heuristic function to order the multiset. You should use the min(g(n) + f(n)) i.e. the min(path length + heuristic) to order your frontier.
Here the problem is, you are picking the one with the least heuristic, which may not be the correct "next child" to pick.
I believe this is what is causing your calculation to explode.

How do I make make my hash table with linear probing more efficient?

I'm trying to implement an efficient hash table where collisions are solved using linear probing with step. This function has to be as efficient as possible. No needless = or == operations. My code is working, but not efficient. This efficiency is evaluated by an internal company system. It needs to be better.
There are two classes representing a key/value pair: CKey and CValue. These classes each have a standard constructor, copy constructor, and overridden operators = and ==. Both of them contain a getValue() method returning value of internal private variable. There is also the method getHashLPS() inside CKey, which return hashed position in hash table.
int getHashLPS(int tableSize,int step, int collision) const
{
return ((value + (i*step)) % tableSize);
}
Hash table.
class CTable
{
struct CItem {
CKey key;
CValue value;
};
CItem **table;
int valueCounter;
}
Methods
// return collisions count
int insert(const CKey& key, const CValue& val)
{
int position, collision = 0;
while(true)
{
position = key.getHashLPS(tableSize, step, collision); // get position
if(table[position] == NULL) // free space
{
table[position] = new CItem; // save item
table[position]->key = CKey(key);
table[position]->value = CValue(val);
valueCounter++;
break;
}
if(table[position]->key == key) // same keys => overwrite value
{
table[position]->value = val;
break;
}
collision++; // current positions is full, try another
if(collision >= tableSize) // full table
return -1;
}
return collision;
}
// return collisions count
int remove(const CKey& key)
{
int position, collision = 0;
while(true)
{
position = key.getHashLPS(tableSize, step, collision);
if(table[position] == NULL) // free position - key isn't in table or is unreachable bacause of wrong rehashing
return -1;
if(table[position]->key == key) // found
{
table[position] = NULL; // remove it
valueCounter--;
int newPosition, collisionRehash = 0;
for(int i = 0; i < tableSize; i++, collisionRehash = 0) // rehash table
{
if(table[i] != NULL) // if there is a item, rehash it
{
while(true)
{
newPosition = table[i]->key.getHashLPS(tableSize, step, collisionRehash++);
if(newPosition == i) // same position like before
break;
if(table[newPosition] == NULL) // new position and there is a free space
{
table[newPosition] = table[i]; // copy from old, insert to new
table[i] = NULL; // remove from old
break;
}
}
}
}
break;
}
collision++; // there is some item on newPosition, let's count another
if(collision >= valueCounter) // item isn't in table
return -1;
}
return collision;
}
Both functions return collisions count (for my own purpose) and they return -1 when the searched CKey isn't in the table or the table is full.
Tombstones are forbidden. Rehashing after removing is a must.
The biggest change for improvement I see is in the removal function. You shouldn't need to rehash the entire table. You only need to rehash starting from the removal point until you reach an empty bucket. Also, when re-hashing, remove and store all of the items that need to be re-hashed before doing the re-hashing so that they don't get in the way when placing them back in.
Another thing. With all hashes, the quickest way to increase efficiency to to decrease the loadFactor (the ratio of elements to backing-array size). This reduces the number of collisions, which means less iterating looking for an open spot, and less rehashing on removal. In the limit, as the loadFactor approaches 0, collision probability approaches 0, and it becomes more and more like an array. Though of course memory use goes up.
Update
You only need to rehash starting from the removal point and moving forward by your step size until you reach a null. The reason for this is that those are the only objects that could possibly change their location due to the removal. All other objects would wind up hasing to the exact same place, since they don't belong to the same "collision run".
A possible improvement would be to pre-allocate an array of CItems, that would avoid the malloc()s / news and free() deletes; and you would need the array to be changed to "CItem *table;"
But again: what you want is basically a smooth ride in a car with square wheels.

Finding a nonexisting key in a std::map

Is there a way to find a nonexisting key in a map?
I am using std::map<int,myclass>, and I want to automatically generate a key for new items. Items may be deleted from the map in different order from their insertion.
The myclass items may, or may not be identical, so they can not serve as a key by themself.
During the run time of the program, there is no limit to the number of items that are generated and deleted, so I can not use a counter as a key.
An alternative data structure that have the same functionality and performance will do.
Edit
I trying to build a container for my items - such that I can delete/modify items according to their keys, and I can iterate over the items. The key value itself means nothing to me, however, other objects will store those keys for their internal usage.
The reason I can not use incremental counter, is that during the life-span of the program they may be more than 2^32 (or theoretically 2^64) items, however item 0 may theoretically still exist even after all other items are deleted.
It would be nice to ask std::map for the lowest-value non-used key, so i can use it for new items, instead of using a vector or some other extrnal storage for non-used keys.
I'd suggest a combination of counter and queue. When you delete an item from the map, add its key to the queue. The queue then keeps track of the keys that have been deleted from the map so that they can be used again. To get a new key, you first check if the queue is empty. If it isn't, pop the top index off and use it, otherwise use the counter to get the next available key.
Let me see if I understand. What you want to do is
look for a key.
If not present, insert an element.
Items may be deleted.
Keep a counter (wait wait) and a vector. The vector will keep the ids of the deleted items.
When you are about to insert the new element,look for a key in the vector. If vector is not empty, remove the key and use it. If its empty, take one from the counter (counter++).
However, if you neveer remove items from the map, you are just stuck with a counter.
Alternative:
How about using the memory address of the element as a key ?
I would say that for general case, when key can have any type allowed by map, this is not possible. Even ability to say whether some unused key exists requires some knowledge about type.
If we consider situation with int, you can store std::set of contiguous segments of unused keys (since these segments do not overlap, natural ordering can be used - simply compare their starting points). When a new key is needed, you take the first segment, cut off first index and place the rest in the set (if the rest is not empty). When some key is released, you find whether there are neighbour segments in the set (due to set nature it's possible with O(log n) complexity) and perform merging if needed, otherwise simply put [n,n] segment into the set.
in this way you will definitely have the same order of time complexity and order of memory consumption as map has independently on requests history (because number of segments cannot be more than map.size()+1)
something like this:
class TKeyManager
{
public:
TKeyManager()
{
FreeKeys.insert(
std::make_pair(
std::numeric_limits<int>::min(),
std::numeric_limits<int>::max());
}
int AlocateKey()
{
if(FreeKeys.empty())
throw something bad;
const std::pair<int,int> freeSegment=*FreeKeys.begin();
if(freeSegment.second>freeSegment.first)
FreeKeys.insert(std::make_pair(freeSegment.first+1,freeSegment.second));
return freeSegment.first;
}
void ReleaseKey(int key)
{
std:set<std::pair<int,int>>::iterator position=FreeKeys.insert(std::make_pair(key,key)).first;
if(position!=FreeKeys.begin())
{//try to merge with left neighbour
std::set<std::pair<int,int>>::iterator left=position;
--left;
if(left->second+1==key)
{
left->second=key;
FreeKeys.erase(position);
position=left;
}
}
if(position!=--FreeKeys.end())
{//try to merge with right neighbour
std::set<std::pair<int,int>>::iterator right=position;
++right;
if(right->first==key+1)
{
position->second=right->second;
FreeKeys.erase(right);
}
}
}
private:
std::set<std::pair<int,int>> FreeKeys;
};
Is there a way to find a nonexisting
key in a map?
I'm not sure what you mean here. How can you find something that doesn't exist? Do you mean, is there a way to tell if a map does not contain a key?
If that's what you mean, you simply use the find function, and if the key doesn't exist it will return an iterator pointing to end().
if (my_map.find(555) == my_map.end()) { /* do something */ }
You go on to say...
I am using std::map, and
I want to automatically generate a key
for new items. Items may be deleted
from the map in different order from
their insertion. The myclass items may, or may not be identical, so they can not serve as a key by themself.
It's a bit unclear to me what you're trying to accomplish here. It seems your problem is that you want to store instances of myclass in a map, but since you may have duplicate values of myclass, you need some way to generate a unique key. Rather than doing that, why not just use std::multiset<myclass> and just store duplicates? When you look up a particular value of myclass, the multiset will return an iterator to all the instances of myclass which have that value. You'll just need to implement a comparison functor for myclass.
Could you please clarify why you can not use a simple incremental counter as auto-generated key? (increment on insert)? It seems that there's no problem doing that.
Consider, that you decided how to generate non-counter based keys and found that generating them in a bulk is much more effective than generating them one-by-one.
Having this generator proved to be "infinite" and "statefull" (it is your requirement), you can create a second fixed sized container with say 1000 unused keys.
Supply you new entries in map with keys from this container, and return keys back for recycling.
Set some low "threshold" to react on key container reaching low level and refill keys in bulk using "infinite" generator.
The actual posted problem still exists "how to make efficient generator based on non-counter". You may want to have a second look at the "infinity" requirement and check if say 64-bit or 128-bit counter still can satisfy your algorithms for some limited period of time like 1000 years.
use uint64_t as a key type of sequence or even if you think that it will be not enough
struct sequence_key_t {
uint64_t upper;
uint64_t lower;
operator++();
bool operator<()
};
Like:
sequence_key_t global_counter;
std::map<sequence_key_t,myclass> my_map;
my_map.insert(std::make_pair(++global_counter, myclass()));
and you will not have any problems.
Like others I am having difficulty figuring out exactly what you want. It sounds like you want to create an item if it is not found. sdt::map::operator[] ( const key_type& x ) will do this for you.
std::map<int, myclass> Map;
myclass instance1, instance2;
Map[instance1] = 5;
Map[instance2] = 6;
Is this what you are thinking of?
Going along with other answers, I'd suggest a simple counter for generating the ids. If you're worried about being perfectly correct, you could use an arbitrary precision integer for the counter, rather than a built in type. Or something like the following, which will iterate through all possible strings.
void string_increment(std::string& counter)
{
bool carry=true;
for (size_t i=0;i<counter.size();++i)
{
unsigned char original=static_cast<unsigned char>(counter[i]);
if (carry)
{
++counter[i];
}
if (original>static_cast<unsigned char>(counter[i]))
{
carry=true;
}
else
{
carry=false;
}
}
if (carry)
{
counter.push_back(0);
}
}
e.g. so that you have:
std::string counter; // empty string
string_increment(counter); // now counter=="\x00"
string_increment(counter); // now counter=="\x01"
...
string_increment(counter); // now counter=="\xFF"
string_increment(counter); // now counter=="\x00\x00"
string_increment(counter); // now counter=="\x01\x00"
...
string_increment(counter); // now counter=="\xFF\x00"
string_increment(counter); // now counter=="\x00\x01"
string_increment(counter); // now counter=="\x01\x01"
...
string_increment(counter); // now counter=="\xFF\xFF"
string_increment(counter); // now counter=="\x00\x00\x00"
string_increment(counter); // now counter=="\x01\x00\x00"
// etc..
Another option, if the working set actually in the map is small enough would be to use an incrementing key, then re-generate the keys when the counter is about to wrap. This solution would only require temporary extra storage. The hash table performance would be unchanged, and the key generation would just be an if and an increment.
The number of items in the current working set would really determine if this approach is viable or not.
I loved Jon Benedicto's and Tom's answer very much. To be fair, the other answers that only used counters may have been the starting point.
Problem with only using counters
You always have to increment higher and higher; never trying to fill the empty gaps.
Once you run out of numbers and wrap around, you have to do log(n) iterations to find unused keys.
Problem with the queue for holding used keys
It is easy to imagine lots and lots of used keys being stored in this queue.
My Improvement to queues!
Rather than storing single used keys in the queue; we store ranges of unused keys.
Interface
using Key = wchar_t; //In my case
struct Range
{
Key first;
Key last;
size_t size() { return last - first + 1; }
};
bool operator< (const Range&,const Range&);
bool operator< (const Range&,Key);
bool operator< (Key,const Range&);
struct KeyQueue__
{
public:
virtual void addKey(Key)=0;
virtual Key getUniqueKey()=0;
virtual bool shouldMorph()=0;
protected:
Key counter = 0;
friend class Morph;
};
struct KeyQueue : KeyQueue__
{
public:
void addKey(Key)override;
Key getUniqueKey()override;
bool shouldMorph()override;
private:
std::vector<Key> pool;
friend class Morph;
};
struct RangeKeyQueue : KeyQueue__
{
public:
void addKey(Key)override;
Key getUniqueKey()override;
bool shouldMorph()override;
private:
boost::container::flat_set<Range,std::less<>> pool;
friend class Morph;
};
void morph(KeyQueue__*);
struct Morph
{
static void morph(const KeyQueue &from,RangeKeyQueue &to);
static void morph(const RangeKeyQueue &from,KeyQueue &to);
};
Implementation
Note: Keys being added are assumed to be key not found in queue
// Assumes that Range is valid. first <= last
// Assumes that Ranges do not overlap
bool operator< (const Range &l,const Range &r)
{
return l.first < r.first;
}
// Assumes that Range is valid. first <= last
bool operator< (const Range &l,Key r)
{
int diff_1 = l.first - r;
int diff_2 = l.last - r;
return diff_1 < -1 && diff_2 < -1;
}
// Assumes that Range is valid. first <= last
bool operator< (Key l,const Range &r)
{
int diff = l - r.first;
return diff < -1;
}
void KeyQueue::addKey(Key key)
{
if(counter - 1 == key) counter = key;
else pool.push_back(key);
}
Key KeyQueue::getUniqueKey()
{
if(pool.empty()) return counter++;
else
{
Key key = pool.back();
pool.pop_back();
return key;
}
}
bool KeyQueue::shouldMorph()
{
return pool.size() > 10;
}
void RangeKeyQueue::addKey(Key key)
{
if(counter - 1 == key) counter = key;
else
{
auto elem = pool.find(key);
if(elem == pool.end()) pool.insert({key,key});
else // Expand existing range
{
Range &range = (Range&)*elem;
// Note at this point, key is 1 value less or greater than range
if(range.first > key) range.first = key;
else range.last = key;
}
}
}
Key RangeKeyQueue::getUniqueKey()
{
if(pool.empty()) return counter++;
else
{
Range &range = (Range&)*pool.begin();
Key key = range.first++;
if(range.first > range.last) // exhausted all keys in range
pool.erase(pool.begin());
return key;
}
}
bool RangeKeyQueue::shouldMorph()
{
return pool.size() == 0 || pool.size() == 1 && pool.begin()->size() < 4;
}
void morph(KeyQueue__ *obj)
{
if(KeyQueue *queue = dynamic_cast<KeyQueue*>(obj))
{
RangeKeyQueue *new_queue = new RangeKeyQueue();
Morph::morph(*queue,*new_queue);
obj = new_queue;
}
else if(RangeKeyQueue *queue = dynamic_cast<RangeKeyQueue*>(obj))
{
KeyQueue *new_queue = new KeyQueue();
Morph::morph(*queue,*new_queue);
obj = new_queue;
}
}
void Morph::morph(const KeyQueue &from,RangeKeyQueue &to)
{
to.counter = from.counter;
for(Key key : from.pool) to.addKey(key);
}
void Morph::morph(const RangeKeyQueue &from,KeyQueue &to)
{
to.counter = from.counter;
for(Range range : from.pool)
while(range.first <= range.last)
to.addKey(range.first++);
}
Usage:
int main()
{
std::vector<Key> keys;
KeyQueue__ *keyQueue = new KeyQueue();
srand(time(NULL));
bool insertKey = true;
for(int i=0; i < 1000; ++i)
{
if(insertKey)
{
Key key = keyQueue->getUniqueKey();
keys.push_back(key);
}
else
{
int index = rand() % keys.size();
Key key = keys[index];
keys.erase(keys.begin()+index);
keyQueue->addKey(key);
}
if(keyQueue->shouldMorph())
{
morph(keyQueue);
}
insertKey = rand() % 3; // more chances of insert
}
}