I am importing items from an XML file. Each XML element (FoodItem, Person, Order, CoffeeRun) is a class, and each of these elements will have a unique ID (unique to that class).
<person>
<id>0</id>
<name>...</name>
</person>
<FoodItem>
<id>0</id>
<name>Coffee</name>
</FoodItem>
I am trying to develop a sub class DatabaseItem, that ensures that no 2 objects of a class have the same ID. Can you assist me, by helping me develop an efficient algorithm that makes sure no object will have the same ID as another?
My 2 approaches seem a little inefficient to me:
Use a static class vector that contains all the used IDs so far. When a new DatabaseID( int requestedID ) object is created, I check whether the ID is available by going over all the used values in the vector to make sure the ID is not already there. I think that's O(n) per check?
Use a static class bool vector where each element of the vector corresponds to an ID (so vector[1] will correspond to the object with ID 1). I check if an ID is already taken by seeing if that element in the vector is true: if ( v[nID] == true ) { // this ID is already taken }. This seems inefficient because it means my vector will take a lot of memory, right?
I am not familiar with using maps in C++ but maybe I should use one here?
Any advice on an efficient algorithm would be really helpful:
class DatabaseItem
{
public:
static unsigned int instanceCount;
DatabaseItem()
{
// Assign next available ID
}
DatabaseItem( unsigned int nID )
{
// Check that that id is not already taken
// if id is taken, look for next available id &
// give the item that id
}
private:
unsigned int uniqueID;
};
// My solution: Do you have any better ideas that ensure no objects have the same ID?
// This seems REALLY inefficient...
class DatabaseItem
{
public:
static unsigned int instanceCount;
static vector <unsigned int> usedIDs;
DatabaseItem()
{
DatabaseItem::instanceCount++;
uniqueID = instanceCount;
usedIDs.add( instanceCount );
}
DatabaseItem( unsigned int nID )
{
if ( isIDFree( nID ) )
{
uniqueID = nID;
}
else uniqueID = nextAvailableID();
usedIDs.add( uniqueID ); // record the ID so that later checks can see it
DatabaseItem::instanceCount++;
}
bool isIDFree( unsigned int nID )
{
// This is pretty slow to check EVERY element
for (int i=0; i<usedIDs.size(); i++)
{
if (usedIDs[i] == nID)
{
return false;
}
}
return true;
}
unsigned int nextAvailableID()
{
// Declare the counter outside the loop so it is not reset to 0 on every pass
unsigned int ID = 0;
while ( !isIDFree( ID ) )
{
ID++;
}
return ID;
}
private:
unsigned int uniqueID;
};
// Alternate that uses boolean vector to track which ids are occupied
// This means I take 30000 boolean memory when I may not need all that
class DatabaseItem
{
public:
static unsigned int instanceCount;
static const unsigned int MAX_INSTANCES = 30000;
static vector <bool> idVector;
// Is this how I initialise a static class vector...? (note this code will be outside the class definition)
// vector <bool> DatabaseItem::idVector( MAX_INSTANCES, false );
DatabaseItem()
{
uniqueID = nextAvailableID();
idVector[uniqueID] = true;
}
DatabaseItem( unsigned int nID )
{
if ( nID >= MAX_INSTANCES )
{
// not sure how I shd handle this case?
}
if ( idVector[nID] == false )
{
uniqueID = nID;
idVector[nID] = true;
}
else
{
uniqueID = nextAvailableID();
idVector[uniqueID] = true;
}
instanceCount++;
}
unsigned int nextAvailableID()
{
for (int i=0; i<idVector.size(); i++)
{
if ( !idVector[i] )
{
return i;
}
}
return -1;
}
bool isIDFree( unsigned int nID )
{
// Note I cannot do this: Because I am using Mosync API & it doesn't support any C++ exceptions'
// I declare idVector with no size! so not idVector( 30000, false)... just idVector;
// then I allow an exception to occur to check if an id is taken
try
{
return idVector[nID];
}
catch (...)
{
return true;
}
}
private:
unsigned int uniqueID;
};
A vector<bool> is implemented with one bit per bool, so it's not wasting as much space as you assume.
A set<unsigned int> is the easy solution to this. A vector<bool> is faster. Both could use a bit of memory. Depending on your usage patterns, there's a few other solutions:
An unsigned int all_taken_upto_this; combined with a set<int> covering all the oddball ID's that are higher than all_taken_upto_this - remove from set and increase the counter when you can.
A map<unsigned int, unsigned int> which is logically treated as begin,end of either taken or free sequences. This'll take a little fiddling to implement correctly (merging consecutive map elements when you add the last ID in between two elements e.g.)
You could probably use a premade "sparse bitset" type data structure - I don't know any implementations OTOH.
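A minimal sketch of the set-based approach, assuming a hypothetical IdRegistry helper (the class and method names are illustrative, not from the question). One instance per class would keep each class's IDs separate, and nothing here relies on exceptions:

```cpp
#include <set>

// Hypothetical helper: tracks which IDs are in use for one class.
class IdRegistry
{
public:
    IdRegistry() : next_(0) {}

    // Try to claim a specific ID; returns false if it is already taken.
    bool claim(unsigned int id)
    {
        return used_.insert(id).second;   // O(log n)
    }

    // Claim and return the lowest ID that is not yet in use.
    unsigned int claimNext()
    {
        while (!used_.insert(next_).second)
            ++next_;                      // skip IDs that were claimed explicitly
        return next_++;
    }

private:
    std::set<unsigned int> used_;
    unsigned int next_;   // every ID below next_ is known to be taken
};
```

Because IDs are never released in this sketch, next_ only moves forward, so the search in claimNext() does not rescan IDs it has already passed.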
Depending on the number of elements and a couple other issues, you might consider actually storing them (or at least pointers to them) in a map. That would be rather simple to implement, but will take some space. On the other hand, it will provide you with fast lookup by id which might be a clear advantage if there are cross references in the XML. The map (assuming pointers) would look like:
std::map<int, std::shared_ptr<Object> > id_map;
std::shared_ptr<Object> p( new Object( xml ) );
if ( !id_map.insert( std::make_pair( p->id, p ) ).second ) {
// failed to insert, the element is a duplicate!!!
}
If you are not locked into using an integer you may look into GUIDs (Global Unique IDs). Depending on which platform you are using you can generally find a couple of utility functions to dynamically generate a GUID. If using Visual Studio, I've used the CoCreateGuid function.
If you are locked into a 32-bit integer, another option is a hash table. If each XML element is unique, then a hashing function could generate a unique hash value. Depending on the size of your data set there is still a small probability of collision. The one I've used that seems to have a pretty low collision rate with the data set that I've worked with is called the Jenkins hash function.
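For illustration, here is a sketch of the one-at-a-time variant of the Jenkins hash applied to a string. Which variant the answer had in mind, and which fields of an element would be hashed, are assumptions. Two different inputs can still produce the same value, so a hash alone does not guarantee unique IDs:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Jenkins one-at-a-time hash over the bytes of a string.
std::uint32_t jenkinsOneAtATime(const std::string& key)
{
    std::uint32_t hash = 0;
    for (std::size_t i = 0; i < key.size(); ++i)
    {
        hash += static_cast<unsigned char>(key[i]);
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    // Final mixing steps
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}
```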
I have a collection of polygons that I retrieve from the database and which I wish to store in a binary tree for fast accessing. As a binary tree I use std::map.
I created this solution, which is outlined below, but I think that it is not correct because I do not call free() to release the memory allocated by malloc().
My questions (problems):
Is it correct to use std::map if I only need to insert and access elements of this map? I just want to find geometries fast by their ID's.
In the std::map I store pointers to the geometries instead of storing geometries themselves. Is this a good idea? Before I tried to store the geometries themselves, but then I realized that the std::map makes a copy of the object, which created problems.
In the method ConvertSpatial2GPC(..) I create gpc_geometry objects, which create references, which I release at gpc_free_polygon(..). But I can't release the gpc_geometry object itself, because I do not have a reference to it at that point.
I use the following structures:
typedef struct /* Polygon vertex structure */
{
double x; /* Vertex x component */
double y; /* vertex y component */
} gpc_vertex;
typedef struct /* Vertex list structure */
{
int num_vertices; /* Number of vertices in list */
gpc_vertex *vertex; /* Vertex array pointer */
} gpc_vertex_list;
typedef struct /* Polygon set structure */
{
int num_contours; /* Number of contours in polygon */
int *hole; /* Hole / external contour flags */
gpc_vertex_list *contour; /* Contour array pointer */
} gpc_polygon;
typedef std::map<long, gpc_polygon*> layer;
My workflow is as follows:
Load items from database
Call method initializeLayer() which returns a layer (see previous typedef)
... Work with the layer ...
Call method freeLayer() to free the memory used by the layer
Code for initializing geometry objects:
layer initializeLayer() {
//... database connection code
//find the count of objects in database
int count = ...
//helper object for loading from database
spatial_obj* sp_obj = NULL;
//initialize a array to hold the objects
gpc_polygon* gpc_objects;
gpc_objects = (gpc_polygon*)malloc(sizeof(gpc_polygon) * count);
layer myLayer;
int i = 0;
//... query database
while(db.Fetch()) {
id = db.GetLongData(0);
db.GetSDO_Object(&sp_obj); //load from database
db.ConvertSpatial2GPC(sp_obj, &gpc_objects[i]); //convert polygon to GPC format
//insert a pair (ID->pointer to the geometry)
myLayer.insert(layer::value_type(id, &gpc_objects[i]));
i++;
}
return myLayer;
}
Code for freeing layer:
void freeLayer(layer myLayer) {
for (layer::iterator it = myLayer.begin(); it != myLayer.end(); ++it) {
gpc_free_polygon(it->second); //frees the memory from this geometry object
}
}
Code for freeing geometry object:
void gpc_free_polygon(gpc_polygon *p)
{
int c;
for (c= 0; c < p->num_contours; c++) {
FREE(p->contour[c].vertex);
}
FREE(p->hole);
FREE(p->contour);
p->num_contours= 0;
}
I think that I am making things more complicated than they should be.
I don't really need a std::map to store the pointers. I can instead ask for the polygons from the database so that they are already ordered by their IDs. And then I can store the polygons in a static structure (array or vector). When I need to find an element by its ID, I will just use a binary search algorithm to find it (which is logarithmic time like the search algorithm used by the binary tree, anyway).
So, my method initializeLayer() will return an array or vector instead, which I will free at the end of the program.
EDIT: I found that I don't have to implement the binary search myself. There is a standard algorithm for this: std::binary_search.
EDIT2: So, that is what I ended up with:
Object structure
typedef struct {
long id;
gpc_polygon gpc_obj;
} object;
Layer structure
typedef std::vector<object*> layer;
Code for initializing geometry objects:
layer initializeLayer() {
//... database connection code
//find the count of objects in database
int count = ...
//helper object for loading from database
spatial_obj* sp_obj = NULL;
object* object_ptr = NULL;
layer myLayer;
myLayer.reserve(count);
int i = 0;
//... query database
while(db.Fetch()) {
id = db.GetLongData(0);
db.GetSDO_Object(&sp_obj); //load from database
object_ptr = new object;
object_ptr->id = id;
db.ConvertSpatial2GPC(sp_obj, &object_ptr->gpc_obj);
myLayer.push_back(object_ptr);
i++;
}
return myLayer;
}
Code for freeing layer:
void freeLayer(layer myLayer) {
for(std::vector<int>::size_type i = 0; i != myLayer.size(); i++) {
gpc_free_polygon(&myLayer[i]->gpc_obj);
delete myLayer[i];
}
}
Code for doing binary search:
I found out that the std::binary_search only returns whether it found or did not find the object. std::lower_bound() to the rescue!
//Create empty object for searching
object* searched_obj = new object;
object* found_obj = NULL;
searched_obj->id = id;
layer::iterator it;
it = std::lower_bound(myLayer.begin(), myLayer.end(), searched_obj, obj_comparer);
if(it != myLayer.end()) {
found_obj = *it;
if(found_obj->id != id) {
//Error!
}
} else {
//Error!
}
//Release memory
delete searched_obj;
Function for comparing objects
bool obj_comparer(object *a, object *b) {
return a->id < b->id;
}
I wrote a program to find duplicate entries in a table. I am a beginner in C++, so I don't know whether this program is efficient. Is there a better way to write it? I have 3 tables (2D vectors): 1) aRecord_arr, 2) mainTable and 3) idxTable. idxTable identifies the key columns used to check for duplicate entries. The records in aRecord_arr are to be added to mainTable. If a record already exists in mainTable, the program shows the error "Duplicate Entry". Please check this program and give your suggestions.
typedef vector<string> rec_t;
typedef vector<rec_t> tab_t;
typedef vector<int> cn_t;
int main()
{
tab_t aRecord_arr= { {"a","apple","fruit"},
{"b","banana","fruit"} };
tab_t mainTable = { {"o","orange","fruit"},
{"p","pineapple","fruit"},
{"b","banana","fruit"},
{"m","melon","fruit"},
{"a","apple","fruit"},
{"g","guava","fruit"} };
tab_t idxTable = { {"code","k"},
{"name","k"},
{"category","n"}};
size_t Num_aRecords = aRecord_arr.size();
int idxSize = idxTable.size();
int mainSize = mainTable.size();
rec_t r1;
rec_t r2;
tab_t t1,t2;
cn_t idx;
for(int i=0;i<idxSize;i++)
{
if(idxTable[i][1]=="k")
{
idx.push_back(i);
}
}
for(size_t j=0;j<Num_aRecords;j++)
{
for(unsigned int id=0;id<idx.size();id++)
{
r1.push_back(aRecord_arr[j][idx[id]]);
}
t1.push_back(std::move(r1));
}
for(int j=0;j<mainSize;j++)
{
for(unsigned int id=0;id<idx.size();id++)
{
r2.push_back(mainTable[j][idx[id]]);
}
t2.push_back(std::move(r2));
}
for(size_t i=0;i<t1.size();i++)
{
for(size_t j=0;j<t2.size();j++)
{
if(t1[i]==t2[j])
{
cout<<"Duplicate Entry"<<endl;
exit(0);
}
}
}
}
If you want to avoid duplicate entries in an array, you should consider using a std::set instead.
What you want is probably a std::map or a std::set
Don't reinvent the wheel, the STL is full of goodies.
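As a rough sketch of that suggestion, the key columns of every record in the main table can be collected into a std::set once, and each new record checked against it. The helper names keyOf and hasDuplicate are made up for this example; rec_t and tab_t are the typedefs from the question:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> rec_t;
typedef std::vector<rec_t> tab_t;

// Project a record onto its key columns.
rec_t keyOf(const rec_t& record, const std::vector<int>& keyColumns)
{
    rec_t key;
    for (std::size_t i = 0; i < keyColumns.size(); ++i)
        key.push_back(record[keyColumns[i]]);
    return key;
}

// Returns true if any record in newRecords has the same key as a record in mainTable.
bool hasDuplicate(const tab_t& mainTable, const tab_t& newRecords,
                  const std::vector<int>& keyColumns)
{
    std::set<rec_t> seen;
    for (std::size_t i = 0; i < mainTable.size(); ++i)
        seen.insert(keyOf(mainTable[i], keyColumns));
    for (std::size_t i = 0; i < newRecords.size(); ++i)
        if (seen.count(keyOf(newRecords[i], keyColumns)) != 0)
            return true;
    return false;
}
```

This replaces the nested loop over both tables with one pass over each table plus logarithmic set lookups.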
You seem to be rooted in a weakly typed language - but C++ is strongly typed.
You will 'pay' the disadvantage of strong typing almost no matter what you do, but you almost painstakingly avoid the advantage.
Let me start with the field that always says 'fruit' - my suggestion is to make this an enum, like:
enum PlantType { fruit, veggie };
Second, you have a vector that always contain 3 strings, all with the same meaning. this seems to be a job for a struct, like:
struct Post {
PlantType kind;
char firstchar;
string name;
// possibly other characteristics
};
the 'firstchar' is probably premature optimization, but let's keep that for now.
Now you want to add a new Post, to an existing vector of Posts, like:
vector<Post> mainDB;
bool AddOne( const Post& p )
{
for( auto& pp : mainDB )
if( pp.name == p.name )
return false;
mainDB.push_back(p);
return true;
}
Now you can use it like:
if( ! AddOne( Post{ fruit, 'b', "banana" } ) )
cerr << "duplicate entry";
If you need speed (at the cost of memory), switch your mainDB to map, like:
map<string,Post> mainDB;
bool AddOne( const Post& p )
{
if( mainDB.find(p.name) != mainDB.end() )
return false;
mainDB[p.name]=p;
return true;
}
this also makes it easier (and faster) to find and use a specific post, like
cout << "the fruit is called " << mainDB["banana"].name ;
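Beware that operator[] inserts a default-constructed Post if the key does not exist, so the line above silently prints an empty name instead of failing; use find() when the post may be missing.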
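As you can see, firstchar was never used, and could be omitted. std::map
keeps its entries sorted by key, so a lookup takes O(log n) string comparisons
instead of a scan over every post; std::unordered_map (C++11) offers hashed
lookups if you want them. Either will probably be much faster on large tables
than the linear search above.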
All of the above assumed inclusion of the correct headers, and
using namespace std;
if you don't like using namespace, prepend std:: in all the right places
hope it helps :)
I'm probably trying to achieve the impossible, but StackExchange always surprises me, so please have a go at this:
I need to map a name to an integer. The names (about 2k) are unique. There will be no additions nor deletions to that list and the values won't change during runtime.
Implementing them as const int variables gives me compile-time checks for existence and type.
Also this is very clear and verbose in code. Errors are easily spotted.
Implementing them as std::map<std::string, int> gives me a lot of flexibility for building the names to look up with string manipulation. I may use this to give strings as parameters to functions which can then query the list for multiple values by appending pre-/suffixes to that string. I can also loop over several values by creating a numeral part of the key name from the loop variable.
Now my question is: is there a method to combine both advantages? The missing compile-time check (especially for key-existence) almost kills the second method for me. (Especially as std::map silently returns 0 if the key doesn't exist which creates hard to find bugs.) But the looping and pre-/suffix adding capabilities are so damn useful.
I would prefer a solution that doesn't use any additional libraries like boost, but please suggest them nevertheless as I might be able to re-implement them anyway.
An example on what I do with the map:
void init(std::map<std::string, int> &labels)
{
labels.insert(std::make_pair("Bob1" , 45 ));
labels.insert(std::make_pair("Bob2" , 8758 ));
labels.insert(std::make_pair("Bob3" , 436 ));
labels.insert(std::make_pair("Alice_first" , 9224 ));
labels.insert(std::make_pair("Alice_last" , 3510 ));
}
int main()
{
std::map<std::string, int> labels;
init(labels);
for (int i=1; i<=3; i++)
{
std::stringstream key;
key << "Bob" << i;
doSomething(labels[key.str()]);
}
checkName("Alice");
}
void checkName(std::string name)
{
std::stringstream key1,key2;
key1 << name << "_first";
key2 << name << "_last";
doFirstToLast(labels[key1.str()], labels[key2.str()]);
}
Another goal is that the code shown in the main() routine stays as easy and verbose as possible. (Needs to be understood by non-programmers.) The init() function will be code-generated by some tools. The doSomething(int) functions are fixed, but I can write wrapper functions around them. Helpers like checkName() can be more complicated, but need to be easily debuggable.
One way to implement your example is using an enum and token pasting, like this
enum {
Bob1 = 45,
Bob2 = 8758,
Bob3 = 436,
Alice_first = 9224,
Alice_last = 3510
};
#define LABEL( a, b ) ( a ## b )
int main()
{
doSomething( LABEL(Bob,1) );
doSomething( LABEL(Bob,2) );
doSomething( LABEL(Bob,3) );
}
void checkName()
{
doFirstToLast( LABEL(Alice,_first), LABEL(Alice,_last) );
}
Whether or not this is best depends on where the names come from.
If you need to support the for loop use-case, then consider
int bob[] = { 0, Bob1, Bob2, Bob3 }; // Values from the enum
int main()
{
for( int i = 1; i <= 3; i++ ) {
doSomething( bob[i] );
}
}
I'm not sure I understand all your requirements, but how about something like this, without using std::map.
I am assuming that you have three strings, "FIRST", "SECOND" and "THIRD" that you
want to map to 42, 17 and 37, respectively.
#include <stdio.h>
const int m_FIRST = 0;
const int m_SECOND = 1;
const int m_THIRD = 2;
const int map[] = {42, 17, 37};
#define LOOKUP(s) (map[m_ ## s])
int main ()
{
printf("%d\n", LOOKUP(FIRST));
printf("%d\n", LOOKUP(SECOND));
return 0;
}
The disadvantage is that you cannot use variable strings with LOOKUP. But now you can iterate over the values.
Maybe something like this (untested)?
struct Bob {
static constexpr int values[3] = { 45, 8758, 436 };
};
struct Alice {
struct first {
static const int value = 9224;
};
struct last {
static const int value = 3510;
};
};
template <typename NAME>
void checkName()
{
doFirstToLast(NAME::first::value, NAME::last::value);
}
...
constexpr int Bob::values[3]; // need a definition in exactly one TU
int main()
{
for (int i=0; i<3; i++) // valid indices of Bob::values are 0..2
{
doSomething(Bob::values[i]);
}
checkName<Alice>();
}
Using enum you have both compile-time check and you can loop over it:
How can I iterate over an enum?
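A hedged sketch of what that could look like for the labels in the question: since the enumerator values are not consecutive, they are listed once in an array that can be looped over. The array and function names below are invented for this example, and a misspelled enumerator still fails at compile time:

```cpp
#include <cstddef>

enum Label {
    Bob1 = 45,
    Bob2 = 8758,
    Bob3 = 436
};

// The enumerators are not consecutive, so list them once in an array to loop over them.
const Label kBobLabels[] = { Bob1, Bob2, Bob3 };
const std::size_t kBobCount = sizeof(kBobLabels) / sizeof(kBobLabels[0]);

// Example consumer: adds up the values of all listed labels.
int sumOfBobLabels()
{
    int sum = 0;
    for (std::size_t i = 0; i < kBobCount; ++i)
        sum += kBobLabels[i];
    return sum;
}
```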
I have a need for unique reusable ids. The user can choose his own ids or he can ask for a free one. The API is basically
class IdManager {
public:
int AllocateId(); // Allocates an id
void FreeId(int id); // Frees an id so it can be used again
bool MarkAsUsed(int id); // Lets the user register an id.
// returns false if the id was already used.
bool IsUsed(int id); // Returns true if id is used.
};
Assume ids happen to start at 1 and progress, 2, 3, etc. This is not a requirement, just to help illustrate.
IdManager mgr;
mgr.MarkAsUsed(3);
printf ("%d\n", mgr.AllocateId());
printf ("%d\n", mgr.AllocateId());
printf ("%d\n", mgr.AllocateId());
Would print
1
2
4
Because id 3 has already been declared used.
What's the best container / algorithm to both remember which ids are used AND find a free id?
If you want to know a specific use case, OpenGL's glGenTextures, glBindTexture and glDeleteTextures are equivalent to AllocateId, MarkAsUsed and FreeId.
My idea is to use std::set and Boost.interval so IdManager will hold a set of non-overlapping intervals of free IDs.
AllocateId() is very simple and very quick and just returns the left boundary of the first free interval. Other two methods are slightly more difficult because it might be necessary to split an existing interval or to merge two adjacent intervals. However they are also quite fast.
So this is an illustration of the idea of using intervals:
IdManager mgr; // Now there is one interval of free IDs: [1..MAX_INT]
mgr.MarkAsUsed(3);// Now there are two intervals of free IDs: [1..2], [4..MAX_INT]
mgr.AllocateId(); // two intervals: [2..2], [4..MAX_INT]
mgr.AllocateId(); // Now there is one interval: [4..MAX_INT]
mgr.AllocateId(); // Now there is one interval: [5..MAX_INT]
This is code itself:
#include <boost/numeric/interval.hpp>
#include <limits>
#include <set>
#include <cstdio>
class id_interval
{
public:
id_interval(int ll, int uu) : value_(ll,uu) {}
bool operator < (const id_interval& ) const;
int left() const { return value_.lower(); }
int right() const { return value_.upper(); }
private:
boost::numeric::interval<int> value_;
};
class IdManager {
public:
IdManager();
int AllocateId(); // Allocates an id
void FreeId(int id); // Frees an id so it can be used again
bool MarkAsUsed(int id); // Lets the user register an id.
private:
typedef std::set<id_interval> id_intervals_t;
id_intervals_t free_;
};
IdManager::IdManager()
{
free_.insert(id_interval(1, std::numeric_limits<int>::max()));
}
int IdManager::AllocateId()
{
id_interval first = *(free_.begin());
int free_id = first.left();
free_.erase(free_.begin());
if (first.left() + 1 <= first.right()) {
free_.insert(id_interval(first.left() + 1 , first.right()));
}
return free_id;
}
bool IdManager::MarkAsUsed(int id)
{
id_intervals_t::iterator it = free_.find(id_interval(id,id));
if (it == free_.end()) {
return false;
} else {
id_interval free_interval = *(it);
free_.erase (it);
if (free_interval.left() < id) {
free_.insert(id_interval(free_interval.left(), id-1));
}
if (id +1 <= free_interval.right() ) {
free_.insert(id_interval(id+1, free_interval.right()));
}
return true;
}
}
void IdManager::FreeId(int id)
{
id_intervals_t::iterator it = free_.find(id_interval(id,id));
if (it != free_.end() && it->left() <= id && it->right() >= id) {
return ;
}
it = free_.upper_bound(id_interval(id,id));
if (it == free_.end()) {
return ;
} else {
id_interval free_interval = *(it);
if (id + 1 != free_interval.left()) {
free_.insert(id_interval(id, id));
} else {
if (it != free_.begin()) {
id_intervals_t::iterator it_2 = it;
--it_2;
if (it_2->right() + 1 == id ) {
id_interval free_interval_2 = *(it_2);
free_.erase(it);
free_.erase(it_2);
free_.insert(
id_interval(free_interval_2.left(),
free_interval.right()));
} else {
free_.erase(it);
free_.insert(id_interval(id, free_interval.right()));
}
} else {
free_.erase(it);
free_.insert(id_interval(id, free_interval.right()));
}
}
}
}
bool id_interval::operator < (const id_interval& s) const
{
return
(value_.lower() < s.value_.lower()) &&
(value_.upper() < s.value_.lower());
}
int main()
{
IdManager mgr;
mgr.MarkAsUsed(3);
printf ("%d\n", mgr.AllocateId());
printf ("%d\n", mgr.AllocateId());
printf ("%d\n", mgr.AllocateId());
return 0;
}
It would be good to know how many ids you're supposed to keep track of. If there's only a hundred or so, a simple set would do, with linear traversal to get a new id. If it's more like a few thousands, then of course the linear traversal will become a performance killer, especially considering the cache unfriendliness of the set.
Personally, I would go for the following:
set, which helps keeping track of the ids easily O(log N)
proposing the new id as the current maximum + 1... O(1)
If you don't allocate (in the lifetime of the application) more than max<int>() ids, it should be fine, otherwise... use a larger type (make it unsigned, use a long or long long) that's the easiest to begin with.
And if it does not suffice, leave me a comment and I'll edit and search for more complicated solutions. But the more complicated the book-keeping, the longer it'll take to execute in practice and the higher the chances of making a mistake.
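A minimal sketch of this "set plus current maximum + 1" idea, with the class name SimpleIdManager chosen only for illustration. Note that it hands out max + 1 rather than the lowest free id, and it does not guard against the maximum reaching the largest int:

```cpp
#include <set>

class SimpleIdManager
{
public:
    // Returns the current maximum + 1 (or 1 when nothing is allocated).
    int AllocateId()
    {
        int id = used_.empty() ? 1 : *used_.rbegin() + 1;
        used_.insert(id);
        return id;
    }

    void FreeId(int id) { used_.erase(id); }

    // Returns false if the id was already used.
    bool MarkAsUsed(int id) { return used_.insert(id).second; }

    bool IsUsed(int id) const { return used_.count(id) != 0; }

private:
    std::set<int> used_;   // ordered, so the maximum is the last element
};
```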
But I don't think you have to guarantee that ids start from 1. You can just make sure each newly allocated id is larger than all allocated ids.
Like if the 3 is registered first, then the next available id can just be 4. I don't think it is necessary to use 1.
I'm assuming that you want to be able to use all available values for the Id type and that you want to reuse freed Ids? I'm also assuming that you'll lock the collection if you're using it from more than one thread...
I'd create a class with a set to store the allocated ids, a list to store the free ids and a max allocated value to prevent me having to preload the free id list with every available id.
So you start off with an empty set of allocated ids and empty list of free ids and the max allocated as 0. You allocate, take the head of the free list if there is one, else take max, check it's not in your set of allocated ids as it might be if someone reserved it, if it is, increment max and try again, if not add it to the set and return it.
When you free an id you simply check it's in your set and if so push it on your free list.
To reserve an id you simply check the set and if not present add it.
This recycles ids quickly, which may or may not be good for you, that is, allocate(), free(), allocate() will give you the same id back if no other threads are accessing the collection.
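The scheme described above could be sketched roughly as follows. The class name is illustrative, ids are assumed to start at 1, and locking for multi-threaded use is left out:

```cpp
#include <list>
#include <set>

class RecyclingIdManager
{
public:
    RecyclingIdManager() : max_(0) {}

    int AllocateId()
    {
        // Prefer a recycled id, skipping any that were reserved in the meantime.
        while (!free_.empty())
        {
            int id = free_.front();
            free_.pop_front();
            if (used_.insert(id).second)
                return id;
        }
        // Otherwise advance the counter past ids reserved through MarkAsUsed.
        do { ++max_; } while (!used_.insert(max_).second);
        return max_;
    }

    void FreeId(int id)
    {
        if (used_.erase(id) != 0)
            free_.push_front(id);   // most recently freed id is reused first
    }

    bool MarkAsUsed(int id) { return used_.insert(id).second; }
    bool IsUsed(int id) const { return used_.count(id) != 0; }

private:
    std::set<int> used_;    // allocated ids
    std::list<int> free_;   // released ids waiting to be reused
    int max_;               // highest id handed out by the counter so far
};
```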
Compressed vector. But I don't think any container would make noticeable difference.
Normally, I'd say stick to a simple implementation until you have an idea of which methods are used most. Premature tuning might prove wrong. Use the simple implementation, and log its use, then you can optimize from the functions that are used the most. No use in optimizing for quick removal or quick allocation if you only need a couple of hundred ids and a simple vector would be enough.
Similar to skwllsp, I'd keep track of the ranges that have not been allocated, but my methods are slightly different. The base container would be a map, with the key being the upper bound of the range and the value being the lower bound.
IdManager::IdManager()
{
m_map.insert(std::make_pair(std::numeric_limits<int>::max(), 1));
}
int IdManager::AllocateId()
{
assert(!m_map.empty());
MyMap::iterator p = m_map.begin();
int id = p->second;
++p->second;
if (p->second > p->first)
m_map.erase(p);
return id;
}
void IdManager::FreeId(int id)
{
// I'll fill this in later
}
bool IdManager::MarkAsUsed(int id)
{
MyMap::iterator p = m_map.lower_bound(id);
// return false if the ID is already allocated
if (p == m_map.end() || id < p->second || id > p->first)
return false;
// first thunderstorm of the season, I'll leave this for now before the power glitches
}
bool IdManager::IsUsed(int id)
{
MyMap::iterator p = m_map.lower_bound(id);
// the map holds the free ranges, so an ID is used when it falls outside all of them
return (p == m_map.end() || id < p->second || id > p->first);
}
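Since FreeId and MarkAsUsed are left unfinished above, here is one possible way to complete the same design (a map of free ranges keyed by upper bound, with the lower bound as the value). It is a sketch under the assumption that valid ids are 1..INT_MAX; the class name RangeIdManager is invented for this example:

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <utility>

class RangeIdManager
{
    typedef std::map<int, int> MyMap;   // upper bound -> lower bound of a free range
    MyMap m_map;

public:
    RangeIdManager()
    {
        m_map.insert(std::make_pair(std::numeric_limits<int>::max(), 1));
    }

    int AllocateId()
    {
        assert(!m_map.empty());
        MyMap::iterator p = m_map.begin();
        int id = p->second;
        if (p->second == p->first)
            m_map.erase(p);      // range exhausted
        else
            ++p->second;         // shrink the range from below
        return id;
    }

    bool IsUsed(int id) const
    {
        MyMap::const_iterator p = m_map.lower_bound(id);
        return p == m_map.end() || id < p->second;
    }

    bool MarkAsUsed(int id)
    {
        MyMap::iterator p = m_map.lower_bound(id);
        if (p == m_map.end() || id < p->second)
            return false;                      // already used (or out of range)
        const int lower = p->second;
        if (id == p->first)
            m_map.erase(p);                    // id was the top of the range
        else
            p->second = id + 1;                // keep [id+1, upper] under the same key
        if (lower < id)
            m_map.insert(std::make_pair(id - 1, lower));   // keep [lower, id-1]
        return true;
    }

    void FreeId(int id)
    {
        if (id < 1 || !IsUsed(id))
            return;                            // nothing to free
        int lower = id, upper = id;
        MyMap::iterator next = m_map.lower_bound(id);   // first free range above id
        MyMap::iterator prev = m_map.find(id - 1);      // free range ending just below id
        if (prev != m_map.end()) { lower = prev->second; m_map.erase(prev); }
        if (next != m_map.end() && next->second - 1 == id) { upper = next->first; m_map.erase(next); }
        m_map[upper] = lower;                  // insert the merged range
    }
};
```

With this version, marking 3 as used and then allocating three times yields 1, 2 and 4, matching the example in the question.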
So a friend pointed out that in this case a hash might be better. Most OpenGL programs don't use more than a few thousand ids so a hash with say 4096 slots is almost guaranteed to have only 1 or 2 entries per slot. There is some degenerate case where lots of ids might go in 1 slot but that's seriously unlikely. Using a hash would make AllocateID much slower but a set could be used for that. Allocating being slower is less important than InUse being fast for my use case.
Is there a way to find a nonexisting key in a map?
I am using std::map<int,myclass>, and I want to automatically generate a key for new items. Items may be deleted from the map in different order from their insertion.
The myclass items may, or may not be identical, so they can not serve as a key by themself.
During the run time of the program, there is no limit to the number of items that are generated and deleted, so I can not use a counter as a key.
An alternative data structure that have the same functionality and performance will do.
Edit
I am trying to build a container for my items - such that I can delete/modify items according to their keys, and I can iterate over the items. The key value itself means nothing to me, however, other objects will store those keys for their internal usage.
The reason I can not use an incremental counter is that during the life-span of the program there may be more than 2^32 (or theoretically 2^64) items, however item 0 may theoretically still exist even after all other items are deleted.
It would be nice to ask std::map for the lowest-value non-used key, so I can use it for new items, instead of using a vector or some other external storage for non-used keys.
I'd suggest a combination of counter and queue. When you delete an item from the map, add its key to the queue. The queue then keeps track of the keys that have been deleted from the map so that they can be used again. To get a new key, you first check if the queue is empty. If it isn't, pop the top index off and use it, otherwise use the counter to get the next available key.
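A small sketch of the counter-plus-queue idea, wrapped in a hypothetical KeyedStore template (the name and interface are assumptions for illustration). Because released keys are reused before the counter advances, the counter never grows beyond the largest number of items alive at the same time:

```cpp
#include <map>
#include <queue>
#include <utility>

template <typename T>
class KeyedStore
{
public:
    KeyedStore() : counter_(0) {}

    // Stores a copy of item and returns the key it was stored under.
    int add(const T& item)
    {
        int key;
        if (!released_.empty())
        {
            key = released_.front();   // reuse a key that was freed earlier
            released_.pop();
        }
        else
        {
            key = counter_++;          // no free key: take a fresh one
        }
        items_.insert(std::make_pair(key, item));
        return key;
    }

    // Removes the item and makes its key available for reuse.
    void remove(int key)
    {
        if (items_.erase(key) != 0)
            released_.push(key);
    }

    bool contains(int key) const { return items_.count(key) != 0; }

private:
    std::map<int, T> items_;
    std::queue<int> released_;   // keys freed by remove()
    int counter_;                // next never-used key
};
```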
Let me see if I understand. What you want to do is
look for a key.
If not present, insert an element.
Items may be deleted.
Keep a counter (wait wait) and a vector. The vector will keep the ids of the deleted items.
When you are about to insert the new element, look for a key in the vector. If the vector is not empty, remove the key and use it. If it's empty, take one from the counter (counter++).
However, if you never remove items from the map, you are just stuck with a counter.
Alternative:
How about using the memory address of the element as a key ?
I would say that for general case, when key can have any type allowed by map, this is not possible. Even ability to say whether some unused key exists requires some knowledge about type.
If we consider situation with int, you can store std::set of contiguous segments of unused keys (since these segments do not overlap, natural ordering can be used - simply compare their starting points). When a new key is needed, you take the first segment, cut off first index and place the rest in the set (if the rest is not empty). When some key is released, you find whether there are neighbour segments in the set (due to set nature it's possible with O(log n) complexity) and perform merging if needed, otherwise simply put [n,n] segment into the set.
in this way you will definitely have the same order of time complexity and order of memory consumption as map has independently on requests history (because number of segments cannot be more than map.size()+1)
something like this:
#include <limits>
#include <set>
#include <stdexcept>
#include <utility>

class TKeyManager
{
public:
    TKeyManager()
    {
        // Initially every int is free: one segment [min, max].
        FreeKeys.insert(
            std::make_pair(
                std::numeric_limits<int>::min(),
                std::numeric_limits<int>::max()));
    }
    int AllocateKey()
    {
        if(FreeKeys.empty())
            throw std::runtime_error("no free keys left");
        const std::pair<int,int> freeSegment=*FreeKeys.begin();
        // Set elements are immutable: erase the segment, then re-insert the rest.
        FreeKeys.erase(FreeKeys.begin());
        if(freeSegment.second>freeSegment.first)
            FreeKeys.insert(std::make_pair(freeSegment.first+1,freeSegment.second));
        return freeSegment.first;
    }
    void ReleaseKey(int key)
    {
        // Assumes key is currently allocated (not inside any free segment).
        int first=key, last=key;
        Segments::iterator right=FreeKeys.lower_bound(std::make_pair(key,key));
        if(right!=FreeKeys.begin())
        {//try to merge with left neighbour
            Segments::iterator left=right;
            --left;
            if(left->second+1==key)
            {
                first=left->first;
                FreeKeys.erase(left);
            }
        }
        if(right!=FreeKeys.end() && right->first-1==key)
        {//try to merge with right neighbour
            last=right->second;
            FreeKeys.erase(right);
        }
        FreeKeys.insert(std::make_pair(first,last));
    }
private:
    typedef std::set<std::pair<int,int> > Segments;
    Segments FreeKeys;
};
"Is there a way to find a nonexisting key in a map?"
I'm not sure what you mean here. How can you find something that doesn't exist? Do you mean, is there a way to tell if a map does not contain a key?
If that's what you mean, you simply use the find function, and if the key doesn't exist it will return an iterator pointing to end().
if (my_map.find(555) == my_map.end()) { /* do something */ }
You go on to say...
"I am using std::map<int,myclass>, and I want to automatically generate a key for new items. Items may be deleted from the map in different order from their insertion. The myclass items may, or may not be identical, so they can not serve as a key by themself."
It's a bit unclear to me what you're trying to accomplish here. It seems your problem is that you want to store instances of myclass in a map, but since you may have duplicate values of myclass, you need some way to generate a unique key. Rather than doing that, why not just use std::multiset<myclass> and store the duplicates directly? When you look up a particular value of myclass, equal_range on the multiset gives you a pair of iterators spanning all the instances of myclass that compare equal to that value. You'll just need to implement a comparison functor for myclass.
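A minimal sketch of the multiset approach follows. The myclass layout (a single int field), the functor myclass_less and the helper count_equal are assumptions for illustration; only std::multiset and equal_range come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <utility>

// Stand-in for the question's myclass; the single int field is an assumption.
struct myclass
{
    int value;
};

// Comparison functor giving the strict weak ordering the multiset needs.
struct myclass_less
{
    bool operator()(const myclass& a, const myclass& b) const
    {
        return a.value < b.value;
    }
};

typedef std::multiset<myclass, myclass_less> myclass_set;

// Counts the stored instances equivalent to wanted via equal_range().
std::size_t count_equal(const myclass_set& items, const myclass& wanted)
{
    std::pair<myclass_set::const_iterator, myclass_set::const_iterator> range =
        items.equal_range(wanted);
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```

Note that this only works if you never need to address one particular duplicate later; if you do, you are back to needing a unique key.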
Could you please clarify why you can not use a simple incremental counter as auto-generated key? (increment on insert)? It seems that there's no problem doing that.
Suppose you have decided how to generate non-counter-based keys and found that generating them in bulk is much more efficient than generating them one by one.
Given that this generator is proven to be "infinite" and "stateful" (which is your requirement), you can create a second, fixed-size container holding, say, 1000 unused keys.
Supply new map entries with keys from this container, and return keys to it for recycling.
Set a low "threshold" so that when the key container runs low, you refill it in bulk from the "infinite" generator.
The problem as actually posted still remains: how to build an efficient generator that is not counter-based. You may want to take a second look at the "infinity" requirement and check whether, say, a 64-bit or 128-bit counter could still satisfy your algorithm for some limited period of time, like 1000 years.
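The pool-plus-threshold idea could be sketched roughly as follows. Everything here is hypothetical (the class KeyPool, its member names, the batch and threshold parameters), and a plain counter stands in for the bulk generator only to keep the sketch runnable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical key pool: hands out keys from a small buffer, takes released
// keys back for recycling, and refills in bulk when the buffer runs low.
class KeyPool
{
public:
    // batch: how many keys one refill generates (must be > 0).
    // threshold: refill when this many keys or fewer remain.
    KeyPool(std::size_t batch, std::size_t threshold)
        : next_(0), batch_(batch), threshold_(threshold)
    {
        assert(batch > 0);
        refill();
    }

    std::uint64_t acquire()
    {
        if (pool_.size() <= threshold_)
            refill();
        const std::uint64_t key = pool_.back();
        pool_.pop_back();
        return key;
    }

    // Returns a key that is no longer in use so it can be handed out again.
    void release(std::uint64_t key) { pool_.push_back(key); }

    std::size_t available() const { return pool_.size(); }

private:
    // Stand-in for the bulk generator (assumption: a simple counter).
    void refill()
    {
        for (std::size_t i = 0; i < batch_; ++i)
            pool_.push_back(next_++);
    }

    std::vector<std::uint64_t> pool_;
    std::uint64_t next_;
    std::size_t batch_;
    std::size_t threshold_;
};
```

Recycled keys are reused before fresh ones, so the generator is only consulted when the pool actually drains.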
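Use uint64_t as the key type of the sequence or, if you think even that will not be enough, something like:
struct sequence_key_t {
    uint64_t upper;
    uint64_t lower;
    // Pre-increment: carry into upper when lower wraps around.
    sequence_key_t& operator++() { if (++lower == 0) ++upper; return *this; }
    bool operator<(const sequence_key_t& rhs) const {
        return upper != rhs.upper ? upper < rhs.upper : lower < rhs.lower;
    }
};
Like:
sequence_key_t global_counter = {0, 0};
std::map<sequence_key_t,myclass> my_map;
my_map.insert(std::make_pair(++global_counter, myclass()));
and you will not have any problems.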
Like others, I am having difficulty figuring out exactly what you want. It sounds like you want to create an item if it is not found. std::map::operator[] ( const key_type& x ) will do this for you.
std::map<int, myclass> Map;
myclass instance1, instance2;
Map[5] = instance1;
Map[6] = instance2;
Is this what you are thinking of?
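To make the create-if-missing behaviour explicit, here is a small sketch. The myclass layout and the helper get_or_create are assumptions for the example; the helper does nothing beyond calling operator[].

```cpp
#include <cassert>
#include <map>

// Stand-in for the question's myclass; the field is an assumption.
struct myclass
{
    int value;
    myclass() : value(0) {}
};

// Returns the entry for key, creating a default-constructed one if absent.
// This is what std::map::operator[] does on its own.
myclass& get_or_create(std::map<int, myclass>& m, int key)
{
    return m[key];
}
```

Because operator[] inserts on a miss, it requires the mapped type to be default-constructible and should not be used when you only want to test for presence.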
Going along with other answers, I'd suggest a simple counter for generating the ids. If you're worried about being perfectly correct, you could use an arbitrary precision integer for the counter, rather than a built in type. Or something like the following, which will iterate through all possible strings.
void string_increment(std::string& counter)
{
bool carry=true;
for (size_t i=0;i<counter.size();++i)
{
unsigned char original=static_cast<unsigned char>(counter[i]);
if (carry)
{
++counter[i];
}
if (original>static_cast<unsigned char>(counter[i]))
{
carry=true;
}
else
{
carry=false;
}
}
if (carry)
{
counter.push_back(0);
}
}
e.g. so that you have:
std::string counter; // empty string
string_increment(counter); // now counter=="\x00"
string_increment(counter); // now counter=="\x01"
...
string_increment(counter); // now counter=="\xFF"
string_increment(counter); // now counter=="\x00\x00"
string_increment(counter); // now counter=="\x01\x00"
...
string_increment(counter); // now counter=="\xFF\x00"
string_increment(counter); // now counter=="\x00\x01"
string_increment(counter); // now counter=="\x01\x01"
...
string_increment(counter); // now counter=="\xFF\xFF"
string_increment(counter); // now counter=="\x00\x00\x00"
string_increment(counter); // now counter=="\x01\x00\x00"
// etc..
Another option, if the working set actually in the map is small enough, would be to use an incrementing key and then regenerate the keys when the counter is about to wrap. This solution would only require temporary extra storage. The hash table performance would be unchanged, and the key generation would just be an if and an increment.
The number of items in the current working set would really determine if this approach is viable or not.
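A rough sketch of regenerating keys on wrap is shown below. The class RekeyingMap and its members are hypothetical, the key type is deliberately tiny so the wrap is easy to reach, and the sketch assumes nothing outside the container holds on to old keys, since they change when the renumbering happens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: keys come from an incrementing counter; when the
// counter is about to wrap, the live entries are renumbered 0..n-1.
class RekeyingMap
{
public:
    typedef std::uint8_t Key; // tiny on purpose, to make the wrap reachable

    RekeyingMap() : next_(0) {}

    // Inserts value and returns its key. Previously returned keys are
    // invalidated whenever a renumbering happens.
    Key insert(const std::string& value)
    {
        if (next_ == std::numeric_limits<Key>::max())
        {
            rekey();
            if (next_ == std::numeric_limits<Key>::max())
                throw std::length_error("key space exhausted");
        }
        const Key key = next_++;
        items_[key] = value;
        return key;
    }

    void erase(Key key) { items_.erase(key); }
    std::size_t size() const { return items_.size(); }
    const std::map<Key, std::string>& items() const { return items_; }

private:
    // Temporary extra storage: a second map that replaces the first.
    void rekey()
    {
        std::map<Key, std::string> compact;
        Key k = 0;
        for (std::map<Key, std::string>::const_iterator it = items_.begin();
             it != items_.end(); ++it)
            compact[k++] = it->second;
        items_.swap(compact);
        next_ = k;
    }

    std::map<Key, std::string> items_;
    Key next_;
};
```

The renumbering costs time proportional to the working set, which is why the size of that set decides whether the approach is viable.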
I liked Jon Benedicto's and Tom's answers very much. To be fair, the other answers that only used counters may have been the starting point.
Problem with only using counters
You always have to increment higher and higher, never filling the empty gaps.
Once you run out of numbers and wrap around, you have to do log(n) iterations to find unused keys.
Problem with the queue for holding used keys
It is easy to imagine lots and lots of used keys being stored in this queue.
My Improvement to queues!
Rather than storing single used keys in the queue, we store ranges of unused keys.
Interface
using Key = wchar_t; //In my case
struct Range
{
Key first;
Key last;
size_t size() const { return last - first + 1; }
};
bool operator< (const Range&,const Range&);
bool operator< (const Range&,Key);
bool operator< (Key,const Range&);
struct KeyQueue__
{
public:
    virtual ~KeyQueue__() = default; // polymorphic base: allow deletion through a base pointer
    virtual void addKey(Key)=0;
    virtual Key getUniqueKey()=0;
    virtual bool shouldMorph()=0;
protected:
    Key counter = 0;
    friend class Morph;
};
struct KeyQueue : KeyQueue__
{
public:
void addKey(Key)override;
Key getUniqueKey()override;
bool shouldMorph()override;
private:
std::vector<Key> pool;
friend class Morph;
};
struct RangeKeyQueue : KeyQueue__
{
public:
void addKey(Key)override;
Key getUniqueKey()override;
bool shouldMorph()override;
private:
boost::container::flat_set<Range,std::less<>> pool;
friend class Morph;
};
void morph(KeyQueue__ *&obj); // pointer taken by reference so the caller sees the new queue
struct Morph
{
    static void morph(const KeyQueue &from,RangeKeyQueue &to);
    static void morph(const RangeKeyQueue &from,KeyQueue &to);
};
Implementation
Note: Keys being added are assumed not to be in the queue already.
// Assumes that Range is valid. first <= last
// Assumes that Ranges do not overlap
bool operator< (const Range &l,const Range &r)
{
    return l.first < r.first;
}
// Assumes that Range is valid. first <= last
// A key directly adjacent to a range compares equivalent to it,
// so find() also returns a range that the key can extend.
bool operator< (const Range &l,Key r)
{
    int diff_1 = l.first - r;
    int diff_2 = l.last - r;
    return diff_1 < -1 && diff_2 < -1;
}
// Assumes that Range is valid. first <= last
bool operator< (Key l,const Range &r)
{
    int diff = l - r.first;
    return diff < -1;
}
void KeyQueue::addKey(Key key)
{
    // Releasing the most recently issued key just rolls the counter back.
    if(counter - 1 == key) counter = key;
    else pool.push_back(key);
}
Key KeyQueue::getUniqueKey()
{
    if(pool.empty()) return counter++;
    else
    {
        Key key = pool.back();
        pool.pop_back();
        return key;
    }
}
bool KeyQueue::shouldMorph()
{
    return pool.size() > 10;
}
void RangeKeyQueue::addKey(Key key)
{
    if(counter - 1 == key) counter = key;
    else
    {
        auto elem = pool.find(key);
        if(elem == pool.end()) pool.insert({key,key});
        else // Expand existing range
        {
            // The cast drops the constness of the set element; this relies on
            // the change not altering the relative order of the ranges.
            Range &range = (Range&)*elem;
            // Note at this point, key is 1 value less or greater than range
            if(range.first > key) range.first = key;
            else range.last = key;
        }
    }
}
Key RangeKeyQueue::getUniqueKey()
{
    if(pool.empty()) return counter++;
    else
    {
        Range &range = (Range&)*pool.begin();
        Key key = range.first++;
        if(range.first > range.last) // exhausted all keys in range
            pool.erase(pool.begin());
        return key;
    }
}
bool RangeKeyQueue::shouldMorph()
{
    return pool.size() == 0 || (pool.size() == 1 && pool.begin()->size() < 4);
}
void morph(KeyQueue__ *&obj)
{
    if(KeyQueue *queue = dynamic_cast<KeyQueue*>(obj))
    {
        RangeKeyQueue *new_queue = new RangeKeyQueue();
        Morph::morph(*queue,*new_queue);
        delete queue; // the old queue is no longer needed
        obj = new_queue;
    }
    else if(RangeKeyQueue *queue = dynamic_cast<RangeKeyQueue*>(obj))
    {
        KeyQueue *new_queue = new KeyQueue();
        Morph::morph(*queue,*new_queue);
        delete queue; // the old queue is no longer needed
        obj = new_queue;
    }
}
void Morph::morph(const KeyQueue &from,RangeKeyQueue &to)
{
to.counter = from.counter;
for(Key key : from.pool) to.addKey(key);
}
void Morph::morph(const RangeKeyQueue &from,KeyQueue &to)
{
to.counter = from.counter;
for(Range range : from.pool)
while(range.first <= range.last)
to.addKey(range.first++);
}
Usage:
#include <cstdlib>
#include <ctime>
#include <vector>

int main()
{
    std::vector<Key> keys;
    KeyQueue__ *keyQueue = new KeyQueue();
    srand(static_cast<unsigned>(time(NULL)));
    bool insertKey = true;
    for(int i=0; i < 1000; ++i)
    {
        if(insertKey || keys.empty()) // nothing to release yet, so insert instead
        {
            Key key = keyQueue->getUniqueKey();
            keys.push_back(key);
        }
        else
        {
            // Release a randomly chosen key that is currently in use.
            int index = rand() % keys.size();
            Key key = keys[index];
            keys.erase(keys.begin()+index);
            keyQueue->addKey(key);
        }
        if(keyQueue->shouldMorph())
        {
            morph(keyQueue);
        }
        insertKey = rand() % 3; // more chances of insert
    }
}