Hash table with two keys - c++

I have a large amount of data the I want to be able to access in two different ways. I would like constant time look up based on either key, constant time insertion with one key, and constant time deletion with the other. Is there such a data structure and can I construct one using the data structures in tr1 and maybe boost?

Use two parallel hash-tables. Make sure that the keys are stored inside the element value, because you'll need all the keys during deletion.

Have you looked at Bloom Filters? They aren't O(1), but I think they perform better than hash tables in terms of both time and space required to do lookups.

Hard to find why you need to do this but as someone said try using 2 different hashtables.
Just pseudocode in here:
Hashtable inHash;
Hashtable outHash;
//Hello myObj example!!
myObj.inKey="one";
myObj.outKey=1;
myObj.data="blahblah...";
//adding stuff
inHash.store(myObj.inKey,myObj.outKey);
outHash.store(myObj.outKey,myObj);
//deleting stuff
inHash.del(myObj.inKey,myObj.outKey);
outHash.del(myObj.outKey,myObj);
//findin stuff
//straight
myObj=outHash.get(1);
//the other way; still constant time
key=inHash.get("one");
myObj=outHash.get(key);
Not sure, thats what you're looking for.

This is one of the limits of the design of standard containers: a container in a sense "own" the contained data and expects to be the only owner... containers are not merely "indexes".
For your case a simple, but not 100% effective, solution is to have two std::maps with "Node *" as value and storing both keys in the Node structure (so you have each key stored twice). With this approach you can update your data structure with reasonable overhead (you will do some extra map search but that should be fast enough).
A possibly "correct" solution however would IMO be something like
struct Node
{
Key key1;
Key key2;
Payload data;
Node *Collision1Prev, *Collision1Next;
Node *Collision2Prev, *Collision2Next;
};
basically having each node in two different hash tables at the same time.
Standard containers cannot be combined this way. Other examples I coded by hand in the past are for example an hash table where all nodes are also in a doubly-linked list, or a tree where all nodes are also in an array.
For very complex data structures (e.g. network of structures where each one is both the "owner" of several chains and part of several other chains simultaneously) I even resorted sometimes to code generation (i.e. scripts that generate correct pointer-handling code given a description of the data structure).

Related

Can I reinterpret a memory mapped file of key-value pairs as a map in order to sort them?

I have a memory mapped file that contains key-value pairs. Both the key and value are uint32_t, and all the keys and values are stored in the file in binary, where a key immediately proceeds the value. The file contains only these pairs, no delimiters.
I want to be able to sort all of these key-value pairs by increasing key.
The following just compiled in my code:
struct FileAsMap { map<uint32_t, uint32_t> keyValueMap; };
const FileAsMap* fileAsMap = reinterpret_cast<FileAsMap*>(mmappedData);
but I don't really know what to do from here, since by definition the map container keeps a strict weak ordering of the pairs by key. If I just reinterpret the mapped file as a map, how can I get the pairs to order?
it's not an answer but explanations don't fit into comment limitations.
The keys in a map are usually unique (at least in std::map they are). But maps in general differ one from another in method they sort stored keys. For example std::map is based on a balanced binary tree with average complexity of retrieving a given key equal to O(ln(n)) where n is a number of elements in the map. Or e.g. std::unordered_map is a hashmap internally with the average access time = O(1). That is it looks for a key in constant time regardless of number of elements inside.
In any case all these data containers demand dedicated internal in-memory structure which practically never looks like a simple stream of key-value pairs. That's why I've told above in the first comment that it's almost impossible to reuse one of standard maps as a convenient data accessor for mmap-ed data w/o prior read and unpack the data stream.
But you can create your own map-like class which would iterate over data in mmap-ed area and would check in its operator[](size_t i) if a stored key matches the requested one. Iguess that a simplest implementation would take a single screen of code.
But beware: sequental scan is a relatively expensive operation, so if you got enough elements in the file, it could become unacceptable slow. In this case you'll need some optimized indexing. For example all keys are read in the beginning of processing and an indexing array is built. But all these questions heavily depend on task details, ao it's better to stop explanations now.
If you have any further questions feel free to ask. Of course a good question assumes that you have already studied the subject and now have encountered a particular problem that you can't solve yoursef
There are a lot of reasons why the answer is no. The two simplest are:
Maps are a structure that stores data in a form in which it's already sorted. Your data isn't already sorted, so it's simply not a map.
The map class has its own internal data structure that it uses to store maps. Unless your file replicates this internal structure perfectly (which it almost certainly can't since it likely includes pointers into memory) the map class will misunderstand the data in the file.
How did u serialize the data to the file?
Assuming that you serialized a struct consisting of maps, you'd de-serialize as below:
FileAsMap* fileAsMap = reinterpret_cast<FileAsMap*>(mmappedData);
Gives access to entire structure (blob).
(*fileAsMap).keyValueMap gives access to map.

STL map like data structure to allow searching based on two keys

I want to make a map like structure to allow searching by two keys both will be strings, here's an example:
Myclass s;
Person p = s.find("David"); // searching by name
// OR
p = s.find("XXXXX"); // searching by ID
i don't want a code solution, i just want some help to get started like the structures i can use to achieve what i want, help is appreciated guys, it's finals week.
Put your records into a vector (or list). Add a pointer to the record objects to two maps, one with one key and one with the other.
There are many different ways how this could be achieved. The question is: what are the complexities of insert, delete and lookup operations that you aim for?
std::map is implemented as red-black tree that provides increadibly quick self-balancing (rotations) and all of mentioned operations (lookup/find, insert, delete) with complexity of O(log(n)). Note that this suits the idea of single key.
With 2 keys you can not keep elements sorted because the order based on one key will be most likely different than order based on the other one. The most straightforward and natural approach would be storing records in one container and holding the keys used by this container in 2 different structures, one optimized for retrieving this key given id and the other one for retrieving it given name.
If there is a constraint of storing everything at one place while you'd like to optimize find operation that will support two different keys, then you could create a wrapper of std::map<std::string, Person> where each element would be contained twice (each time under a different key), i.e. something like:
std::map<std::string, Person> myContainer;
...
Person p;
std::string id = "1E57A";
std::string name = "David";
myContainer[id] = p;
myContainer[name] = p;
I can think of 2 advantages of doing this:
quite satisfying performance:
lookup with complexity O(log(2*n))
insertion & deletion with complexity O(2*log(2*n))
extremely simple implementation (using existing container)
you just need to remember than the "expected" size of the container is half of its actual size
both of the keys: id and name should be attributes of Person so that when you find a concrete element given one of these keys, you immediately have the other one too
Disadvantage is that it will consume 2x so much memory and there might even be a constraint that:
none of the names should be an id of some other person at the same time and vice versa (no id should be a name of some other person)

How to convert a struct property to a pointer reference in C++?

I have a DAG in a JSON format, where each node is an entry: it has a name, and two arrays. One array is for other nodes with arrows coming into it, another array for nodes that this node is directed towards (outgoing arrows).
So, for example:
{
'id': 'A',
'connected_from' : ['B','C'],
'connects_to' : ['D','E']
}
And I have a collection of these nodes, that all together form a DAG.
I'd like to map the nodes to a struct to hold these nodes, where the id is simply a string, and I'd like the arrays to be vectors of pointers of this struct:
struct node {
string id;
vector<node*> connected_from;
vector<node*> connected_to;
}
How do I convert the node entries as 'id' in the arrays of the JSON to a pointer to the correct struct holding that node?
One obvious approach is to build a map of key-value pairs, where key = id, value = the pointer to the struct with that id, and do a lookup - but is there a better way?
no, given only the information that you've provided there isn't a better way: you need to build a map.
however, for single letter id's the map can possibly take the form of a simple array with e.g. 26 entries for the English alphabet.
There's going to be some container object holding all the nodes (otherwise you're going to leak them.) You could always scan over the container to find the nodes. But this will be inefficient - O(N^2) while a map lookup will be O(N log N ).
Though if you store the objects in sorted order in the container (or use a sorted container) you can reduce both cases to O(N log N).
The constants will be different though, so for a small graph the scan may be faster.
I think your suggestion is fine... Map from ID to node. It's simple, intuitive and fast enough for practical purposes. Considering the data is being parsed from JSON, your storage and lookups are not going to significantly impact performance. If you're really concerned, then implement a Dictionary to replace your map.
In general terms, I always advocate the simplest, cleanest approach that gets the job done. Too many people obsess about memory or performance hits in algorithms, when the actual bottleneck in their code lies elsewhere.

STL Map versus Static Array

I have to store information about contents in a lookup table such that it can be accessed very quickly.I might need to refer some of the elements in look up table recursively to get complete information about contents. What will be better data structure to use:
Map with one of parameter, which will be unique to all the entries in look up table, as key and rest of the information as value
Use static array for each unique entries and access them when needed according to key(which will be same as the one used in MAP).
I want my software to be robust as if we have any crash it will be catastrophic for my product.
It depends on the range of keys that you have.
Usually, when you say lookup table, you mean a smallish table which you can index directly ( O(1) ). As a dumb example, for a substitution cipher, you could have a char cipher[256] and simply index with the ASCII code of a character to get the substitution character. If the keys are complex objects or simply too many, you're probably stuck with a map.
You might also consider a hashtable (see unordered_map).
Reply:
If the key itself can be any 32-bit number, it wouldn't make sense to store a very sparse 4-billion element array.
If however your keys are themselves between say 0..10000, then you can have a 10000-element array containing pointers to your objects (or the objects themselves), with only 2000-5000 of your elements containing non-null pointers (or meaningful data, respectively). Access will be O(1).
If you can have large keys, then I'd probably go with the unordered_map. With a map of 5000 elements, you'd get O(log n) to mean around ~12 accesses, a hash table should be pretty much one or two accesses tops.
I'm not familiar with perfect hashes, so I can't advise about their implementation. If you do choose that, I'd be grateful for a link or two with ideas to keep in mind.
The lookup times in a std::map should be O=ln(n), with a linear search in a static array in the worst case O=n.
I'd strongly opt for a std::map even if it has a larger memory footprint (which should not matter, in the most cases).
Also you can make "maps of maps" or even deeper structures:
typedef std::map<MyKeyType, std::map<MyKeyType, MyValueType> > MyDoubleMapType;

How to implement an associative array/map/hash table data structure (in general and in C++)

Well I'm making a small phone book application and I've decided that using maps would be the best data structure to use but I don't know where to start. (Gotta implement the data structure from scratch - school work)
Tries are quite efficient for implementing maps where the keys are short strings. The wikipedia article explains it pretty well.
To deal with duplicates, just make each node of the tree store a linked list of duplicate matches
Here's a basic structure for a trie
struct Trie {
struct Trie* letter;
struct List *matches;
};
malloc(26*sizeof(struct Trie)) for letter and you have an array. if you want to support punctuations, add them at the end of the letter array.
matches can be a linked list of matches, implemented however you like, I won't define struct List for you.
Simplest solution: use a vector which contains your address entries and loop over the vector to search.
A map is usually implemented either as a binary tree (look for red/black trees for balancing) or as a hash map. Both of them are not trivial: Trees have some overhead for organisation, memory management and balancing, hash maps need good hash functions, which are also not trivial. But both structures are fun and you'll get a lot of insight understanding by implementing one of them (or better, both :-)).
Also consider to keep the data in the vector list and let the map contain indices to the vector (or pointers to the entries): then you can easily have multiple indices, say one for the name and one for the phone number, so you can look up entries by both.
That said I just want to strongly recommend using the data structures provided by the standard library for real-world-tasks :-)
A simple approach to get you started would be to create a map class that uses two vectors - one for the key and one for the value. To add an item, you insert a key in one and a value in another. To find a value, you just loop over all the keys. Once you have this working, you can think about using a more complex data structure.