STL map-like data structure to allow searching based on two keys - C++

I want to make a map-like structure that allows searching by two keys, both of which are strings. Here's an example:
Myclass s;
Person p = s.find("David"); // searching by name
// OR
p = s.find("XXXXX"); // searching by ID
I don't want a code solution, I just want some help getting started, like which structures I can use to achieve what I want. Help is appreciated, guys, it's finals week.

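Put your records into a vector (or list). Add a pointer to each record to two maps, one keyed by the first key and one keyed by the second. Be aware that growing a vector can reallocate it and invalidate those pointers, so either reserve the final size up front or use a container with stable element addresses, such as std::list (or std::deque, if you only ever add at the ends).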
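A minimal sketch of this approach, assuming a hypothetical Person with id and name string fields and a hypothetical PersonIndex class. It uses a std::deque as the owner instead of a std::vector, because push_back on a deque never invalidates pointers to existing elements, so the two maps can safely hold Person pointers.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Hypothetical record type: both keys are stored inside the record.
struct Person {
    std::string id;
    std::string name;
};

// Hypothetical wrapper: records are owned by one container, and two maps
// hold pointers to them, one keyed by ID and one keyed by name.
class PersonIndex {
public:
    PersonIndex() = default;
    PersonIndex(const PersonIndex &) = delete;            // copying would leave the maps
    PersonIndex &operator=(const PersonIndex &) = delete; // pointing into the other object

    // Returns false (and stores nothing) if the ID or the name is already taken.
    bool add(const Person &p) {
        if (byId_.count(p.id) != 0 || byName_.count(p.name) != 0) return false;
        records_.push_back(p);  // deque::push_back keeps pointers to existing elements valid
        Person *stored = &records_.back();
        byId_[p.id] = stored;
        byName_[p.name] = stored;
        return true;
    }

    // Both lookups are O(log n); they return nullptr when nothing matches.
    const Person *findById(const std::string &id) const {
        auto it = byId_.find(id);
        return it == byId_.end() ? nullptr : it->second;
    }
    const Person *findByName(const std::string &name) const {
        auto it = byName_.find(name);
        return it == byName_.end() ? nullptr : it->second;
    }

    // Mirrors the question's s.find(...): tries the key as an ID first, then as a name.
    const Person *find(const std::string &key) const {
        const Person *p = findById(key);
        return p != nullptr ? p : findByName(key);
    }

private:
    std::deque<Person> records_;             // owns the records
    std::map<std::string, Person *> byId_;   // ID   -> record
    std::map<std::string, Person *> byName_; // name -> record
};
```

If records must also be removed, a std::list is the simpler owner, since erasing from the middle of a deque invalidates pointers to its other elements.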

There are many different ways this could be achieved. The question is: what complexities of insert, delete and lookup operations are you aiming for?
std::map is typically implemented as a red-black tree, which keeps itself balanced through rotations and provides all of the mentioned operations (lookup/find, insert, delete) in O(log(n)). Note that this suits the idea of a single key.
With 2 keys you cannot keep the elements in a single sorted order, because the order based on one key will most likely differ from the order based on the other one. The most straightforward and natural approach would be to store the records in one container and to hold the keys used by this container in 2 different structures, one optimized for retrieving this key given an id and the other one for retrieving it given a name.
If there is a constraint to store everything in one place while you'd like to optimize a find operation that supports two different keys, then you could create a wrapper around std::map<std::string, Person> where each element is contained twice (each time under a different key), i.e. something like:
std::map<std::string, Person> myContainer;
...
Person p;
std::string id = "1E57A";
std::string name = "David";
myContainer[id] = p;
myContainer[name] = p;
I can think of 2 advantages of doing this:
- quite satisfying performance:
  - lookup with complexity O(log(2*n)), which is still O(log(n))
  - insertion & deletion with complexity O(2*log(2*n)), also still O(log(n))
- extremely simple implementation (using an existing container), where:
  - you just need to remember that the "expected" size of the container is half of its actual size
  - both of the keys, id and name, should be attributes of Person, so that when you find a concrete element given one of these keys, you immediately have the other one too
The disadvantage is that it consumes twice as much memory, and every change to a Person has to be applied to both of its copies. There is also a constraint that:
- none of the names may be the id of some other person at the same time, and vice versa (no id may be the name of some other person)
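A minimal sketch of such a wrapper, assuming the same kind of hypothetical Person (with id and name fields) and a hypothetical class name DoubleKeyedMap. erase() looks the element up by either key and then removes both copies using the keys stored inside Person, which is why both keys should be attributes of the record.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical record type; both keys are attributes of the record.
struct Person {
    std::string id;
    std::string name;
};

// Hypothetical wrapper: each Person is stored twice, under its id and under its name.
// Assumes no id is ever equal to another person's name (and vice versa).
class DoubleKeyedMap {
public:
    void insert(const Person &p) {
        data_[p.id] = p;
        data_[p.name] = p;
    }

    // O(log(2n)) lookup by either key; nullptr if nothing matches.
    const Person *find(const std::string &key) const {
        auto it = data_.find(key);
        return it == data_.end() ? nullptr : &it->second;
    }

    // Removes both copies, whichever key is given.
    void erase(const std::string &key) {
        auto it = data_.find(key);
        if (it == data_.end()) return;
        Person p = it->second; // copy the keys first: erasing invalidates 'it'
        data_.erase(p.id);
        data_.erase(p.name);
    }

    // The "expected" size is half of the underlying map's size.
    std::size_t size() const { return data_.size() / 2; }

private:
    std::map<std::string, Person> data_;
};
```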

Related

C++ list sorted by multiple fields at once

I'm wondering if there's a good way to have a list of objects that is simultaneously sorted by two or more different criteria. So, for example, if I had a class:
#include <cstring>
class Person {
public:
    Person(const char *name, int age);
    float age;
    char *name;
    bool compareAge(const Person &a) const { return age < a.age; }
    bool compareName(const Person &a) const { return std::strcmp(name, a.name) < 0; }
};
Suppose I have a million names. I'm wondering if there's any commonly available class that can hold a list of people such that I could quickly iterate over and search this list either alphabetically or by age. For example, I would like to be able to do the following without re-sorting between the lookups:
const std::multiset<Person, 2> mySet;
mySet.setSortCriteria(1, Person::compareName);
mySet.setSortCriteria(2, Person::compareAge);
populateAndSort(&mySet);
Person firstJohn = mySet.findByCriteria(1, Person("John",0));
Person firstTeen = mySet.findByCriteria(2, Person("",13));
I could do this with two separate sorted multisets of pointers to Person, but ideally, I'd like to maintain only a single list (so if I want to do lookups on say 20 criteria, I don't need to maintain 20 lists...). I have not found any references so far on a good way to do this, however, at the moment I'm somewhat new to C++ and there's a good chance I'm missing something.
There is no magic in our poor world... If you want the same list to be accessible according to different sort orders, you will need to maintain multiple indexes. That's the way databases work: you generally have a primary key, which is supposed to be the native access mode, and alternate indexes that allow quick searches and ordering on the indexed fields. In the container world, the primary key would be the native order of the primary container, and the alternate indexes would be alternate maps where the key is the field(s) used for ordering and the value is an iterator into the primary container. This works for containers that do not invalidate iterators to other elements on insertion or erasure, like lists and maps.
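A hand-rolled sketch of a primary container plus alternate indexes, assuming a simplified hypothetical Person with a std::string name and an int age, and a hypothetical PersonTable class. The std::list is the primary container, and each index is a std::multimap from the sort field to a list iterator; list iterators stay valid while other elements are inserted or erased, which is what makes this safe.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>

// Simplified, hypothetical record for this sketch.
struct Person {
    std::string name;
    int age;
};

class PersonTable {
public:
    using Iter = std::list<Person>::iterator;

    PersonTable() = default;
    PersonTable(const PersonTable &) = delete;            // the indexes hold iterators
    PersonTable &operator=(const PersonTable &) = delete; // into this object's list

    void add(const Person &p) {
        Iter it = people_.insert(people_.end(), p); // store once in the primary container
        byName_.emplace(p.name, it);                // then register it in every index
        byAge_.emplace(p.age, it);
    }

    // First person (in index order) with exactly this name, or nullptr.
    const Person *firstByName(const std::string &name) const {
        auto it = byName_.lower_bound(name);
        return (it == byName_.end() || it->first != name) ? nullptr : &*it->second;
    }

    // Youngest person whose age is >= minAge, or nullptr (e.g. "first teen": 13).
    const Person *firstAtLeastAge(int minAge) const {
        auto it = byAge_.lower_bound(minAge);
        return it == byAge_.end() ? nullptr : &*it->second;
    }

    // Every modification must update all indexes as well as the primary list.
    bool removeFirstByName(const std::string &name) {
        auto idx = byName_.lower_bound(name);
        if (idx == byName_.end() || idx->first != name) return false;
        Iter target = idx->second;
        byName_.erase(idx);
        eraseFrom(byAge_, target->age, target);
        people_.erase(target);
        return true;
    }

private:
    // Removes the entry for 'target' among the entries with an equal key.
    template <typename Index, typename Key>
    static void eraseFrom(Index &index, const Key &key, Iter target) {
        auto range = index.equal_range(key);
        for (auto i = range.first; i != range.second; ++i) {
            if (i->second == target) {
                index.erase(i);
                return;
            }
        }
    }

    std::list<Person> people_;                // primary container
    std::multimap<std::string, Iter> byName_; // alternate index: name
    std::multimap<int, Iter> byAge_;          // alternate index: age
};
```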
You can roll your own. There are a few caveats, mainly correctly maintaining all indexes on every modification of the primary container. Or, as you were advised in the comments, just use Boost (Boost.MultiIndex provides exactly this kind of container with several indexes).

64-bit array operation in C/C++

I have an efficiency-critical application where I need an array-like data structure A. Its keys are 0, 1, 2, ..., and its values are distinct uint64_t values. I need two constant-time operations:
1. Given i, return A[i];
2. Given val, return i such that A[i] == val
I prefer not to use a hash table, because when I tried GLib's GHashTable, it took around 20 minutes to load 60 million values into it (without the insertion statement, loading took only around 6 seconds). That time is not acceptable for my application. Or can somebody recommend other hash table libraries? I tried uthash.c, and it crashed immediately.
I also tried SDArray, but it seems not the right one.
Does anybody know any data structure that would fulfill my requirements? Or any efficient hash table implementations? I prefer using C/C++.
Thanks.
In general, you need two hash tables for this task. As you know, hash tables give you a key look-up in expected constant time. Searching for a value requires iterating through the whole data structure, since information about the values isn't encoded in the hash look-up table.
Use two hash tables: One for key-value and one (reversed) for value-key look-up. In your particular case, the forward search can be done using a vector as long as your keys are "sequential". But this doesn't change the requirement for a data structure enabling fast reverse look-up.
Regarding the hash table implementation: in C++11 you have the new standard container std::unordered_map available.
An implementation might look like this (of course this is tweakable, like introducing const-correctness, calling by reference etc.):
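#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
using K = std::size_t;   // key type (assumed: the array index)
using T = std::uint64_t; // value type (assumed: the stored value)
std::unordered_map<K,T> kvMap; // hash table for forward search
std::unordered_map<T,K> vkMap; // hash table for backward search
void insert(std::pair<K,T> item) {
    kvMap.insert(item);
    vkMap.insert(std::make_pair(item.second, item.first));
}
// expected O(1); at() throws std::out_of_range for a missing key instead of inserting one
T valueForKey(K key) {
    return kvMap.at(key);
}
// expected O(1)
K keyForValue(T value) {
    return vkMap.at(value);
}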
A clean C++11 implementation should "wrap" around the key-value hash map, so you have the "standard" interface in your wrapper class. Always keep the reverse map in sync with your forward map.
Regarding the creation performance: most implementations let you tell the data structure how many elements are going to be inserted; for std::unordered_map this is reserve(). For hash tables this is a huge performance benefit, because every dynamic resize (which happens now and then during insertions) rehashes the whole table, redistributing every element into a new, larger set of buckets.
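A minimal sketch of such a wrapper for the question's case (indices as std::size_t, values as std::uint64_t); the class and method names are hypothetical. It calls std::unordered_map::reserve up front so loading does not trigger rehashing, and it refuses duplicate keys or values so the forward and reverse tables cannot disagree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical bidirectional index <-> value map built from two hash tables.
class BiMap {
public:
    // Pre-size both tables so that no rehashing happens while loading.
    explicit BiMap(std::size_t expectedCount) {
        forward_.reserve(expectedCount);
        backward_.reserve(expectedCount);
    }

    // Refuses a duplicate index or value, keeping the two tables in sync.
    bool insert(std::size_t index, std::uint64_t value) {
        if (forward_.count(index) != 0 || backward_.count(value) != 0) return false;
        forward_.emplace(index, value);
        backward_.emplace(value, index);
        return true;
    }

    // Both lookups are expected O(1); at() throws std::out_of_range if absent.
    std::uint64_t valueAt(std::size_t index) const { return forward_.at(index); }
    std::size_t indexOf(std::uint64_t value) const { return backward_.at(value); }

private:
    std::unordered_map<std::size_t, std::uint64_t> forward_;  // A[i]
    std::unordered_map<std::uint64_t, std::size_t> backward_; // i such that A[i] == val
};
```

Since the keys are 0, 1, 2, ..., the forward table could equally be a std::vector<std::uint64_t>, as mentioned above; only the reverse direction really needs hashing.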
I would go for two vectors (assuming that your values are really distinct and small enough to be used as vector indices), as this is O(1) in access, whereas a map is O(log n) in access:
vector<uint64_t> values;
vector<size_t> keys;
values.resize(maxSize);    // resize, not just reserve: indexing past size() is undefined behavior
keys.resize(maxValue + 1); // maxValue (hypothetical name) = largest value stored; must fit in memory
Then, when reading in data:
values[keyRead] = valueRead;
keys[valueRead] = keyRead;
Lookups are then plain indexing:
data = values[currentKey];
key = keys[currentData];

Is std::map a good solution?

All,
I have the following task.
I have a finite number of strings (categories). Each category will hold a set of team/value pairs. The number of teams is finite and depends on the user's selection.
Neither count is more than 25.
Now the values will change based on user input, and when a value changes, the teams should be re-sorted by value.
I was hoping that the STL has some kind of auto-sorted vector or list container, but the only thing I could find is std::map<>.
So what I think I need is:
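#include <map>
#include <string>
#include <vector>
struct Foo
{
    std::string team;
    double value;
    bool operator<(const Foo &other) const { return value < other.value; } // order teams by value
};
std::map<std::string, std::vector<Foo>> myContainer;
and just call std::sort() whenever a value changes.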
Or is there more efficient way to do it?
[EDIT]
I guess I need to clarify what I mean.
Think about it this way.
You have a table. The rows of this table are teams and the columns are categories. Each cell is divided in half; the top half is the category value for a given team, and this value increases with every player.
Now, when a player is added to a team, the player's category values are added to the team's, and the data in each column is re-sorted. So for category "A" the order may be team1, team2, while for category "B" it may be team2, team1.
Then, based on each team's position, a score is assigned for each team/category pair.
That score is what I need to display.
I hope this clarifies what I am trying to achieve and what I'm looking for.
[/EDIT]
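Because both counts are at most 25, re-sorting a category with std::sort after every change is cheap. A minimal sketch of the update-and-rank cycle described in the edit, with hypothetical helper names (addToCategory, resortCategory, rankScores); the scoring rule used here (with N teams, 1st place earns N points and last place earns 1) is an assumption, since the question does not specify one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Foo {
    std::string team;
    double value;
};

// category -> the teams' accumulated values in that category
using Table = std::map<std::string, std::vector<Foo>>;

// Re-sorts one category, highest value first.
void resortCategory(std::vector<Foo> &teams) {
    std::sort(teams.begin(), teams.end(),
              [](const Foo &a, const Foo &b) { return a.value > b.value; });
}

// Adds a player's contribution to one team in one category, then re-sorts that column.
void addToCategory(Table &table, const std::string &category,
                   const std::string &team, double delta) {
    std::vector<Foo> &teams = table[category];
    auto it = std::find_if(teams.begin(), teams.end(),
                           [&team](const Foo &f) { return f.team == team; });
    if (it == teams.end()) {
        teams.push_back(Foo{team, delta});
    } else {
        it->value += delta;
    }
    resortCategory(teams);
}

// Hypothetical scoring rule: with N teams, 1st place earns N points and last place earns 1.
std::map<std::string, std::size_t> rankScores(const std::vector<Foo> &sortedTeams) {
    std::map<std::string, std::size_t> scores;
    for (std::size_t i = 0; i < sortedTeams.size(); ++i) {
        scores[sortedTeams[i].team] = sortedTeams.size() - i;
    }
    return scores;
}
```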
It really depends on how often you are going to modify the data in the map and how often you're just going to search for the std::string and grab the vector.
If your access pattern is: add a map entry and fill all entries in its vector, then add the next one and fill its vector, and so on, and only afterwards randomly look up vectors in the map, then no, a map is probably not the best container. You'd be better off using a vector of std::pair of the string and the vector, and sorting it once everything has been added.
In fact, organising it as above is probably the most efficient way of setting it up (I admit this is not always possible, however). Furthermore, it would be highly advisable to use some sort of hash value in place of the std::string, as a hash compare is many times faster than a string compare. You also have the string stored in Foo anyway.
A map will, however, work; it really depends on exactly what you are trying to do.
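A minimal sketch of the sorted-vector alternative from this answer: fill a vector of (category, teams) pairs, sort it once, then find a category with std::lower_bound in O(log n). The helper names sortByCategory and findCategory are hypothetical, and Foo is reduced to its two data members.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Foo {
    std::string team;
    double value;
};

using Entry = std::pair<std::string, std::vector<Foo>>;

// Call once, after all entries have been added.
void sortByCategory(std::vector<Entry> &entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry &a, const Entry &b) { return a.first < b.first; });
}

// Binary search on the sorted vector; returns nullptr if the category is absent.
const std::vector<Foo> *findCategory(const std::vector<Entry> &entries,
                                     const std::string &category) {
    auto it = std::lower_bound(entries.begin(), entries.end(), category,
                               [](const Entry &e, const std::string &key) { return e.first < key; });
    return (it != entries.end() && it->first == category) ? &it->second : nullptr;
}
```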

How to convert a struct property to a pointer reference in C++?

I have a DAG in a JSON format, where each node is an entry: it has a name, and two arrays. One array is for other nodes with arrows coming into it, another array for nodes that this node is directed towards (outgoing arrows).
So, for example:
{
  "id": "A",
  "connected_from": ["B", "C"],
  "connects_to": ["D", "E"]
}
And I have a collection of these nodes, that all together form a DAG.
I'd like to map the nodes to a struct, where the id is simply a string and the arrays are vectors of pointers to this struct:
#include <string>
#include <vector>
struct node {
    std::string id;
    std::vector<node*> connected_from;
    std::vector<node*> connected_to;
};
How do I convert the ids listed in the JSON arrays into pointers to the structs holding those nodes?
One obvious approach is to build a map of key-value pairs, where key = id, value = the pointer to the struct with that id, and do a lookup - but is there a better way?
No, given only the information that you've provided, there isn't a better way: you need to build a map.
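A sketch of the map-based approach done in two passes: first create every node, then resolve each id string to a pointer with one hash lookup. RawNode is a hypothetical stand-in for whatever your JSON parser produces (the parsing itself is omitted), and Graph and buildGraph are hypothetical names. The nodes live inside a std::unordered_map, whose element pointers stay valid even when the table rehashes.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct node {
    std::string id;
    std::vector<node *> connected_from;
    std::vector<node *> connected_to;
};

// Hypothetical parsed form of one JSON entry.
struct RawNode {
    std::string id;
    std::vector<std::string> connected_from;
    std::vector<std::string> connects_to;
};

// Owns all nodes and doubles as the id -> node lookup table.
using Graph = std::unordered_map<std::string, node>;

// Fills 'g'; at() throws std::out_of_range if an edge names an unknown id.
void buildGraph(const std::vector<RawNode> &raw, Graph &g) {
    // Pass 1: create every node so that all ids can be resolved.
    for (const RawNode &r : raw) g[r.id].id = r.id;
    // Pass 2: translate id strings into pointers.
    for (const RawNode &r : raw) {
        node &n = g.at(r.id);
        for (const std::string &from : r.connected_from) n.connected_from.push_back(&g.at(from));
        for (const std::string &to : r.connects_to) n.connected_to.push_back(&g.at(to));
    }
}
```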
However, for single-letter ids the map could take the form of a simple array with, e.g., 26 entries for the English alphabet.
There's going to be some container object holding all the nodes (otherwise you're going to leak them). You could always scan over the container to find each node, but resolving all the references that way is inefficient: O(N^2) overall, while map lookups make it O(N log N).
If you store the objects in sorted order in the container (or use a sorted container), you can bring the scanning approach down to O(N log N) as well, using binary search.
The constants will be different though, so for a small graph the scan may be faster.
I think your suggestion is fine: map from ID to node. It's simple, intuitive and fast enough for practical purposes. Considering the data is being parsed from JSON, your storage and lookups are not going to significantly impact performance. If you're really concerned, replace your map with a hash table (std::unordered_map).
In general terms, I always advocate the simplest, cleanest approach that gets the job done. Too many people obsess about memory or performance hits in algorithms, when the actual bottleneck in their code lies elsewhere.

Hash table with two keys

I have a large amount of data that I want to be able to access in two different ways. I would like constant-time lookup based on either key, constant-time insertion with one key, and constant-time deletion with the other. Is there such a data structure, and can I construct one using the data structures in tr1 and maybe Boost?
Use two parallel hash-tables. Make sure that the keys are stored inside the element value, because you'll need all the keys during deletion.
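A minimal sketch of two parallel hash tables with std::unordered_map, assuming a hypothetical Record whose two keys (a string key1 and an int key2) are stored inside it. Insertion goes through key1, deletion goes through key2, and the key1 stored in the record is what lets eraseBy2 remove the entry from the other table in expected O(1).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical record: both keys live inside the element value.
struct Record {
    std::string key1;
    int key2;
    std::string data;
};

class TwoKeyTable {
public:
    TwoKeyTable() = default;
    TwoKeyTable(const TwoKeyTable &) = delete;            // by2_ holds pointers into by1_
    TwoKeyTable &operator=(const TwoKeyTable &) = delete;

    // Expected O(1). Refuses a duplicate of either key.
    bool insert(const Record &r) {
        if (by1_.count(r.key1) != 0 || by2_.count(r.key2) != 0) return false;
        Record &stored = by1_.emplace(r.key1, r).first->second;
        by2_.emplace(r.key2, &stored); // element pointers survive rehashing
        return true;
    }

    const Record *find1(const std::string &k) const {
        auto it = by1_.find(k);
        return it == by1_.end() ? nullptr : &it->second;
    }
    const Record *find2(int k) const {
        auto it = by2_.find(k);
        return it == by2_.end() ? nullptr : it->second;
    }

    // Expected O(1): the stored key1 says which entry to drop from the other table.
    bool eraseBy2(int k) {
        auto it = by2_.find(k);
        if (it == by2_.end()) return false;
        std::string k1 = it->second->key1; // copy before the record is destroyed
        by2_.erase(it);
        by1_.erase(k1);
        return true;
    }

private:
    std::unordered_map<std::string, Record> by1_; // owns the records
    std::unordered_map<int, Record *> by2_;       // secondary index
};
```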
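Have you looked at Bloom filters? They are compact and fast, but note that a Bloom filter only answers "is this key possibly present?" (with some false-positive rate) and cannot return the associated data, so it could complement the hash tables as a quick pre-check rather than replace them.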
It's hard to see why you need this, but as someone said, try using two different hash tables.
Here is the idea in C++, with std::unordered_map as the hash table:
#include <string>
#include <unordered_map>

struct MyObj {
    std::string inKey;
    int outKey;
    std::string data;
};

int main() {
    std::unordered_map<std::string, int> inHash; // inKey -> outKey
    std::unordered_map<int, MyObj> outHash;      // outKey -> object
    // Hello myObj example!!
    MyObj myObj{"one", 1, "blahblah..."};
    // adding stuff
    inHash[myObj.inKey] = myObj.outKey;
    outHash[myObj.outKey] = myObj;
    // finding stuff
    // straight
    myObj = outHash.at(1);
    // the other way; still constant time
    int key = inHash.at("one");
    myObj = outHash.at(key);
    // deleting stuff
    inHash.erase(myObj.inKey);
    outHash.erase(myObj.outKey);
}
Not sure that's what you're looking for.
This is one of the limits of the design of standard containers: a container in a sense "owns" the contained data and expects to be its only owner... containers are not merely "indexes".
For your case a simple, but not 100% efficient, solution is to have two std::maps with "Node *" as the value and to store both keys in the Node structure (so each key is stored twice). With this approach you can update your data structure with reasonable overhead (you will do some extra map searches, but that should be fast enough).
A possibly "correct" solution, however, would IMO be something like:
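struct Node
{
    Key key1;       // Key and Payload are placeholder types
    Key key2;
    Payload data;
    Node *Collision1Prev, *Collision1Next; // collision chain in the table hashed by key1
    Node *Collision2Prev, *Collision2Next; // collision chain in the table hashed by key2
};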
Basically, each node is in two different hash tables at the same time.
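A minimal sketch of that intrusive layout, assuming placeholder types (a std::string key1, an int key2, a std::string payload) and a fixed bucket count with no resizing; DualHash and its methods are hypothetical names. Each Node carries two pairs of chain links, so once a node is found through either key it can be unlinked from both tables in O(1) without a second hash lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Each node sits in two hash tables at once via two sets of chain links.
struct Node {
    std::string key1; // placeholder key/payload types for this sketch
    int key2;
    std::string data;
    Node *prev1 = nullptr, *next1 = nullptr; // collision chain in table 1
    Node *prev2 = nullptr, *next2 = nullptr; // collision chain in table 2
};

// Fixed bucket count (no resizing) to keep the sketch short.
class DualHash {
public:
    explicit DualHash(std::size_t buckets) : t1_(buckets, nullptr), t2_(buckets, nullptr) {
        assert(buckets > 0);
    }
    DualHash(const DualHash &) = delete;
    DualHash &operator=(const DualHash &) = delete;
    ~DualHash() {
        for (Node *n : t1_) // every node is in exactly one table-1 chain
            while (n) { Node *next = n->next1; delete n; n = next; }
    }

    Node *insert(const std::string &k1, int k2, const std::string &data) {
        Node *n = new Node;
        n->key1 = k1; n->key2 = k2; n->data = data;
        Node *&h1 = t1_[bucket1(k1)];
        n->next1 = h1; if (h1) h1->prev1 = n; h1 = n; // push front of chain 1
        Node *&h2 = t2_[bucket2(k2)];
        n->next2 = h2; if (h2) h2->prev2 = n; h2 = n; // push front of chain 2
        return n;
    }

    Node *find1(const std::string &k) const {
        for (Node *n = t1_[bucket1(k)]; n; n = n->next1)
            if (n->key1 == k) return n;
        return nullptr;
    }
    Node *find2(int k) const {
        for (Node *n = t2_[bucket2(k)]; n; n = n->next2)
            if (n->key2 == k) return n;
        return nullptr;
    }

    // O(1) once the node is known: unlink it from both chains, then free it.
    void erase(Node *n) {
        if (n->prev1) n->prev1->next1 = n->next1; else t1_[bucket1(n->key1)] = n->next1;
        if (n->next1) n->next1->prev1 = n->prev1;
        if (n->prev2) n->prev2->next2 = n->next2; else t2_[bucket2(n->key2)] = n->next2;
        if (n->next2) n->next2->prev2 = n->prev2;
        delete n;
    }

private:
    std::size_t bucket1(const std::string &k) const { return std::hash<std::string>{}(k) % t1_.size(); }
    std::size_t bucket2(int k) const { return std::hash<int>{}(k) % t2_.size(); }
    std::vector<Node *> t1_, t2_; // bucket heads of the two tables
};
```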
Standard containers cannot be combined this way. Other examples I have coded by hand in the past include a hash table where all nodes are also in a doubly-linked list, and a tree where all nodes are also in an array.
For very complex data structures (e.g. a network of structures where each one is both the "owner" of several chains and part of several other chains simultaneously) I have sometimes even resorted to code generation (i.e. scripts that generate correct pointer-handling code given a description of the data structure).