I need to use a data structure which supports constant-time lookups on average, and I think a std::unordered_map is a good way to do it. My data is a "collection" of numbers:
|115|190|380|265|
These numbers do not have to be in any particular order. I need about O(1) time to determine whether or not a given number exists in this data structure. My idea is to use a std::unordered_map, which is actually a hash table (am I correct?). So the numbers will be keys, and I would just have dummy values.
So basically I first need to determine whether a key matching a given number exists in the data structure, and I run some algorithm based on that condition. Independently of that condition, I also want to update a particular key: say 190, and I want to add 20 to it, so that the key becomes 210.
And now the data structure would look like this:
|115|210|380|265|
The reason I want to do this is that I have a recursive algorithm which traverses a binary search tree. Each node has an int value and two pointers to its left and right children. When a leaf node is reached, I need to create a new entry in the "hash table" data structure holding current_node->value. Then, as I go back up the tree in the recursion, I need to successively add each node's value to the sum previously stored in the key.
The reason my data structure (which I suggest should be a std::unordered_map) holds multiple numbers is that each one represents a unique path going from a leaf node up to a certain node in the middle of the tree. I check whether the sum of the values of all the nodes on the path from the leaf up to a given node equals the value of that node. So basically each key accumulates the current node's value, storing the sum of all the nodes on that path. I need to scan the data structure to determine whether any one of the keys equals the value of the current node.
I also want to insert new values into the data structure in near-constant time. This is for competitive programming, and I would hesitate to use a std::vector because looking up an element takes linear time, I think; that would ruin my time complexity. Maybe I should use a data structure other than a std::unordered_map?
You can use unordered_map::erase and unordered_map::insert to update a key. The average time complexity is O(1) (the worst case is O(n), by the way). If you are using C++17, you can also use unordered_map::extract to update a key; the time complexity is the same.
However, since you only need a set of numbers, I think std::unordered_set is more suitable for your algorithm.
#include <unordered_map>
#include <iostream>

int main()
{
    std::unordered_map<int, int> m;
    m[42]; // add
    m[69]; // some
    m[90]; // keys

    int value = 90; // value to check for
    auto it = m.find(value);
    if (it != m.end()) {
        m.erase(it);   // remove it
        m[value + 20]; // add an altered value
    }
}
#include <string>
#include <unordered_map>

int main()
{
    // replace a value under the same key using another key instance
    std::unordered_map<std::string, int> eden;
    std::string k1("existing key");
    std::string k2("existing key");

    auto [it, inserted] = eden.try_emplace(k1, 1);
    if (!inserted) {
        eden.erase(it);    // erase invalidates it, so do not reuse it as a hint
        eden.emplace(k2, 123);
    }
}
Since C++17, you can also use its extract function as follows:
std::unordered_map<int, int> map = make_map();
auto node = map.extract(some_key);
node.key() = new_key;
map.insert(std::move(node));
Say I have a std::unordered_map<std::string, int> that represents a word and the number of times that word appeared in a book, and I want to be able to sort it by the value.
The problem is that I want the sorting to be stable, so that when two items have equal values, the one inserted into the map first comes first.
It is simple to implement by adding an additional field that keeps the insertion time, then creating a comparator that uses both the time and the value. Plain std::sort would give me O(N log N) time complexity.
In my case, space is not an issue as long as time can be improved. I want to take advantage of that and do a bucket sort, which should give me O(N) time complexity. But with bucket sorting there is no comparator, and when iterating the items in the map the insertion order is not preserved.
How can I both make the sort stable and keep the O(N) time complexity, via bucket sorting or something else?
I guess that if I had some kind of hash map that preserves the order of insertion while iterating it, it would solve my issue.
Any other solutions with the same time complexity are acceptable.
Note - I already saw this and that, but since they are both from 2009 and my case is, I think, more specific, I opened this question.
Here is a possible solution I came up with, using a std::unordered_map and tracking the insertion order with a std::vector.
Create a hash map with the string as key and count as value.
In addition, create a vector with iterators to that map type.
When counting elements, if the object is not yet in the map, add it to both the map and the vector; otherwise, just increment the counter. The vector preserves the order in which elements were inserted into the map, and insertion/update is still O(1) time complexity.
Apply bucket sort by iterating over the vector (instead of the map), this ensures the order is preserved and we'll get a stable sort. O(N)
Extract from the buckets to make a sorted array. O(N)
Implementation:
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

std::unordered_map<std::string, int> map;
std::vector<std::unordered_map<std::string, int>::iterator> order;
int max_count = 0;

// Let's assume this is my string stream
std::vector<std::string> words = {"a", "b", "a" /* ... */};

// Insert elements into the map and the corresponding iterator into order
for (auto& word : words) {
    auto it = map.emplace(word, 1);
    if (!it.second) {
        it.first->second++;
    }
    else {
        order.push_back(it.first);
    }
    max_count = std::max(max_count, it.first->second);
}

// Bucket sorting
/* We iterate over the vector and not the map;
   this ensures we visit elements in the order they were inserted */
std::vector<std::vector<std::string>> buckets(max_count);
for (auto o : order) {
    int count = o->second;
    buckets[count - 1].push_back(o->first);
}

std::vector<std::string> res;
for (auto it = buckets.rbegin(); it != buckets.rend(); ++it)
    for (auto& str : *it)
        res.push_back(str);
I have a situation where I have a container which needs to hold an ID (IDs are unique) and a data value. I also need to keep these IDs in an order. The tuple of these variables will be looked up by the ID, but then processed in order up to the found element, i.e. I don't always want to process the whole container. For this, I've got a simple solution of
// ordinal, { ID, data }
std::map<int64, pair<int64, data_t> >
Which I will first search for the ID by iterating through and comparing a search value with the first field of the pair, giving me an iterator to walk up to; then I will process all elements up to this position. Is there a better way of doing this (by my count this is O(2n))?
You can swap ordinal and ID and store them in a map of maps:
// ID ordinal data
std::unordered_map<int64, std::map<int64, data_t>> container;
This will allow you to find the element with a given ID and the minimum possible ordinal in average O(1) time (a hash lookup for the ID, plus the inner map's constant-time begin()):
container[ID].begin(); // has the given ID, the smallest possible ordinal, and the corresponding data
                       // equal to container[ID].end() if the ID is not found
After that, you can compare the ordinal of the object found to the threshold given.
UP: Of course, if your IDs are unique, there's no need for the nested map: you can just use std::unordered_map<int64, std::pair<int64, data_t>>.
You could use Boost.Bimap if you want to index on values as well as keys. That way you can look up a pair in the map based on its value. Without this or something similar, it has to be done by brute force (=> iterate over the map by hand).
Otherwise you could use std::find_if to help you find the pair with the ID you're looking for but it will be the same speed as iterating over the map.
If the ordinal is strictly for maintaining the order and there won't be any gaps, I'd do something simple like this:
int64_t givenID = whereToQuit;
std::vector<int64_t> ordinal_to_ID;
std::unordered_map<int64_t, data_t> data_map;

using datapair_t = std::pair<int64_t, data_t>;
void do_whatever(datapair_t);

bool done = false;
for (std::size_t i = 0; !done; ++i)
{
    int64_t ID = ordinal_to_ID[i];
    do_whatever(datapair_t(ID, data_map[ID]));
    done = (ID == givenID);
}
I have written code for solving the following problem: we have a map<double,double> with a (relatively) huge number of items. We want to merge adjacent items in order to reduce the size of the map while keeping a certain "loss factor" as low as possible.
To do so, I first populate a list containing adjacent iterators and the associated loss factor (let's say each list element has the following type:
struct myPair {
    map<double, double>::iterator curr, next;
    double loss;
    myPair(map<double, double>::iterator c, map<double, double>::iterator n,
           double l) : curr(c), next(n), loss(l) {}
};
). This is done as follows:
for (map<double, double>::iterator it1 = myMap.begin(); it1 != --(myMap.end());
     it1++) {
    map<double, double>::iterator it2 = it1;
    it2++;
    double l = computeLoss(it1, it2);
    List.push(myPair(it1, it2, l));
}
Then I find the list element with the lowest loss factor, erase the corresponding elements from the map, and insert a new element (the result of merging curr and next) into the map. Since this also changes the list elements corresponding to the element after next or before curr, I update those entries and their associated loss factors.
(I won't go into the details of how to implement this efficiently, but basically I am combining a doubly linked list and a heap.)
While the erase operations should not invalidate the remaining iterators, for some specific input instances of the program I get the double free or corruption error exactly at the point where I attempt to erase the elements from the map.
I tried to track this down, and it seems to happen when the first and second entries of the two map elements are very close (more precisely, when the firsts of curr and next are very close).
A strange thing: I put an assert while populating the list to ensure that in all entries curr and next are different, and the same assert in the element-removal loop. The second one fails!
I would appreciate if anyone can help me.
P.S. I am sorry for not being very precise, but I wanted to keep the details to a minimum.
UPDATE: This is (a very simplified version of) how I erase the elements from the map:
while (myMap.size() > MAX_SIZE) {
    t = list.getMin();
    /* compute the merged version ... let's call the result (a, b) */
    myMap.erase(t.curr);
    myMap.erase(t.next);
    myMap.insert(pair<double, double>(a, b));
    /* update the adjacent entries */
}
The iterators stored in myPair become invalid after the container is modified; you should avoid that technique. You will probably find ready-made building blocks for this task in the standard library headers.
As already mentioned by others, it turns out that using double as the map key is problematic, particularly when the key values are computed.
Hence, my solution was to use std::multimap instead of std::map (and then merge the elements with the same key just after populating the map). With this, even if a is very close to the keys of both t.curr and t.next (or any other element), the insert operation is guaranteed to create a new element, so no existing iterator in the list will point to it.
I am working on an A* pathfinding algorithm in C++. I have the simple code below; now I need to find the object with the lowest f. I know how to do this by iterating the vector and comparing manually, but I think there might be a simpler way requiring less code. Thanks for the answers.
#include <vector>

struct Node
{
    int f;
};

void func()
{
    std::vector<Node> nodes;
    // fill nodes with some objects
    // now find the Node object with the smallest f
}
std::min_element with a lambda comparator seems to be the most terse. By the way, using a plain vector seems to defeat the purpose of using a fast search algorithm such as A*. It's OK during development, but for the final version you should use a fast priority queue, such as the heap-based std::priority_queue.
You can keep a temporary int variable that records the index of the smallest value in the vector. Every time you push a new value into the vector, compare it with the value at the recorded index and update the index if the new value is smaller.
I've faced the same problem when I was implementing A* algorithm.
Look at boost::multi_index. It allows you to have a map with many keys.
So you can have both: "sorting" nodes by F (with F as one of the keys) and fast lookup by a node's position (with the node as the second key), which is what A* requires.
For 'F' as a key you will need to specify that the key is non-unique, since there can be many items with the same F value (for this key, multi_index needs to behave like a std::multimap). Otherwise the index will behave like a map, and nodes with the same 'F' will not be stored.
When multi_index is used, you can take the first item by the 'F' key, and this will be the item with the lowest 'F' value. (As far as I remember, you can specify the sorting order.)
I need to create a data structure whose elements can be accessed by a string key or by their ordinal.
The class currently uses an array of nodes containing the string key and a pointer to the element. This allows O(n) looping through, or O(1) access by ordinal; however, the only way I've found to locate an element by key is an O(n) loop comparing keys until I find the one I want, which is SLOW when there are 1000+ elements. Is there a way to use the key to reference the pointer, or am I out of luck?
EDIT: access by ordinal is not as important as the O(n) looping. This is going to be used as a base structure that will be inherited for use in other ways; for instance, if it were a structure of drawable objects, I'd want to be able to draw all of them in a single loop.
You can use std::map for O(log n) search. See this thread for more details; it discusses exactly your situation (fast retrieval of values by a string and/or an ordinal key).
A small example (ordinal keys are used; you can do similar things with strings):
#include <map>
#include <string>

using std::map;
using std::string;

struct dummy {
    unsigned ordinal_key;
    string dummy_body;
};

int main()
{
    map<unsigned, dummy> lookup_map;
    dummy d1;
    d1.ordinal_key = 10;
    lookup_map[d1.ordinal_key] = d1;
    // ...

    unsigned some_key = 20;
    // determine whether an element with the desired key is present in the map
    if (lookup_map.find(some_key) != lookup_map.end()) {
        // do stuff
    }
}
If you seldom modify your array, you can keep it sorted and use binary search to find an element by key in O(log n) time (technically O(k log n), since you're comparing strings, where k is the average key length).
Of course this (just like using a map or unordered_map) will mess up your ordinal retrieval, since the elements will be stored in sorted order, not insertion order.
Use vector and map:
std::vector<your_struct> elements;
std::map<std::string, int> index;
The map allows you to retrieve the key's index in O(log n) time, whereas the vector allows O(1) element access by index.
Use a hashmap