Why is my std::unordered_map access time not constant - c++

I wrote some code to test my unordered map performance with a two-component vector as a key:
std::unordered_map<Vector2i, int> m;
for (int i = 0; i < 1000; ++i)
    for (int j = 0; j < 1000; ++j)
        m[Vector2i(i, j)] = i * j + 27 * j;
auto found = m.find(Vector2i(0, 5));
std::cout << clock.getElapsedTime().asMicroseconds() << std::endl;
Output for the code above: 56 (microseconds)
When I replace 1000 in the for loops by 100, the output is 2 (microseconds).
Isn't the time supposed to be constant?
Hash function for my Vector2i:
namespace std {
template <>
struct hash<Vector2i> {
    std::size_t operator()(const Vector2i& k) const {
        using std::size_t;
        using std::hash;
        using std::string;
        return (hash<int>()(k.x)) ^ (hash<int>()(k.y) << 1);
    }
};
}
I added this code to count the collisions after the for loop:
std::size_t collisions = 0;
for (size_t bucket = 0; bucket != m.bucket_count(); ++bucket)
    if (m.bucket_size(bucket) > 1)
        ++collisions;
With 100*100 elements: collisions = 256
With 1000*1000 elements: collisions = 2048

A hash table gives constant lookup time on average, not as a hard guarantee. If the hash table is well balanced (i.e., the hash function is good), most elements will be evenly distributed across the buckets. However, if the hash function is poor, you may get lots of collisions, in which case accessing an element usually means traversing a linked list (where the elements that collided are stored). So first make sure the load factor and the hash function are OK in your case. Lastly, make sure you compile your code in release mode, with optimizations turned on (e.g. -O3 for g++/clang++).
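As a rough check of the hash-function explanation: assuming std::hash<int> is the identity (as it is in libstdc++), k.x ^ (k.y << 1) can only produce 256 distinct values on the 100*100 grid and 2048 on the 1000*1000 grid, which matches the collision counts in the question. That would put roughly 39 keys in each occupied bucket in the small case and about 488 in the large one, so lookups get slower as the grid grows. The sketch below is one possible fix, not the only one. Vector2i is a hypothetical stand-in for the question's type, Vector2iHash packs both components into 64 bits and scrambles them with the splitmix64 finalizer, and make_grid_map and largest_bucket are illustrative helpers. The largest bucket tells you more about lookup cost than the number of buckets holding more than one element.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the question's Vector2i (two int components).
struct Vector2i {
    int x, y;
    Vector2i(int x_, int y_) : x(x_), y(y_) {}
    bool operator==(const Vector2i& o) const { return x == o.x && y == o.y; }
};

// Packs (x, y) into one 64-bit value, then scrambles it with the splitmix64
// finalizer so that neighbouring grid points end up in unrelated buckets.
struct Vector2iHash {
    std::size_t operator()(const Vector2i& k) const {
        std::uint64_t v =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(k.x)) << 32) |
            static_cast<std::uint32_t>(k.y);
        v += 0x9E3779B97F4A7C15ULL;
        v = (v ^ (v >> 30)) * 0xBF58476D1CE4E5B9ULL;
        v = (v ^ (v >> 27)) * 0x94D049BB133111EBULL;
        v ^= v >> 31;
        return static_cast<std::size_t>(v);
    }
};

using GridMap = std::unordered_map<Vector2i, int, Vector2iHash>;

// Fills an n*n grid with the same values as the question's loop.
GridMap make_grid_map(int n) {
    GridMap m;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            m[Vector2i(i, j)] = i * j + 27 * j;
    return m;
}

// Size of the most crowded bucket: a lookup may have to walk this many nodes.
template <class Map>
std::size_t largest_bucket(const Map& m) {
    std::size_t largest = 0;
    for (std::size_t b = 0; b != m.bucket_count(); ++b)
        largest = std::max(largest, m.bucket_size(b));
    return largest;
}
```

With the original XOR hash, largest_bucket would report dozens of keys per bucket for the 100*100 grid. With the mixed hash, buckets should stay small, so find() does a short, roughly constant amount of work per lookup.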
I've been interested in optimizing "renumbering" algorithms that can relabel an arbitrary array of integers with duplicates into labels starting from 1. Sets and maps are too slow for what I've been trying to do, as are sorts. Is there a data structure that only remembers if a number has been seen or not reliably? I was considering experimenting with a bloom filter, but I have >12M integers and the target performance is faster than a good hashmap. Is this possible?
Here's a simple example pseudo-c++ algorithm that would be slow:
// note: all elements guaranteed > 0
std::vector<uint64_t> arr = { 21942198, 91292, 21942198, /* ... millions more */ };
std::unordered_map<uint64_t, uint64_t> renumber;
uint64_t next_label = 1;
for (uint64_t i = 0; i < arr.size(); i++) {
    uint64_t elem = arr[i];
    if (renumber[elem]) {
        arr[i] = renumber[elem];
    } else {
        renumber[elem] = next_label;
        arr[i] = next_label;
        next_label++;
    }
}
Example input/output:
{ 12, 38, 1201, 120, 12, 39, 320, 1, 1 }
{ 1, 2, 3, 4, 1, 5, 6, 7, 7 }
Your algorithm is not bad, but the appropriate data structure to use for the map is a hash table with open addressing.
As explained in this answer, std::unordered_map can't be implemented that way: https://stackoverflow.com/a/31113618/5483526
So if the STL container is really too slow for you, then you can do better by making your own.
Note, however, that:
90% of the time, when someone complains about STL containers being too slow, they are running a debug build with optimizations turned off. Make sure you are running a release build compiled with optimizations on. Running your code on 12M integers should take well under a second.
You are accessing the map several times per element when a single access is enough, like this:
uint64_t next_label = 1;
for (size_t i = 0; i < arr.size(); i++) {
    uint64_t elem = arr[i];
    uint64_t &label = renumber[elem];  // one lookup; inserts 0 if elem is new
    if (!label) {
        label = next_label++;
    }
    arr[i] = label;
}
Note that the unordered_map operator [] returns a reference to the associated value (creating it if it doesn't exist), so you can test and modify the value without having to search the map again.
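A minimal sketch of the open-addressing idea from this answer, specialised to renumbering. It assumes, as the question states, that every element is > 0, so 0 can mark an empty slot. The table size (a power of two at least twice the input length), linear probing, and the Fibonacci-hashing multiplier are my own choices, not part of the answer, and renumber_open_addressing is a hypothetical name. Because the table is never more than half full, every probe sequence ends at either the key or an empty slot.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Relabels arr in place: the k-th distinct value seen becomes k.
// Assumes every element is > 0, so key 0 can mean "empty slot".
void renumber_open_addressing(std::vector<std::uint64_t>& arr) {
    // Power-of-two table with load factor <= 0.5 (there are at most
    // arr.size() distinct keys).
    unsigned bits = 4;
    while ((std::size_t{1} << bits) < 2 * arr.size())
        ++bits;
    const std::size_t cap = std::size_t{1} << bits;
    const std::size_t mask = cap - 1;
    std::vector<std::uint64_t> keys(cap, 0);    // 0 = empty slot
    std::vector<std::uint64_t> labels(cap, 0);  // label stored for keys[slot]

    std::uint64_t next_label = 1;
    for (std::uint64_t& elem : arr) {
        // Fibonacci hashing: keep the top `bits` bits of elem * 2^64/phi.
        std::size_t slot = static_cast<std::size_t>(
            (elem * 0x9E3779B97F4A7C15ULL) >> (64 - bits));
        while (keys[slot] != 0 && keys[slot] != elem)
            slot = (slot + 1) & mask;            // linear probing
        if (keys[slot] == 0) {                   // first occurrence of elem
            keys[slot] = elem;
            labels[slot] = next_label++;
        }
        elem = labels[slot];                     // overwrite with its label
    }
}
```

Compared with std::unordered_map, keys and labels live in two flat arrays, so a lookup touches neighbouring memory instead of following per-node pointers. That is the main reason open addressing tends to be faster for this kind of workload.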
First, any time you experience "slowness" with a std:: collection class like vector or map, just recompile with optimizations (release build). A speedup of around 10x is common.
Now to your problem. I'll show a two-pass solution that runs in O(N) time. I'll leave it as an exercise for you to convert to a one-pass solution. But I'll assert that this should be fast enough, even for vectors with millions of items.
First, declare not one, but two unordered maps:
std::unordered_map<uint64_t, uint64_t> element_to_label;
std::unordered_map<uint64_t, std::pair<uint64_t, std::vector<uint64_t>>> label_to_elements;
The first map, element_to_label, maps an integer value found in the original array to its unique label.
The second map, label_to_elements, maps each label to both the element value and the list of indices at which that element occurs in the original array.
Now to build these maps:
uint64_t next_label = 1;
for (size_t index = 0; index < arr.size(); index++)
{
    const uint64_t elem = arr[index];
    auto itor = element_to_label.find(elem);
    if (itor == element_to_label.end())
    {
        // new element
        element_to_label[elem] = next_label;
        auto &p = label_to_elements[next_label];
        p.first = elem;
        p.second.push_back(index);
        next_label++;
    }
    else
    {
        // existing element
        uint64_t label = itor->second;
        label_to_elements[label].second.push_back(index);
    }
}
After the above code runs, it has built up a database of all values in the array, their labels, and the indices where they occur.
So now to renumber the array such that all elements are replaced with their smaller label value:
for (auto itor = label_to_elements.begin(); itor != label_to_elements.end(); itor++)
{
    uint64_t label = itor->first;
    auto& p = itor->second;
    uint64_t elem = p.first; // technically, this isn't needed. It's just useful to know which element value we are replacing from the original array
    const auto& vec = p.second;
    for (size_t j = 0; j < vec.size(); j++)
    {
        size_t index = vec[j];
        arr[index] = label;
    }
}
Notice where I assign variables by reference with the & operator to avoid making an expensive copy of any value in the maps.
So if your original vector or array was this:
{ 100001, 2000002, 300003, 400004, 400004, 300003, 2000002, 100001 };
Then the application of labels would render the array as this:
{ 1, 2, 3, 4, 4, 3, 2, 1 }
What's nice is that you still have a quick O(1) lookup to map any label in that set back to its original element value, using label_to_elements.

Vector of set insert elements

I'm trying to write a function which returns a vector of sets of strings, where each set represents the members of a team.
A group of n names should be divided into k teams for a game. Teams should be the same size, but this is not always possible unless n is exactly divisible by k. Therefore, it was decided that the first n mod k teams have n / k + 1 members, and the remaining teams have n / k members.
#include <iostream>
#include <vector>
#include <string>
#include <set>
#include <list>
typedef std::vector<std::set<std::string>> vek;
vek Distribution(std::vector<std::string> names, int k) {
    int n = names.size();
    vek teams(k);
    int number_of_first = n % k;
    int number_of_members_first = n / k + 1;
    int number_of_members_remaining = n / k;
    int l = 0;
    int j = 0;
    for (int i = 1; i <= k; i++) {
        if (i <= number_of_first) {
            int number_of_members_in_team = 0;
            while (number_of_members_in_team < number_of_members_first) {
                teams[j].insert(names[l]);
                l++;
                number_of_members_in_team++;
            }
        } else {
            int number_of_members_in_team = 0;
            while (number_of_members_in_team < number_of_members_remaining) {
                teams[j].insert(names[l]);
                l++;
                number_of_members_in_team++;
            }
        }
        j++;
    }
    return teams;
}
int main ()
{
    for (auto i : Distribution({"Damir", "Ana", "Muhamed", "Marko", "Ivan",
                                "Mirsad", "Nikolina", "Alen", "Jasmina", "Merima"
                               }, 3)) {
        for (auto j : i)
            std::cout << j << " ";
        std::cout << std::endl;
    }
    return 0;
}
OUTPUT should be:
Damir Ana Muhamed Marko
Ivan Mirsad Nikolina
Alen Jasmina Merima
But my output is:
Ana Damir Marko Muhamed
Ivan Mirsad Nikolina
Alen Jasmina Merima
Could you explain why the names are not printed in the right order?
teams being a std::vector<...> supports random access via an index.
auto & team_i = teams[i]; (0 <= i < teams.size()), will give you an element of the vector. team_i is a reference to type std::set<std::list<std::string>>.
As a std::set<...> does not support random access via an index, you will need to access the elements via iterators (begin(), end() etc.), e.g.: auto set_it = team_i.begin();. *set_it will be of type std::list<std::string>.
Since std::list<...> also does not support random access via an index, again you will need to access it via iterators, e.g.: auto list_it = set_it->begin();. *list_it will be of type std::string.
This way it is possible to access every set in the vector, every list in each set, and every string in each list (after you have added them to the data structure).
However, using iterators with std::set and std::list is not as convenient as using indexed random access with std::vector. std::vector has additional benefits (a simple and efficient implementation, and a contiguous memory block).
If you use std::vectors instead of std::set and std::list, vek will be defined as:
typedef std::vector<std::vector<std::vector<std::string>>> vek;
std::list, being a linked list, offers some benefits (like being able to insert an element anywhere in O(1) once you have an iterator to the position). std::set guarantees that each value is present only once.
But if you don't really need these features, you could make your code simpler (and often more efficient) by using only std::vectors as your containers.
Note: if every set will only ever contain 1 list (of strings), you can consider getting rid of one level of the hierarchy, i.e. store the lists (or vectors, as I suggested) directly as elements of the top-level vector.
Since the question was changed, here's a short update:
In my answer above, ignore all the mentions of std::list. When you iterate over the std::set, the elements are already std::strings.
The reason the names are not in the order you expect:
std::set keeps its elements sorted, and when you iterate over it you get the elements in that sort order. See the answer here: Is the std::set iteration order always ascending according to the C++ specification? Your set contains std::strings, and their default sort order is lexicographical (alphabetical, for these names).
Using std::vector instead of std::set, as I proposed above, will get you the result you wanted (std::vector is not sorted automatically).
If you want to try using only std::vector:
Change vek to:
typedef std::vector<std::vector<std::string>>vek;
And replace the usage of insert (to add an element to the set) with push_back to do the same for a vector.
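Putting the vector-only suggestion together, a minimal sketch of the whole function could look like the block below. The loop structure and variable names are my own, not the asker's, and it assumes k > 0. Because std::vector keeps insertion order, the names come out in the order they were given.

```cpp
#include <string>
#include <vector>

typedef std::vector<std::vector<std::string>> vek;

// Splits names into k teams in input order: the first n % k teams get
// n / k + 1 members and the rest get n / k. Assumes k > 0.
vek Distribution(const std::vector<std::string>& names, int k) {
    const int n = static_cast<int>(names.size());
    vek teams(k);
    int next = 0;  // index of the next name to assign
    for (int t = 0; t < k; ++t) {
        const int team_size = n / k + (t < n % k ? 1 : 0);
        for (int m = 0; m < team_size; ++m)
            teams[t].push_back(names[next++]);  // push_back keeps order
    }
    return teams;
}
```

With the question's ten names and k = 3, this yields "Damir Ana Muhamed Marko", "Ivan Mirsad Nikolina" and "Alen Jasmina Merima".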

Vector performance suffering

I've been working on state space exploration and was originally using a map to store the assignment of the world states, like map<Variable *, int>, where variables are objects in the world with a domain from 0 to n, where n is finite. The implementation was very fast, but I noticed that it does not scale well with the size of the state space. I changed the states to use vector<int> instead, where I use the id of a variable as its index in the vector. Memory usage improved greatly, but the efficiency of the solver has tanked (gone from <30 seconds to 400+). The only code that I modified was generating the states and checking whether a state is the goal. I can't figure out why using a vector has degraded performance, especially since the vector operations should only take linear time at worst.
Originally, this is how I generated nodes:
State * SuccessorGen::generate_successor(const Operator &op, map<Variable *, int> &var_assignment){
    map<Variable *, int> values;
    values.insert(var_assignment.begin(), var_assignment.end());
    vector<Operator::Effect> effect = op.get_effect();
    vector<Operator::Effect>::const_iterator eff_it = effect.begin();
    for (; eff_it != effect.end(); eff_it++){
        values[eff_it->var] = eff_it->after;
    }
    return new State(values);
}
And in my new implementation:
State* SuccessorGen::generate_successor(const Operator &op, const vector<int> &assignment){
    vector<int> child;
    child = assignment;
    vector<Operator::Effect> effect = op.get_effect();
    vector<Operator::Effect>::const_iterator eff_it = effect.begin();
    for (; eff_it != effect.end(); eff_it++){
        Variable *v = eff_it->var;
        int id = v->get_id();
        child[id] = eff_it->after;
    }
    return new State(child);
}
(The goal checking is similar, just looping over the goal assignment instead of operator effects.)
Are these vector operations really that much slower than using a map? Is there an equally efficient STL container I can use that has a lower overhead? The number of variables is relatively small (<50) and the vector never needs to be resized or modified after the for loop.
I timed one loop through all the operators to compare: with the effect list and assignment, the vector version runs one loop in 0.3 seconds, while the map version takes a little over 0.4 seconds. When I comment that section out, the map version stays about the same, yet the vector version jumps to closer to 0.5 seconds. I added child.reserve(assignment.size()), but that did not make any difference.
Edit 2:
Following user63710's answer, I've also been digging through the rest of the code and noticed something really strange going on in the heuristic calculation. The vector version works fine, but for the map version I use the line Node *n = new Node(i, transition.value, label_cost); open_list.push(n);, and once the loop finishes filling the queue, the nodes are completely corrupted. Node is a simple struct:
struct Node{
    // Source Value, Destination Value
    int from;
    int to;
    int distance;
    Node(int &f, int &t, int &d) : from(f), to(t), distance(d){}
};
Instead of the expected from, to and distance values, the from and to fields hold seemingly random numbers, so the search does not do what it should and returns much faster than it should. When I tweak the map version to convert the map to a vector and run this:
Node n(i, transition.value, label_cost); open_list.push(n);
the performance is about equal to that of the vector version. So that fixes my main issue, but it leaves me wondering why using Node *n produces this behaviour, as opposed to Node n(...)?
If, as you say, the sizes of these structures are fairly small (~50 elements), I have to think that the issue is somewhere else. At least, I don't think it involves the memory accesses or allocation of the vector/map.
Some example code I made to test.
Map version:
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <map>
#include <memory>
#include <vector>
using namespace std;
unique_ptr<map<int, int>> make_successor_map(const vector<int> &ids,
                                             const map<int, int> &input)
{
    auto new_map = make_unique<map<int, int>>(input.begin(), input.end());
    for (size_t i = 0; i < ids.size(); ++i)
        swap((*new_map)[ids[i]], (*new_map)[i]);
    return new_map;
}
int main()
{
    auto a_map = make_unique<map<int, int>>();
    // ids to access
    vector<int> ids;
    const int n = 100;
    for (int i = 0; i < n; ++i)
    {
        a_map->insert({i, rand()});
        ids.push_back(i);
    }
    random_shuffle(ids.begin(), ids.end());  // removed in C++17; use std::shuffle there
    for (int i = 0; i < 1e6; ++i)
    {
        auto temp_map = make_successor_map(ids, *a_map);
        swap(temp_map, a_map);
    }
    cout << a_map->begin()->second << endl;
}
Vector version:
// needs <algorithm>, <cstdlib>, <iostream>, <memory>, <vector> and using namespace std;
unique_ptr<vector<int>> make_successor_vec(const vector<int> &ids,
                                           const vector<int> &input)
{
    auto new_vec = make_unique<vector<int>>(input);
    for (size_t i = 0; i < ids.size(); ++i)
        swap((*new_vec)[ids[i]], (*new_vec)[i]);
    return new_vec;
}
int main()
{
    auto a_vec = make_unique<vector<int>>();
    // ids to access
    vector<int> ids;
    const int n = 100;
    for (int i = 0; i < n; ++i)
    {
        a_vec->push_back(rand());
        ids.push_back(i);
    }
    random_shuffle(ids.begin(), ids.end());
    for (int i = 0; i < 1e6; ++i)
    {
        auto temp_vec = make_successor_vec(ids, *a_vec);
        swap(temp_vec, a_vec);
    }
    cout << *a_vec->begin() << endl;
}
The map version takes around 15 seconds to run on my old Core 2 Duo T9600, and the vector version takes 0.406 seconds. Both were compiled with G++ 4.9.2 using g++ -O3 --std=c++1y. So if your code takes 0.4s per iteration (note that my example code took 0.4s for 1 million calls), then I really think your problem is somewhere else.
That's not to say you aren't seeing performance decreases due to switching from map to vector, but the code you posted doesn't show much reason for that to happen.
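For reference, here is a self-contained sketch of the vector-based successor generation and goal check described in the question. Effect, generate_successor and is_goal are hypothetical simplifications, because the real Operator, Variable and State classes are not shown. A state is a std::vector<int> indexed by variable id, the successor is returned by value instead of through new State, and the goal is a list of (variable id, required value) pairs. With fewer than 50 variables, each successor costs one small allocation plus a copy of at most 50 ints, which is consistent with the point above that this part alone should not explain a slowdown from 30 to 400 seconds.

```cpp
#include <utility>
#include <vector>

// Hypothetical minimal stand-in for Operator::Effect: sets one variable.
struct Effect {
    int var_id;  // index of the affected variable in the state vector
    int after;   // value the variable takes after the operator is applied
};

// Copies the parent state once, then overwrites the affected entries.
std::vector<int> generate_successor(const std::vector<Effect>& effects,
                                    const std::vector<int>& assignment) {
    std::vector<int> child(assignment);
    for (const Effect& e : effects)
        child[e.var_id] = e.after;
    return child;
}

// Goal check: every (variable id, required value) pair must match.
bool is_goal(const std::vector<int>& state,
             const std::vector<std::pair<int, int>>& goal) {
    for (const auto& g : goal)
        if (state[g.first] != g.second)
            return false;
    return true;
}
```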
The problem is that you create vectors without reserving space. Vectors store their elements contiguously, which is what ensures constant-time access to elements.
So every time you add an item to the vector (for example via your inserter), the vector may have to allocate more space and move all the existing elements to the new memory location. This causes slowdown and considerable heap fragmentation.
The solution is to reserve() elements if you know in advance how many you'll have. If you don't, reserve() larger chunks and compare size() and capacity() to check whether it's time to reserve more.
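To make the reallocation behaviour visible, here is a small sketch with a hypothetical helper (count_reallocations is my name, not a standard function). It counts how often a vector's capacity changes while elements are appended, with and without a prior reserve(). Note that the question's generate_successor copies the parent with child = assignment, which needs only a single allocation, so reserve() is unlikely to matter there (the asker reports it made no difference). It matters when a vector is grown element by element.

```cpp
#include <cstddef>
#include <vector>

// Counts capacity changes (i.e. reallocations, each of which moves all
// existing elements) while appending n ints, optionally after reserve(n).
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first)
        v.reserve(n);  // one allocation up front
    std::size_t reallocations = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != cap) {  // storage was reallocated
            ++reallocations;
            cap = v.capacity();
        }
    }
    return reallocations;
}
```

Because capacity grows geometrically, the count without reserve() stays logarithmic in n, but each reallocation copies or moves every element stored so far.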

How to get random and unique values from a vector?

Possible Duplicate:
Unique random numbers in O(1)?
Unique random numbers in an integer array in the C programming language
I have a std::vector of unique elements of some undetermined size. I want to fetch 20 unique and random elements from this vector. By 'unique' I mean that I do not want to fetch the same index more than once. Currently the way I do this is to call std::random_shuffle. But this requires me to shuffle the entire vector (which may contain over 1000 elements). I don't mind mutating the vector (I prefer not to though, as I won't need to use thread locks), but most important is that I want this to be efficient. I shouldn't be shuffling more than I need to.
Note that I've looked into passing in a partial range to std::random_shuffle but it will only ever shuffle that subset of elements, which would mean that the elements outside of that range never get used!
Help is appreciated. Thank you!
Note: I'm using Visual Studio 2005, so I do not have access to C++11 features and libraries.
You can use Fisher–Yates: http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
The Fisher–Yates shuffle (named after Ronald Fisher and Frank Yates), also known as the Knuth shuffle (after Donald Knuth), is an algorithm for generating a random permutation of a finite set—in plain terms, for randomly shuffling the set. A variant of the Fisher–Yates shuffle, known as Sattolo's algorithm, may be used to generate random cycles of length n instead. Properly implemented, the Fisher–Yates shuffle is unbiased, so that every permutation is equally likely. The modern version of the algorithm is also rather efficient, requiring only time proportional to the number of items being shuffled and no additional storage space.
The basic process of Fisher–Yates shuffling is similar to randomly picking numbered tickets out of a hat, or cards from a deck, one after another until there are no more left. What the specific algorithm provides is a way of doing this numerically in an efficient and rigorous manner that, properly done, guarantees an unbiased result.
I think this pseudocode should work (there is a chance of an off-by-one mistake or something, so double-check it!):
// rand_between(a, b): uniform random integer in [a, b] (helper not shown)
std::list<T> chosen; // you don't have to use this since the chosen ones will be in the back of the vector
for(int i = 0; i < num; ++i) {
    int index = rand_between(0, vec.size() - i - 1);
    swap(vec[index], vec[vec.size() - i - 1]);
    chosen.push_back(vec[vec.size() - i - 1]);
}
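A runnable sketch of this partial Fisher–Yates shuffle. It performs only m swaps, so the cost is O(m) instead of shuffling all n elements. It avoids C++11 features because the question targets Visual Studio 2005. sample_distinct and the rand_below function object are hypothetical names. rand_below(k) is assumed to return a uniform integer in [0, k), and supplying it is left to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Partial Fisher-Yates: moves m distinct, randomly chosen elements of vec
// to its back and returns copies of them (fewer if vec has < m elements).
// rand_below(k) must return a uniform integer in [0, k).
template <typename T, typename RandBelow>
std::vector<T> sample_distinct(std::vector<T>& vec, std::size_t m, RandBelow rand_below) {
    std::vector<T> chosen;
    const std::size_t n = vec.size();
    if (m > n)
        m = n;
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t last = n - 1 - i;             // end of the unpicked prefix
        const std::size_t index = rand_below(last + 1);  // pick from [0, last]
        std::swap(vec[index], vec[last]);
        chosen.push_back(vec[last]);
    }
    return chosen;
}
```

With C++11 or later, rand_below could wrap std::uniform_int_distribution<std::size_t>(0, k - 1) applied to a std::mt19937. Under VS2005, a rand()-based helper works, but rand() % k is slightly biased whenever k does not divide RAND_MAX + 1.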
You want a random sample of size m from an n-vector:
Let rand(a) return 0..a-1 uniform
for (int i = 0; i < m; i++)
    swap(X[i], X[i + rand(n - i)]);
X[0..m-1] is now a random sample.
Use a loop to put random index numbers into a std::set and stop when the size() reaches 20.
std::set<int> indexes;
std::vector<T> choices;  // T: the element type of my_vector
int max_index = my_vector.size();
while (indexes.size() < (size_t)std::min(20, max_index))
{
    int random_index = rand() % max_index;
    if (indexes.find(random_index) == indexes.end())
    {
        indexes.insert(random_index);
        choices.push_back(my_vector[random_index]);
    }
}
The random number generation is the first thing that popped into my head, feel free to use something better.
#include <iostream>
#include <vector>
#include <algorithm>
template<int N>
struct NIntegers {
    int values[N];
};
// Returns N distinct integers in [0, Max), sorted ascending (requires N <= Max).
// func(k) must return an integer in [0, k).
template<int N, int Max, typename RandomGenerator>
NIntegers<N> MakeNRandomIntegers( RandomGenerator func ) {
    NIntegers<N> result;
    for(int i = 0; i < N; ++i)
        result.values[i] = func( Max-N+1 );   // each draw lies in [0, Max-N]
    std::sort(&result.values[0], &result.values[0]+N);
    for(int i = 0; i < N; ++i)
        result.values[i] += i;                 // bump by the number of entries before it
    return result;
}
Use example:
// use a better one:
int BadRandomNumberGenerator(int Max) {
    return Max>4?4:Max/2;
}
int main() {
    NIntegers<100> result = MakeNRandomIntegers<100, 500>( BadRandomNumberGenerator );
    for (int i = 0; i < 100; ++i) {
        std::cout << i << ":" << result.values[i] << "\n";
    }
}
Draw N numbers, each at most Max - N, and sort them; then bump up each value by the number of integers before it. After the bump the values are strictly increasing, so they are distinct, and the largest is at most (Max - N) + (N - 1) = Max - 1.
The template stuff is just trade dress.

How to find first non-repeating element?

How to find first non-repeating element in an array.
Provided that you can only use 1 bit for every element of the array and time complexity should be O(n) where n is length of array.
Please note that I have deliberately imposed a constraint on memory. It is also possible that this cannot be done with just one extra bit per element. Please let me know whether it is possible or not.
I would say there is no comparison-based algorithm that can do it in O(n). With the naive approach you have to compare the first element of the array with all others, the 2nd with all except the first, the 3rd with all except the first two, and so on: Sum i = O(n^2).
(But that does not necessarily mean that there is no faster algorithm. Compare sorting: there is a proof that you can't sort faster than O(n log n) if you are comparison-based, and yet there is a faster sort, bucket sort, which can do it in O(n) for suitably bounded keys.)
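As a point of comparison for the bounds above, sorting gives a comparison-based solution in O(n log n). Sort a copy, then scan the original array in order and count each value by binary search. first_non_repeating_sorted is a hypothetical name. The sketch assumes int elements and uses O(n) extra memory, so it ignores the 1-bit-per-element constraint in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the index of the first element that occurs exactly once, or -1.
// O(n log n): one sort plus one binary search per element.
long first_non_repeating_sorted(const std::vector<int>& a) {
    std::vector<int> sorted(a);
    std::sort(sorted.begin(), sorted.end());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // equal_range finds the run of copies of a[i] in O(log n)
        auto range = std::equal_range(sorted.begin(), sorted.end(), a[i]);
        if (range.second - range.first == 1)
            return static_cast<long>(i);
    }
    return -1;
}
```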
EDIT: In one of the other comments I said something about hash functions. I checked some facts about it, and here are the hashmap approach thoughts:
The obvious approach is (in pseudocode):
for (i = 0; i < maxsize; i++)
    count[i] = 0;
for (i = 0; i < maxsize; i++) {
    h = hash(A[i]);
    count[h]++;        // with a perfect hash, h identifies the value A[i]
}
first = -1;
for (i = 0; i < maxsize; i++)
    if (count[hash(A[i])] == 1) {   // scan in array order
        first = i;
        break;
    }
if (first != -1)
    printf("first unique: %d\n", A[first]);
There are some caveats:
How to get the hash. I did some research on perfect hash functions, and indeed you can generate one in O(n) (Optimal algorithms for minimal perfect hashing, by George Havas et al.). I'm not sure how good this paper is: it claims an O(n) time bound but speaks of a non-linear space bound, which looks like a plain error to me (I hope I am not the only one seeing the flaw in this), because according to all the theoretical computer science I know of, time is an upper bound for space (you don't have time to write to more space). But I believe them when they say it is possible in O(n).
The additional space: here I don't see a solution. The above paper cites some research saying that you need about 2.7 bits per element for the perfect hash function. With the additional count array (which you can shorten to three states: empty, 1 element, more than 1 element) you need 2 additional bits per element (1.58, i.e. log2 3, if you assume you can somehow combine it with the above 2.7), which sums up to about 5 additional bits per element.
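The two-pass counting idea from the pseudocode above can be written in C++ with std::unordered_map standing in for both the perfect hash and the count array. This is a sketch under loose assumptions. The elements are ints, first_non_repeating_index is a hypothetical name, and the map uses at least a machine word per distinct value (not 1 or 2 bits), so it relaxes the question's memory constraint. Expected time is O(n).

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Returns the index of the first element that occurs exactly once, or -1.
long first_non_repeating_index(const std::vector<int>& a) {
    std::unordered_map<int, std::size_t> count;
    for (int x : a)
        ++count[x];                                // pass 1: count occurrences
    for (std::size_t i = 0; i < a.size(); ++i)
        if (count[a[i]] == 1)                      // pass 2: scan in array order
            return static_cast<long>(i);
    return -1;
}
```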
Here I'm assuming that the input is a character string containing only lowercase letters, so that one 32-bit integer is enough: with 26 letters, I can use one bit per letter. Earlier I thought of taking an array of 256 elements, but that would be 256*32 bits in total (32 bits per element). In the end I found that I could not do it without one more variable, so the solution uses two such integers (bitmap and bitmap_check) for the 26 letters:
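#include <stdio.h>
#include <string.h>
/* Prints the first letter of str (only 'a'-'z' allowed) that occurs exactly once. */
int print_non_repeating(const char* str)
{
    int bitmap = 0, bitmap_check = 0;   /* seen at least once / seen more than once */
    int i, length = strlen(str);
    for (i = 0; i < length; i++)
    {
        if (bitmap & (1 << (str[i] - 'a')))
            bitmap_check = bitmap_check | (1 << (str[i] - 'a'));
        bitmap = bitmap | (1 << (str[i] - 'a'));
    }
    bitmap = bitmap ^ bitmap_check;     /* letters seen exactly once */
    i = 0;
    if (bitmap != 0)
    {
        while (!(bitmap & (1 << (str[i] - 'a'))))
            i++;
        printf("first non-repeating: %c\n", str[i]);
        return 1;
    }
    return 0;
}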
You can try a modified bucket sort, as shown below. However, you need to know the max value in the array passed into the firstNonRepeat method. This runs in O(n + maxVal) time, which is O(n) when maxVal is O(n).
For comparison-based methods, the theoretical fastest (at least in terms of sorting) is O(n log n). Alternatively, you can even use modified versions of radix sort to accomplish this.
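public class BucketSort{
    //maxVal is the max value in the array; all values must lie in [0, maxVal]
    public int firstNonRepeat(int[] a, int maxVal){
        int [] bucket=new int[maxVal+1];
        for (int i=0; i<bucket.length; i++){
            bucket[i] = 0;               // (Java already zero-initializes arrays)
        }
        for (int i=0; i<a.length; i++){
            bucket[a[i]]++;              // count occurrences of each value
        }
        for (int i=0; i<a.length; i++){
            if(bucket[a[i]] == 1) {
                return a[i];             // first value, in array order, seen exactly once
            }
        }
        return -1;                       // every value repeats
    }
}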
This code finds the first repeating element. I haven't figured out yet whether it is possible to find the non-repeating element in the same for loop, without introducing another loop (a second, separate pass would still keep the code O(n), though). Other answers suggest bubble sort, which is O(n^2).
#include <iostream>
using namespace std;
#define max_size 10
int main()
{
    int numbers[max_size] = { 1, 2, 3, 4, 5, 1, 3, 4 ,2, 7};
    int table[max_size] = {0,0,0,0,0,0,0,0,0,0};   // occurrence count per value
    int answer = 0, j=0;
    for (int i = 0; i < max_size; i++)
    {
        j = numbers[i] % max_size;   // assumes small values; larger ones would collide
        table[j]++;
        if(table[j] >1)
        {
            answer = numbers[i];     // first element seen a second time
            break;
        }
    }
    std::cout << "answer = " << answer ;
}