How to efficiently look up elements in a large vector - c++

I have a vector<unsigned> of size (90,000 * 9,000). I need to check many times whether an element exists in this vector or not.
To do so, I stored the vector in sorted form using std::sort() and then looked up elements using std::binary_search(). However, on profiling with perf, I find that looking up elements in the vector<unsigned> is the slowest operation.
Can someone suggest a data structure in C/C++ which I can use to efficiently look up elements in a vector of (90,000 * 9,000) elements?
I perform insertion (bulk-insertion) only once. The rest of the times I perform only lookups, so the main overhead here is because of lookups.

You've got 810 million values out of 4 billion possible values (assuming a 32-bit unsigned). That's about 1/5th of the total range, and uses 3.2 GB. This means you're in fact better off with a std::vector<bool> with 4 billion bits. This gives you O(1) lookup in less space (0.5 GB).
(In theory, unsigned could be only 16 bits; unsigned long is at least 32 bits. std::uint32_t might be what you want.)
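A minimal sketch of that idea, assuming a 64-bit build where a 2^32-bit std::vector<bool> (about 0.5 GB) fits in memory; the function names are just illustrative:

#include <cstdint>
#include <vector>

// One bit per possible 32-bit value: O(1) lookup in about 0.5 GB.
std::vector<bool> present(1ULL << 32, false);

// Bulk insertion, done once.
void bulk_insert(const std::vector<std::uint32_t>& values) {
    for (std::uint32_t v : values)
        present[v] = true;
}

bool contains(std::uint32_t v) {
    return present[v];
}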

Depending on the actual data structure, a contains operation may take O(N) or O(1). For a plain vector (or a linked list) it is a full scan in the worst case, i.e. O(N). You have mitigated the full scan by sorting and using binary search, which is O(log N). Log N is a pretty good complexity, with only O(1) being better. So your choice is either:
Cache lookup results for the items; this might be a good compromise if you have many repetitions of the same element
Replace the vector with another data structure that has an efficient contains operation, such as one based on a hash table or a set. Note you may lose other features, such as the ordering of items
Use two data structures, one for contains operations and the original vector for whatever else you use it for (a sketch of this option follows the list)
Use a third data structure that offers a compromise, for example a Bloom filter in front of the vector
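As a sketch of the "two data structures" option, assuming an std::unordered_set is acceptable memory-wise (the names are illustrative), it could look like this:

#include <unordered_set>
#include <vector>

std::vector<unsigned> data;          // the original vector, kept for everything else
std::unordered_set<unsigned> index;  // built once, used only for membership tests

// Build the lookup structure after the single bulk insertion.
void build_index() {
    index.insert(data.begin(), data.end());
}

bool contains(unsigned v) {
    return index.count(v) != 0;
}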

However on profiling using perf I find that looking up elements in vector is the slowest operation.
That is half of the information you need, the other half being "how fast is it compared to other algorithms/containers?" Maybe using std::vector<> is actually the fastest, or maybe it's the slowest. To find out, you'll have to benchmark/profile a few different designs.
For example, the following are very naive benchmarks using random integers on 1000x9000 sized containers (I would get seg-faults on larger sizes for the maps, presumably a limit of 32-bit memory).
If you need a count of non-unique integers:
std::vector<unsigned> = 500 ms
std::map<unsigned, unsigned> = 1700 ms
std::unordered_map<unsigned, unsigned> = 3700 ms
If you just need to test for the presence of unique integers:
std::vector<bool> = 15 ms
std::bitset<> = 50 ms
std::set<unsigned> = 350 ms
Note that we're not too interested in the exact values but rather the relative comparisons between containers. std::map<> is relatively slow, which is not surprising given the number of dynamic allocations and the non-locality of the data involved. The bitsets are by far the fastest, but they don't work if you need the counts of non-unique integers.
I would suggest doing a similar benchmark using your exact container sizes and contents, both of which may well affect the benchmark results. It may turn out that std::vector<> may be the best solution after all but now you have some data to back up that design choice.
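As a rough template for such a benchmark (the helper name and the probe callable are made up; absolute numbers will vary with compiler flags and hardware):

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Times how long a batch of membership probes takes, in milliseconds.
// 'probe' is any callable that checks one key against the container,
// e.g. [](const auto& s, std::uint32_t k) { return s.count(k); }
template <typename Container, typename Probe>
long long time_lookups(const Container& c, const std::vector<std::uint32_t>& keys, Probe probe) {
    auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (std::uint32_t k : keys)
        hits += probe(c, k);
    auto stop = std::chrono::steady_clock::now();
    std::cout << "hits: " << hits << '\n';   // keeps the loop from being optimised away
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}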

If you do not need to iterate through the collection (in a sorted manner), then since C++11 you can use std::unordered_set<yourtype>. All you need to do is provide the collection with a way of hashing yourtype and comparing it for equality. Accessing an element of the collection is then amortised O(1), unlike a sorted vector where it's O(log n).
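For a user-defined yourtype, that could look roughly like this (the Key struct and its members are invented for illustration; any reasonable hash combiner will do):

#include <cstddef>
#include <functional>
#include <unordered_set>

struct Key {
    unsigned a;
    unsigned b;
    bool operator==(const Key& other) const { return a == other.a && b == other.b; }
};

struct KeyHash {
    std::size_t operator()(const Key& k) const {
        // Simple combination of the two member hashes; good enough for a sketch.
        return std::hash<unsigned>()(k.a) ^ (std::hash<unsigned>()(k.b) << 1);
    }
};

std::unordered_set<Key, KeyHash> keys;   // amortised O(1) membership via keys.count(...)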

Related

C++ Find in a vector of <int, pair>

So previously I only had 1 key I needed to look up, so I was able to use a map:
std::map <int, double> freqMap;
But now I need to look up 2 different keys. I was thinking of using a vector with std::pair i.e.:
std::vector <int, std::pair<int, double>> freqMap;
Eventually I need to look up both keys to find the correct value. Is there a better way to do this, or will this be efficient enough (it will have ~3k entries)? Also, I am not sure how to search using the second key (the first key in the std::pair). Is there a find for the pair based on the first key? Essentially I can access the first key by:
freqMap[key1]
But not sure how to iterate and find the second key in the pair.
Edit: Ok adding the use case for clarification:
I need to look up a val based on 2 keys, a mux selection and a frequency selection. The raw data looks something like this:
Mux, Freq, Val
0, 1000, 1.1
0, 2000, 2.7
0, 10e9, 1.7
1, 1000, 2.2
1, 2500, 0.8
6, 2000, 2.2
The blanket answer to "which is faster" is generally "you have to benchmark it".
But besides that, you have a number of options. A std::map is more efficient than other data structures on paper, but not necessarily in practice. If you truly are in a situation where this is performance critical (i.e. avoid premature optimisation) try different approaches, as sketched below, and measure the performance you get (memory-wise and cpu-wise).
Instead of using a std::map, consider throwing your data into a struct, giving it proper names, and storing all values in a simple std::vector. If you modify the data only seldom, you can optimise retrieval cost at the expense of additional insertion cost by sorting the vector according to the key you typically use to find an entry. This allows you to do binary search, which can be much faster than linear search.
However, linear search can be surprisingly fast on a std::vector because of both cache locality and branch prediction. Both of which you are likely to lose when dealing with a map, unordered_map or (binary searched) sorted vector. So, although O(n) sounds much more scary than, say, O(log n) for map or even O(1) for unordered_map, it can still be faster under the right conditions.
Especially if you discover that you don't have a discernible index member you can use to sort your entries, you will have to either stick to linear search in contiguous memory (i.e. vector) or invest into a doubly indexed data structure (effectively something akin to two maps or two unordered_maps). Having two indexes usually prevents you from using a single map/unordered_map.
If you can pack your data more tightly (do you need an int or would a std::uint8_t do the job? do you need a double? etc.) you will amplify cache locality, and for only 3k entries you have a good chance that an unsorted vector will perform best. Although operations on a std::size_t are typically faster themselves than on smaller types, iterating over contiguous memory usually offsets this effect.
Conclusion: Try an unsorted vector, a sorted vector (+ binary search), a map and an unordered_map. Do proper benchmarking (with several repetitions) and pick the fastest one. If it doesn't make a difference, pick the one that is the most straightforward to understand.
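A sketch of the sorted-vector-of-structs variant with binary search via std::lower_bound (field names and types are illustrative, loosely based on the Mux/Freq/Val data above):

#include <algorithm>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

struct Entry {
    int           mux;
    std::uint64_t freq;
    double        val;
};

std::vector<Entry> entries;   // filled once, then sorted by (mux, freq)

void sort_entries() {
    std::sort(entries.begin(), entries.end(), [](const Entry& a, const Entry& b) {
        return std::tie(a.mux, a.freq) < std::tie(b.mux, b.freq);
    });
}

// Returns a pointer to the matching entry, or nullptr if the (mux, freq) pair is absent.
const Entry* find_entry(int mux, std::uint64_t freq) {
    auto it = std::lower_bound(entries.begin(), entries.end(), std::make_pair(mux, freq),
        [](const Entry& e, const std::pair<int, std::uint64_t>& key) {
            return std::tie(e.mux, e.freq) < std::tie(key.first, key.second);
        });
    return (it != entries.end() && it->mux == mux && it->freq == freq) ? &*it : nullptr;
}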
Edit: Given your example data, it sounds like the first key has an extremely small domain. As far as I can tell, "Mux" seems to be limited to a small number of different values which are near each other. In such a situation you may consider using a std::array as your primary indexing structure and a suitable lookup structure as your second one. For example:
std::array<std::vector<std::pair<std::uint64_t,double>>,10>
std::array<std::unordered_map<std::uint64_t,double>,10>
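A brief sketch of how the second variant might be used, assuming Mux stays in the range 0..9 (hence the array size of 10) and that returning 0.0 for a missing entry is acceptable:

#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// One frequency->value map per Mux index; Mux itself is used as the array index.
std::array<std::unordered_map<std::uint64_t, double>, 10> table;

void insert(std::size_t mux, std::uint64_t freq, double val) {
    table[mux][freq] = val;
}

double lookup(std::size_t mux, std::uint64_t freq) {
    const auto& m = table[mux];
    auto it = m.find(freq);
    return it != m.end() ? it->second : 0.0;   // 0.0 as an assumed "not found" default
}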

Most efficient container for uint16_t to uint16_t map

I am developing a program for a machine with very limited processing power, where I want to map uint16_t keys to uint16_t values.
I am currently using std::map, reading from it with unchecked ("non-safe") lookups:
std::map<uint16_t, uint16_t> m;
// fill m only once
while (true) {
    auto x = m[y];
}
The performance is still not acceptable for the requirements. Is there a better solution in terms of execution speed?
Edit:
Some information:
Total number of items is less than 500
Insertion is done only once
Querying values is done more than 250 times / sec
Keys and values are unique
RAM is very limited: overall RAM is 512 KB, and free RAM for this part of the code is less than 50 KB
100 MHz single core processor
Without some more context about your map,
if you intend to use a lot of keys, a big array as suggested before would be easy enough to handle, since there wouldn't be collisions, but if you're not going to use all the memory, it might be wasteful.
if you intend to use a fair amount of data, but not enough that there'd be too many hash collisions, std::unordered_map has amortized O(1) lookups, and if you don't care about the order they're stored in, that could be a good guess.
if you're not using a lot of data and require it to be flexible, std::vector is a good choice
Seeing as all we know is that it's a uint16_t to uint16_t map, there's no one best answer.
Total number of items is less than 500
Insertion is done only once
Keep a fixed-size (500) array of key-value pairs plus a "valid items count", if the actual size is known only at runtime; populate it with the data and sort it; then you can simply do a binary search.
Given that the elements are few, depending on the specific processor it may even be more convenient to just do a linear search (perhaps keeping the most used elements first if you happen to have clues about the most frequently looked-up values).
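A sketch of that fixed-size-array approach, assuming at most 500 pairs that are loaded once and then only queried (the names are illustrative):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>

std::pair<std::uint16_t, std::uint16_t> table[500];   // key/value pairs, filled during bulk insertion
std::size_t table_size = 0;                           // the "valid items count"

// Call once after the bulk insertion.
void finalize() {
    std::sort(table, table + table_size);             // pairs compare by key (.first) first
}

// Binary search; returns true and fills 'value' if the key is present.
bool lookup(std::uint16_t key, std::uint16_t& value) {
    auto it = std::lower_bound(table, table + table_size, std::make_pair(key, std::uint16_t(0)));
    if (it == table + table_size || it->first != key)
        return false;
    value = it->second;
    return true;
}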
Usually a binary tree (std::map which you currently use) provides adequate performance. Your CPU budget must be really small.
A binary search approach probably won't be much faster. A hashtable (std::unordered_map) might be a solution.
Just make sure to size it properly so as to avoid rehashing and stay just below 1.0 load factor.
std::unordered_map<uint16_t, uint16_t> m(503); // 503 is a prime
m.emplace(123, 234);
m.emplace(2345, 345);
std::cout << m.find(123)->second; // don't use operator[] to avoid creating entries
std::cout << m.find(2345)->second;
To check the quality of the hashing function, iterate over all the buckets, call bucket_size() on each, and add up anything above 1 (i.e. a collision).
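A rough way to do that check, as a small helper written against the standard bucket interface:

#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Counts colliding elements: anything beyond the first element in each bucket.
std::size_t count_collisions(const std::unordered_map<std::uint16_t, std::uint16_t>& m) {
    std::size_t collisions = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b)
        if (m.bucket_size(b) > 1)
            collisions += m.bucket_size(b) - 1;
    return collisions;
}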

Returning zero when the key does not exist in an unordered_map

I have the following container:
std::unordered_map<uint8_t,int> um;
um is assumed to have keys between 0 and 255, but not all of them. So, at a certain point in time, I want to ask it for the value of the key 13, for example. If it is there, I want its value (which is guaranteed not to be 0). If not, I want it to return 0.
What is the best way (from a performance point of view) to implement this?
What I have tried so far: use find and return 0 if the key was not found, or the value if it was.
P.S. Changing to a std::vector<int> that contains 256 items is not an option. I cannot afford the space to always store 256 values.
EDIT:
My problem is a histogram computation: keys are colors (0-255) and values are frequencies (int is enough). Knowing only whether a key exists is not enough; I also need the value (the frequency).
Additional information:
I will never erase any item.
I will add items sometimes (at most 256 items) and usually fewer than 10.
I will query by key very many times.
Usually querying and inserting come with no specific order.
You have a trade-off between memory and speed.
Your unordered_map should have the lowest lookup complexity.
Using std::vector<std::pair<uint8_t, int>> would be more compact (and more cache friendly).
std::pair<std::vector<uint8_t>, std::vector<int>> would be even more compact (no padding between uint8_t and int)
You can even do better by factorizing size/capacity, but it is no longer in std::.
With a vector, you then have another trade-off between the complexity of searching and of adding a key:
unsorted vector: constant add, linear search
sorted vector: linear add (due to inserting values in the middle of the vector), logarithmic search.
I might use a vector for space compactness.
It is tempting to keep it sorted for logarithmic search performance. But since the expected number of elements is less than 10, I might just leave it unsorted and use linear search.
So
vector<pair<uint8_t, int>> data;
If the number of expected elements is large, then having a sorted vector might help.
Boost offers a map-like interface with vector-like layout. See boost flat_map at http://www.boost.org/doc/libs/1_48_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.flat_xxx
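A sketch of the unsorted-vector variant with linear search, returning 0 when the color has not been counted yet (names are illustrative):

#include <cstdint>
#include <utility>
#include <vector>

std::vector<std::pair<std::uint8_t, int>> data;   // at most 256 entries, usually fewer than 10

// Increment the count for 'color', adding an entry if it is not present yet.
void add(std::uint8_t color) {
    for (auto& e : data)
        if (e.first == color) { ++e.second; return; }
    data.emplace_back(color, 1);
}

// Returns the frequency, or 0 if the key does not exist, as the question asks.
int get(std::uint8_t color) {
    for (const auto& e : data)
        if (e.first == color) return e.second;
    return 0;
}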

Efficiently insert integers into a growing array if no duplicates

There is a data structure which acts like a growing array. An unknown number of integers will be inserted into it one by one, if and only if each integer has no duplicate already in the data structure.
Initially I thought a std::set would suffice: it automatically grows as new integers come in and ensures there are no duplicates.
But, as the set grows large, the insertion speed goes down. So any other idea to do this job besides hash?
P.S. I wonder whether any tricks such as XOR-ing all the elements or building a sparse table (just like for RMQ) would apply?
If you're willing to spend memory on the problem, 2^32 bits is 512MB, at which point you can just use a bit field, one bit per possible integer. Setting aside CPU cache effects, this gives O(1) insertion and lookup times.
Without knowing more about your use case, it's difficult to say whether this is a worthwhile use of memory or a vast memory expense for almost no gain.
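A minimal sketch of the bit-field approach, assuming 32-bit values and a 64-bit build where the 512 MB std::vector<bool> is acceptable:

#include <cstdint>
#include <vector>

std::vector<bool> seen(1ULL << 32, false);   // one bit per possible 32-bit integer, ~512 MB
std::vector<std::uint32_t> values;           // the growing array itself

// Appends v only if it has not been seen before.
void insert_if_new(std::uint32_t v) {
    if (!seen[v]) {
        seen[v] = true;
        values.push_back(v);
    }
}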
This site lists all the standard containers and lays out the running time of each operation, so maybe it will be useful:
http://en.cppreference.com/w/cpp/container
Seems like unordered_set as suggested is your best way.
You could try a std::unordered_set, which should be implemented as a hash table (well, I do not understand why you write "besides hash"; std::set normally is implemented as a balanced tree, which should be the reason for insufficient insertion performance).
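A small sketch: the bool in the pair returned by insert() already tells you whether the integer was new, so no separate lookup is needed:

#include <unordered_set>

std::unordered_set<int> seen;

// Returns true if v was new (and therefore inserted), false if it was a duplicate.
bool insert_if_new(int v) {
    return seen.insert(v).second;
}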
If there is some range the numbers fall in, then you can create several std::set as buckets.
EDIT: According to the range that you have specified, std::set should be fast enough. O(log n) is fast enough for most purposes, unless you have done some measurements and found it slow for your case.
Also, you can use the pigeonhole principle along with sets to reject any possible duplicates (applicable when the set grows large).
A bit vector can be useful to detect duplicates
Even more requirements would be necessary for an optimal decision. This suggestion is based on the following constraints:
Alcott: 32-bit integers, with about 10,000,000 elements (i.e. any 10M out of 2^32)
It is a BST (binary search tree) where every node stores two values, the beginning and the end of a contiguous region. The first element stores the number where a region starts, the second stores where it ends. This arrangement allows big regions, in the hope that you reach your 10M limit with a very small tree height, so searches stay cheap. The data structure with 10M elements would take up 8 bytes per node, plus the links (2 x 4 bytes, at most two children per node). So that makes about 80 MB for all the 10M elements. And of course, if there are usually more elements inserted than not, you can instead keep track of the ones which are not.
Now, if you need to be very careful with space and, after running simulations and/or statistical checks, you find that there are lots of small regions (less than 32 bits in length), you may want to change your node type to one number which starts the region, plus a bitmap.
If you don't have to align access to the bitmap and, say, you only have contiguous chunks of only 8 elements, then your memory requirement becomes 4+1 bytes for the node and 4+4 bytes for the children. Hope this helps.

How to map 13 bit value to 4 bit code?

I have a std::map for some packet processing program.
I didn't notice it before profiling, but unfortunately this map lookup alone consumes about 10% of the CPU time (it is called too many times).
Usually there are at most 10 keys in the input data. So I'm trying to implement a kind of key cache in front of the map.
The key value is a 13-bit integer. I know there are only 8192 possible keys and an array of 8192 items would give constant-time lookup, but I already feel ashamed and don't want to use such a naive approach :(
Now, I'm just looking for some method of hashing that yields a 4-bit code value for a 13-bit integer very fast.
Any cool idea?
Thanks in advance.
UPDATE
Besides my shame, I don't have total control over the source code and it's almost prohibited to create a new array for this purpose.
The project manager (who ran the profiler) said a linked list showed a small performance gain and recommended using std::list instead of std::map.
UPDATE
The values of the keys are random (no relationship) and don't have a good distribution.
Sample:
1) 0x100, 0x101, 0x10, 0x0, 0xffe
2) 0x400, 0x401, 0x402, 0x403, 0x404, 0x405, 0xff
Assuming your table contains some basic type, it's almost no memory at all: even on a 64-bit system it's only 64 KB of memory. There is no shame in using a lookup table like that; it has some of the best performance you can get.
You may want to go with a middle solution and the open addressing technique: one array of size 256. The index into the array is some simple hash function, like the XOR of two bytes. Each element of the array is a struct {key, value}. Collisions are handled by storing the collided element at the next available index. If you need to delete an element from the array, and deletion is rare, then just recreate the array (build a temporary list from the remaining elements, then rebuild the array from this list).
If you pick your hash function smartly, there will almost never be any collisions. For instance, from your two examples, one such hash would be to XOR the low nibble of the high byte with the high nibble of the low byte (and do what you like with the remaining 13th bit).
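A rough sketch of that open-addressed 256-slot table (the hash mixing here is only an example, not tuned to the real key distribution, and deletion is left out):

#include <cstdint>

struct Slot {
    std::uint16_t key;
    std::uint8_t  value;    // the 4-bit code
    bool          used;
};

Slot table[256] = {};       // small fixed table; collisions resolved by linear probing

// Example hash: XOR the low byte with the high bits of the 13-bit key.
std::uint8_t hash13(std::uint16_t key) {
    return static_cast<std::uint8_t>((key ^ (key >> 8)) & 0xFF);
}

// Assumes the table never fills up (at most ~10 keys for 256 slots).
void put(std::uint16_t key, std::uint8_t value) {
    std::uint8_t i = hash13(key);
    while (table[i].used && table[i].key != key)
        ++i;                // uint8_t wraps around, so the probe stays inside the 256 slots
    table[i] = { key, value, true };
}

bool get(std::uint16_t key, std::uint8_t& value) {
    std::uint8_t i = hash13(key);
    while (table[i].used) {
        if (table[i].key == key) { value = table[i].value; return true; }
        ++i;
    }
    return false;
}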
Unless you're writing for some sort of embedded system where 8K is really significant, just use the array and move on. If you really insist on doing something else, you might consider a perfect hash generator (e.g., gperf).
If there are really only going to be something like 10 active entries in your table, you might seriously consider using an unsorted vector to hold this mapping. Something like this:
typedef int key_type;
typedef int value_type;

std::vector<std::pair<key_type, value_type> > mapping;

inline void put(key_type key, value_type value) {
    for (size_t i = 0; i < mapping.size(); ++i) {
        if (mapping[i].first == key) {
            mapping[i].second = value;
            return;
        }
    }
    mapping.push_back(std::make_pair(key, value));
}

inline value_type get(key_type key) {
    for (size_t i = 0; i < mapping.size(); ++i) {
        if (mapping[i].first == key) {
            return mapping[i].second;
        }
    }
    // do something reasonable if not found?
    return value_type();
}
Now, the asymptotic speed of these algorithms (each O(n)) is much worse than you'd have with either a red-black tree (like std::map at O(log n)) or hash table (O(1)). But you're not talking about dealing with a large number of objects, so asymptotic estimates don't really buy you much.
Additionally, std::vector buys you both low overhead and locality of reference, which neither std::map nor std::list can offer. So it's more likely that a small std::vector will stay entirely within the L1 cache. As it's almost certainly the memory bottleneck that's causing your performance issues, using a std::vector with even my poor choice of algorithm will likely be faster than either a tree or linked list. Of course, only a few solid profiles will tell you for sure.
There are certainly algorithms that might be better choices: a sorted vector could potentially give even better performance; a well tuned small hash table might work as well. I suspect that you'll run into Amdahl's law pretty quickly trying to improve on a simple unsorted vector, however. Pretty soon you might find yourself running into function call overhead, or some other such concern, as a large contributor to your profile.
I agree with GWW, you don't use so much memory in the end...
But if you want, you could use an array of 11 or 13 linked lists, and hash the keys with the % (modulus) function. If the number of keys is less than the array size, the complexity still tends to be O(1).
When you always just have about ten keys, use a list (or array). Do some benchmarking to find out whether or not using a sorted list (or array) and binary search will improve performance.
You might first want to see if there are any unnecessary calls to the key lookup. You only want to do this once per packet ideally -- each time you call a function there is going to be some overhead, so getting rid of extra calls is good.
Map is generally pretty fast, but if there is any exploitable pattern in the way keys are mapped to items you could use that and potentially do better. Could you provide a bit more information about the keys and the associated 4-bit values? E.g. are they sequential, is there some sort of pattern?
Finally, as others have mentioned, a lookup table is very fast, 8192 values * 4 bits is only 4kb, a tiny amount of memory indeed.
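For completeness, a sketch of that plain lookup table, using one byte per entry for simplicity (8 KB) rather than packing two 4-bit codes per byte (4 KB):

#include <cstdint>

std::uint8_t code_for_key[8192];   // index = 13-bit key, value = 4-bit code (stored in a byte)

void put(std::uint16_t key, std::uint8_t code) {
    code_for_key[key & 0x1FFF] = code;   // mask to 13 bits, then direct indexing
}

std::uint8_t get(std::uint16_t key) {
    return code_for_key[key & 0x1FFF];
}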
I would use a lookup table. It's tiny unless you are using a microcontroller or something.
Otherwise I would do this -
Generate a table of say 30 elements.
For each lookup calculate a hash value of (key % 30) and compare it with the stored key at that location in the table. If the key is there, then you have found your value. If the slot is empty, then add it. If the key is wrong, then skip to the next cell and repeat.
With 30 cells and 10 keys, collisions should be rare, but if you get one it's fast to skip to the next cell, and normal lookups are simply a modulus and a compare operation, so they're fairly fast.