I have a list L (in the general sense, not std::list) of numbers and I also have i which is the index of the smallest element in L. I want to swap the two partitions separated by index i. What standard data structure and what operations should I perform such that I can do this as efficiently as possible (preferably in constant time)?
An example: let L be 9 6 -4 6 12. The smallest value is L[2] = -4, so i = 2. After swapping the two partitions, I want L to be -4 6 12 9 6.
The list will be pretty large (up to 10^3 elements) and I will also have to traverse it multiple times (up to 10^3 traversals in the worst case), so using std::list is not a good idea due to caching issues. On the other hand, std::vector will make it difficult to swap the two partitions. Is std::deque a good choice for this?
There are two aspects to your problem:
1- Constant time swap: Conceptually speaking, the best approach will be a doubly linked list (std::list) in terms of swapping.
Since your data is big, nodes will always remain at their initial places in memory, and you will only alter a constant number of pointers to do the kind of swap you are describing.
2- Locality: We all know that a contiguously allocated space in memory is better for cache performance. This leans towards std::vector.
What is in the middle?
Resizable contiguous chunks of memory that can be allocated through a custom allocator. There are numerous ways to design these. An example.
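To make the trade-off concrete, here is a minimal sketch of both ends of the spectrum. It is only an illustration: the two `swap_partitions` overloads are hypothetical helper names, not something from the question. The std::list version relinks nodes with `splice`, which is constant time once you already hold an iterator to element i; the std::vector version uses `std::rotate`, which is linear in the number of elements but keeps the data contiguous.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// std::list: move the range [begin, pos) to the end by relinking nodes.
// No element is copied or moved; only a constant number of pointers change.
inline void swap_partitions(std::list<int>& l, std::list<int>::iterator pos) {
    l.splice(l.end(), l, l.begin(), pos);
}

// std::vector: rotate so that element i becomes the first element.
// Linear in v.size(), but the storage stays contiguous.
inline void swap_partitions(std::vector<int>& v, std::size_t i) {
    assert(i <= v.size());
    std::rotate(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(i), v.end());
}
```

With L = 9 6 -4 6 12 and i = 2, both versions leave -4 6 12 9 6. Note that reaching position i in a std::list with `std::next(l.begin(), i)` is itself linear unless the iterator was kept from the search for the minimum.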
Related
This is more of an intellectual exercise, but are there battle-tested C++ libraries implementing hash maps/sets (std::unordered_map, std::unordered_set) and red-black trees (std::map, std::set) on top of std::vector?
Take std::unordered_set for instance. My idea is to use 3 vectors, X (stores heads of chains), Y (stores set elements), Z (temporary container).
Roughly speaking, given a set element 12345,
Let i = X[hash(12345) % X.size()]. Then Y[i] is the head of the chain that 12345 lives on.
Y[i] is a pair (val, j). val is some element value. Y[j] is the next item on chain.
Once 12345 is found, deleting it can leave a "hole" in Y.
A new element will be pushed back to Y and X will be adjusted accordingly.
If the number of "holes" in Y exceeds, e.g. 50% of Y.size(), adjust X and Y globally to erase all the "holes", during which Z might be needed as a temporary storage.
The idea applies to trees in std::set and std::map. Of course many other details need to be carefully taken care of.
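The steps above can be sketched roughly as follows. This is a toy under stated assumptions, not a battle-tested design: the class name `VecHashSet` is made up, elements are plain `int`, the bucket count in X is fixed (no rehashing), and Z is a local temporary used only during compaction.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sentinel index meaning "end of chain" (an assumption of this sketch).
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

class VecHashSet {
public:
    explicit VecHashSet(std::size_t buckets = 64) : X(buckets, kNone) {
        assert(buckets > 0);
    }

    bool contains(int v) const {
        for (std::size_t i = X[bucket(v)]; i != kNone; i = Y[i].next)
            if (Y[i].val == v) return true;
        return false;
    }

    // Push the new element to the back of Y and make it the chain head.
    bool insert(int v) {
        if (contains(v)) return false;
        const std::size_t b = bucket(v);
        Y.push_back({v, X[b], true});
        X[b] = Y.size() - 1;
        return true;
    }

    // Unlink the element from its chain; the slot in Y becomes a hole.
    bool erase(int v) {
        const std::size_t b = bucket(v);
        std::size_t prev = kNone;
        for (std::size_t i = X[b]; i != kNone; prev = i, i = Y[i].next) {
            if (Y[i].val != v) continue;
            if (prev == kNone) X[b] = Y[i].next; else Y[prev].next = Y[i].next;
            Y[i].live = false;
            if (++holes * 2 > Y.size()) compact();  // more than 50% holes
            return true;
        }
        return false;
    }

    std::size_t size() const { return Y.size() - holes; }
    std::size_t slots() const { return Y.size(); }  // live elements + holes

private:
    struct Node { int val; std::size_t next; bool live; };

    std::size_t bucket(int v) const { return std::hash<int>{}(v) % X.size(); }

    // Copy live elements into Z, rebuild all chains, then swap Z into Y.
    void compact() {
        std::vector<Node> Z;
        Z.reserve(Y.size() - holes);
        X.assign(X.size(), kNone);
        for (const Node& n : Y) {
            if (!n.live) continue;
            const std::size_t b = bucket(n.val);
            Z.push_back({n.val, X[b], true});
            X[b] = Z.size() - 1;
        }
        Y.swap(Z);
        holes = 0;
    }

    std::vector<std::size_t> X;  // heads of chains (indices into Y)
    std::vector<Node> Y;         // set elements plus index of next in chain
    std::size_t holes = 0;
};
```

All memory lives in two vectors, so insertion allocates only when Y grows. A real implementation would also need to grow X as the load factor rises.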
Has anybody tried something like this? The motivation is to keep the data structure as compact as possible, and to avoid memory allocations as much as possible. I imagine this will yield some good speedup for small and medium size applications -- or maybe I am wrong?
Thanks!
Yes, there are. Google's dense_hash_map is one such example.
There is an immense variety of hash maps and tables built with purpose-specific requirements like cache locality, size, read speed, write speed. As speed is highly dependent on cache locality, it is very common for these implementations to use vectors as backend storage.
Have a look at this shootout between hashmaps and browse through every one of them.
I have billions of labels. Each label contains at most about 20 integers ranging from 1 to 500. I need to check whether a given integer exists in a given label and possibly insert the integer into the label. I also have a memory limit, so in some cases I need to delete labels to free up memory.
Which one is better:
using a vector to store the data of each label, or using an unordered_set?
as already hinted in one of the comments:
std::bitset will take less space than the 20 integers and gives O(1) add/check. this is a good idea, if you have on average more than 15 values per label or can live with some extra memory usage.
if not, i'd recommend vector over set.
it is contiguous in memory (fewer cache misses => faster)
it has smaller memory footprint
if you have bulk insert, you can reserve()
if your vector is sorted, you can use std::binary_search to have O(log n) lookup
As a rule of thumb: if you have less than 50 elements, vector is your container of choice.
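A minimal sketch of the two representations discussed above, assuming values in the range 1 to 500 as in the question. The type aliases and helper names (`LabelBits`, `LabelVec`, `add_value`, `insert_sorted`, `contains_sorted`) are hypothetical, and using `std::uint16_t` for the elements is an assumption that fits the stated range.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Option 1: one bit per possible value (index 0 is unused).
using LabelBits = std::bitset<501>;

// Returns true if the value was newly added.
inline bool add_value(LabelBits& label, int v) {
    assert(v >= 1 && v <= 500);
    const bool was_present = label.test(static_cast<std::size_t>(v));
    label.set(static_cast<std::size_t>(v));
    return !was_present;
}

// Option 2: a small sorted vector with binary search.
using LabelVec = std::vector<std::uint16_t>;

inline bool contains_sorted(const LabelVec& label, std::uint16_t v) {
    return std::binary_search(label.begin(), label.end(), v);
}

// Inserts v at its sorted position; returns false if it was already there.
inline bool insert_sorted(LabelVec& label, std::uint16_t v) {
    auto it = std::lower_bound(label.begin(), label.end(), v);
    if (it != label.end() && *it == v) return false;
    label.insert(it, v);
    return true;
}
```

Which one is smaller per label depends on how many values a label actually holds and on the implementation's `sizeof` for each type, so it is worth checking `sizeof(LabelBits)` against the vector's footprint on your platform.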
As far as i got you, the critical operation is to find all labels that contain a certain value?
Did you consider flipping the structure? Instead of storing ints in
each label, why not have a list of references to labels for each of
your 500 values?
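Flipping the structure could look roughly like this. It is a sketch only: `InvertedIndex` and `LabelId` are hypothetical names, a 64-bit id is assumed because there are billions of labels, and duplicate handling and label deletion are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using LabelId = std::uint64_t;  // assumption: labels are identified by number

struct InvertedIndex {
    // labels_with[v] lists every label that contains value v (index 0 unused).
    std::array<std::vector<LabelId>, 501> labels_with;

    void add(LabelId label, int value) {
        assert(value >= 1 && value <= 500);
        labels_with[static_cast<std::size_t>(value)].push_back(label);
    }

    const std::vector<LabelId>& find(int value) const {
        assert(value >= 1 && value <= 500);
        return labels_with[static_cast<std::size_t>(value)];
    }
};
```

Finding all labels that contain a value then becomes a single array access, at the cost of making "which values does this label hold" the expensive query.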
Did you consider a (no-sql) DB to get rid of the memory constraints?
I have a vector<unsigned> of size (90,000 * 9,000). I need to find many times whether an element exists in this vector or not?
For doing so, I stored the vector in a sorted form using std::sort() and then looked up elements in the vector using std::binary_search(). However on profiling using perf I find that looking up elements in vector<unsigned> is the slowest operation.
Can someone suggest some data-structure in C/C++ which I can use to efficiently look up elements in a vector of (90,000 * 9,000) elements.
I perform insertion (bulk-insertion) only once. The rest of the times I perform only lookups, so the main overhead here is because of lookups.
You've got 810 million values out of 4 billion possible values (assuming 32-bit unsigned). That's 1/5th of the total range, and uses 3.2 GB. This means you're in fact better off with a std::vector<bool> with 4 billion bits. This gives you O(1) lookup in less space (0.5 GB).
(In theory, unsigned could be 16 bits. unsigned long is at least 32 bits, std::uint32_t might be what you want)
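A minimal sketch of that idea, assuming the values fit in `std::uint32_t`. The function name `make_presence_table` and its `universe` parameter are made up for illustration; the default of 2^32 allocates about 0.5 GB, so a smaller universe is handy for experiments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a table with one bit per possible value: table[v] is true
// exactly when v occurs in `values`.
inline std::vector<bool> make_presence_table(
        const std::vector<std::uint32_t>& values,
        std::uint64_t universe = std::uint64_t{1} << 32) {
    std::vector<bool> present(static_cast<std::size_t>(universe), false);
    for (std::uint32_t v : values) {
        assert(v < universe);
        present[v] = true;
    }
    return present;
}
```

After the one-time bulk build, a lookup is just `table[x]`, with no comparisons or branching on the data.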
Depending on the underlying data structure, a contains operation may take O(N) or O(1). It is O(N) when the container is an unsorted array or a linked list, in which case contains is a full scan in the worst case. You have avoided the full scan by sorting and using binary search, which is O(log N). That is already good complexity, with only O(1) being better. So your choices are:
Cache look up result for the items, this might be a good compromise if you have many repetitions of the same element
Replace the vector with another data structure that has an efficient contains operation, such as one based on a hash table or a set. Note that you may lose other features, such as the ordering of items
Use two data structures, one for contains operations and original vector for whatever you use it for
Use a third data structure that offers a compromise, for example a data structure that work well with bloom filter
However on profiling using perf I find that looking up elements in
vector is the slowest operation.
That is half of the information you need, the other half being "how fast is it compared to other algorithms/containers"? Maybe using std::vector<> is actually the fastest, or maybe it's the slowest. To find out, you'll have to benchmark/profile a few different designs.
For example, the following are very naive benchmarks using random integers on 1000x9000 sized containers (I would get seg-faults on larger sizes for the maps, presumably a limit of 32-bit memory).
If you need a count of non-unique integers:
std::vector<unsigned> = 500 ms
std::map<unsigned, unsigned> = 1700 ms
std::unordered_map<unsigned, unsigned> = 3700 ms
If you just need to test for the presence of unique integers:
std::vector<bool> = 15 ms
std::bitset<> = 50 ms
std::set<unsigned> = 350 ms
Note that we're not too interested in the exact values but rather in the relative comparisons between containers. std::map<> is relatively slow, which is not surprising given the number of dynamic allocations and the non-locality of the data involved. The bitsets are by far the fastest but don't work if you need the counts of non-unique integers.
I would suggest doing a similar benchmark using your exact container sizes and contents, both of which may well affect the benchmark results. It may turn out that std::vector<> may be the best solution after all but now you have some data to back up that design choice.
If you do not need to iterate through the collection in sorted order, since C++11 you can use std::unordered_set<yourtype>. All you need to do is tell the collection how to hash and compare yourtype for equality. Element access is then amortised O(1), unlike a sorted vector, where it is O(log(n)).
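For a user-defined type, supplying the hashing and equality information could look like this. `Point`, `PointHash`, `PointEq` and `PointSet` are placeholder names standing in for yourtype, and the hash-combining formula is one common choice rather than a requirement.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

struct Point {  // stand-in for "yourtype"
    int x;
    int y;
};

struct PointEq {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

struct PointHash {
    std::size_t operator()(const Point& p) const {
        // Combine the two member hashes; any reasonable mixing works.
        const std::size_t h = std::hash<int>{}(p.x);
        return h ^ (std::hash<int>{}(p.y) + 0x9e3779b9u + (h << 6) + (h >> 2));
    }
};

using PointSet = std::unordered_set<Point, PointHash, PointEq>;
```

For plain `unsigned` elements, as in the question, none of this is needed: `std::unordered_set<unsigned>` works out of the box.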
I need a data structure which acts like a growing array. An unknown number of integers will be inserted into it one by one, each only if it is not already present in the data structure.
Initially I thought a std::set suffices, it will automatically grow as new integers come in and make sure no dups.
But, as the set grows large, the insertion speed goes down. So any other idea to do this job besides hash?
Ps
I wonder any tricks such as xor all the elements or build a Sparse Table (just like for rmq) would apply?
If you're willing to spend memory on the problem, 2^32 bits is 512MB, at which point you can just use a bit field, one bit per possible integer. Setting aside CPU cache effects, this gives O(1) insertion and lookup times.
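A minimal sketch of such a bit field, assuming 32-bit unsigned integers. The class name `BitField` and the `universe` constructor parameter are made up for illustration; with the default of 2^32 possible values the backing vector occupies 512 MB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class BitField {
public:
    // One bit per possible value in [0, universe).
    explicit BitField(std::uint64_t universe = std::uint64_t{1} << 32)
        : words(static_cast<std::size_t>((universe + 63) / 64)) {}

    // Sets the bit for v; returns false if v was already present (a dup).
    bool insert_if_new(std::uint32_t v) {
        assert(v / 64 < words.size());
        const std::uint64_t mask = std::uint64_t{1} << (v % 64);
        std::uint64_t& word = words[v / 64];
        if (word & mask) return false;
        word |= mask;
        return true;
    }

    bool contains(std::uint32_t v) const {
        assert(v / 64 < words.size());
        return (words[v / 64] >> (v % 64)) & 1u;
    }

private:
    std::vector<std::uint64_t> words;  // value-initialized to all zeros
};
```

The bit field only answers "have I seen this value"; if the integers must also be kept in insertion order, append each one to a separate std::vector whenever `insert_if_new` returns true.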
Without knowing more about your use case, it's difficult to say whether this is a worthwhile use of memory or a vast memory expense for almost no gain.
This site lists all the standard containers and lays out the running time of each operation,
so maybe this will be useful:
http://en.cppreference.com/w/cpp/container
Seems like unordered_set as suggested is your best way.
You could try a std::unordered_set, which should be implemented as a hash table (well, I do not understand why you write "besides hash"; std::set normally is implemented as a balanced tree, which should be the reason for insufficient insertion performance).
If there is some range the numbers fall in, then you can create several std::set as buckets.
EDIT- According to the range that you have specified, std::set, should be fast enough. O(log n) is fast enough for most purposes, unless you have done some measurements and found it slow for your case.
Also you can use Pigeonhole Principle along with sets to reject any possible duplicate, (applicable when set grows large).
A bit vector can be useful to detect duplicates
Even more requirements would be necessary for an optimal decision. This suggestion is based on the following constraints:
Alcott: 32-bit integers, with about 10,000,000 elements (i.e. any 10M out of 2^32)
It is a BST (binary search tree) where every node stores two values, the beginning and the end of a continuous region. The first value is the number where the region starts, the second is the last number in it. This arrangement allows big regions, in the hope that you reach your 10M limit with a very small tree height, and so a cheap search. The data structure with 10M elements would take up 8 bytes per node, plus the links (2x4 bytes) for at most two children per node. So that makes 80M for all the 10M elements. And of course, if most values usually end up inserted, you can instead keep track of the ones which are not.
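The region tree could be prototyped on top of std::map before writing a custom node type. The sketch below makes that substitution explicitly: `IntervalSet` is a hypothetical name, std::map (usually a red-black tree) stands in for the hand-written BST, and each entry maps the start of a region to its last value. Its per-node memory cost will be higher than the figures quoted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <map>

class IntervalSet {
public:
    bool contains(std::uint32_t v) const {
        auto it = regions.upper_bound(v);  // first region starting after v
        if (it == regions.begin()) return false;
        --it;                              // region with the largest start <= v
        return v <= it->second;
    }

    // Inserts v, merging with adjacent regions; returns false on a dup.
    bool insert(std::uint32_t v) {
        if (contains(v)) return false;
        std::uint32_t first = v, last = v;
        auto next = regions.upper_bound(v);
        // Merge with the left neighbour if it ends exactly at v - 1.
        if (next != regions.begin()) {
            auto prev = std::prev(next);
            if (v > 0 && prev->second == v - 1) {
                first = prev->first;
                regions.erase(prev);
            }
        }
        // Merge with the right neighbour if it starts exactly at v + 1.
        if (next != regions.end() &&
            v < std::numeric_limits<std::uint32_t>::max() &&
            next->first == v + 1) {
            last = next->second;
            regions.erase(next);
        }
        regions[first] = last;
        return true;
    }

    std::size_t region_count() const { return regions.size(); }

private:
    std::map<std::uint32_t, std::uint32_t> regions;  // start -> last (inclusive)
};
```

Inserting 5, 7 and then 6 leaves a single region [5, 7], which is the effect the answer is counting on: dense runs collapse into one node.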
Now if you need to be very careful with space and after running simulations and/or statistical checks you find that there are lots of small regions (less than 32 bit in length), you may want to change your node type to one number which starts the region, plus a bitmap.
If you don't have to align access to the bitmap and, say, you only have continuous chunks of just 8 elements, then your memory requirement becomes 4+1 bytes for the node and 4+4 bytes for the children. Hope this helps.
I'm just reading an article about Boost.Flyweight Performance
As you can see in the link the overhead of the factory is
- for hashed_factory: ~2.5 * sizeof(word)
- for set_factory: 4 * sizeof(word)
The basic question is .... why 4 words for set and not zero ?
As far as I know, using a hash implies computing and storing a hash key, while using a set does not: it's implemented as a red-black tree, insertion and look-up take log(n), so no extra value is stored and the memory overhead should be zero (with the drawback that instead of one comparison in the case of a hash you will have log(n) comparisons). Where is the mistake?
Each node of the RB tree contains a pointer to the left child, pointer to the right child, the color and one piece of data. The first three count as overhead, which means it isn't 0. I'm not quite sure why they say it's 4 when the 3 elements fit easily in 3 words, but maybe they count in something else (like the parent node pointer, which isn't strictly necessary, or memory allocation overhead, although that's unlikely).
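One possible accounting that arrives at four words, offered as an illustration and not taken from the Boost documentation: if the node also keeps a parent pointer and stores the color in its own field, alignment padding rounds the color up to a full word. `RbNode` and `overhead_words` below are hypothetical names, and the result depends on the platform's alignment rules.

```cpp
#include <cstddef>
#include <cstdint>

// A textbook red-black tree node layout (not Boost's actual node type).
template <class T>
struct RbNode {
    RbNode* parent;
    RbNode* left;
    RbNode* right;
    bool    red;    // the color; padded up to the alignment of `value`
    T       value;
};

// Per-node bookkeeping, measured in pointer-sized words.
template <class T>
constexpr std::size_t overhead_words() {
    return (sizeof(RbNode<T>) - sizeof(T)) / sizeof(void*);
}
```

On common 32-bit and 64-bit platforms `overhead_words<std::uintptr_t>()` evaluates to 4: three pointers plus one padded word for the color. Implementations that pack the color into a spare pointer bit can get this down to three.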