A min-heap with better than O(log n) increase-key? - heap

I'm using a priority queue that initially bases the priority of its elements on a heuristic. As elements are dequeued the heuristic is updated and elements currently in the queue may have their keys increased.
I know there are heaps (Fibonacci heaps specifically) that have amortized O(1) decrease key operations, but are there any heap structures that have similar bounds on the increase key operations?
For my application this is far from a performance issue (a binary heap works fine); it's really just academic curiosity.
Edit: to clarify, I'm looking for a data structure with a faster than O(log n) increase-key operation, not decrease-key. My application never decreases a key, as the heuristic over-estimates the priority.

Binary heaps are too inflexible to beat logarithmic complexity.
Binomial heaps just allow a more efficient join (merge) operation.
Other heaps with good decrease-key performance are pairing heaps and 2-3 heaps.

Binomial heaps take O(log n) time for decrease-key operations! Isn't this slower than Fibonacci heaps?

Actually, with Fibonacci heaps, the increase key operation is the same as decrease key. IMHO, it is only a tradition to name the operation "decrease key", because it is used in some algorithms. But the Fibonacci heap implementation allows both.
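To make the logarithmic bound from the binary-heap answer concrete, here is a minimal sketch (illustrative names, not from any particular library) of increase-key on a binary min-heap stored in a vector: the raised element has to sift down one level at a time, so the cost is proportional to the heap's height, i.e. O(log n).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Raise the key stored at index i of a binary min-heap held in `heap`,
// then restore the heap property by sifting the element down.
// Each swap moves the element one level deeper, so the loop runs at most
// about log2(n) times -- the bound a plain binary heap cannot beat.
void increase_key(std::vector<int>& heap, std::size_t i, int new_key) {
    heap[i] = new_key;  // assumes new_key >= the old heap[i]
    for (;;) {
        std::size_t left = 2 * i + 1, right = 2 * i + 2, smallest = i;
        if (left < heap.size() && heap[left] < heap[smallest]) smallest = left;
        if (right < heap.size() && heap[right] < heap[smallest]) smallest = right;
        if (smallest == i) break;       // children are not smaller: done
        std::swap(heap[i], heap[smallest]);
        i = smallest;                   // continue one level down
    }
}
```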

Related

Why isn't a strict O(1) LRU implementation used in production software and hardware?

Explaining more:
While reading about LRU (Least Recently Used) cache implementations, I came across an O(1) solution which uses an unordered_map (C++) and a doubly linked list. This is very efficient, as accessing an element through the map is essentially O(1) (due to very good internal hashing), and moving or deleting a node in the doubly linked list is also O(1).
It is well explained here.
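(For concreteness, a minimal sketch of that unordered_map plus doubly-linked-list scheme, assuming int keys and values; the class and member names here are purely illustrative, not from the linked answer.)

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

class LruCache {
    std::size_t capacity_;
    // Front of the list = most recently used, back = least recently used.
    std::list<std::pair<int, int>> items_;  // (key, value)
    std::unordered_map<int, std::list<std::pair<int, int>>::iterator> index_;

public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    std::optional<int> get(int key) {
        auto it = index_.find(key);                          // expected O(1)
        if (it == index_.end()) return std::nullopt;
        items_.splice(items_.begin(), items_, it->second);   // O(1) move to front
        return it->second->second;
    }

    void put(int key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {                            // update existing entry
            it->second->second = value;
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        if (items_.size() == capacity_) {                    // evict least recently used
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.emplace_front(key, value);
        index_[key] = items_.begin();
    }
};
```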
But then, on more research, I came across How is an LRU cache implemented in a CPU?
"For larger associativity, the number of states increases
dramatically: factorial of the number of ways. So a 4-way cache would
have 24 states, requiring 5 bits per set and an 8-way cache would have
40,320 states, requiring 16 bits per set. In addition to the storage
overhead, there is also greater overhead in updating the value"
and this
Now, I am having a hard time understanding why the O(1) solution wouldn't fix most of the problems of saving states per set and "age bits" for tracking LRU.
My guess would be that with the hashmap there is no way of preserving the associativity, but then each entry could be a standalone entry, and since access is O(1) that should not matter, right? The only other thing I can think of is the size of this hashmap, but I still can't figure it out.
Any help appreciated, Thanks!
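(A side note on the quoted numbers, my own illustration rather than anything from the linked answers: with N ways there are N! possible recency orderings, so a true-LRU set needs ceil(log2(N!)) bits, which is where the 5 and 16 come from.)

```cpp
#include <cmath>
#include <cstdio>

// Bits needed per set to encode a full LRU ordering of `ways` cache ways:
// there are ways! possible orderings, so ceil(log2(ways!)) bits are required.
int lru_bits_per_set(int ways) {
    double log2_states = 0.0;
    for (int k = 2; k <= ways; ++k) log2_states += std::log2(k);  // log2(ways!)
    return static_cast<int>(std::ceil(log2_states));
}

int main() {
    std::printf("4-way: %d bits per set\n", lru_bits_per_set(4));  // 24 states -> 5 bits
    std::printf("8-way: %d bits per set\n", lru_bits_per_set(8));  // 40320 states -> 16 bits
    return 0;
}
```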

Deciding when to use a hash table

I was solving a competitive programming problem with the following requirements:
I had to maintain a list of unique 2D points (x, y); the number of unique points would be less than 500.
My idea was to store them in a hash table (a C++ unordered_set, to be specific) and each time a point turned up I would look it up in the table and, if it was not already there, insert it.
I also know for a fact that I wouldn't be doing more than 500 lookups.
I also saw some solutions simply searching through an (unsorted) array and checking whether the point was already there before inserting.
My question is: is there any reasonable way to guess when I should use a hash table over a manual search over the keys, without having to benchmark them?
My question is: is there any reasonable way to guess when I should use a hash table over a manual search over the keys, without having to benchmark them?
I am guessing you are familiar with basic algorithmics and time complexity, know the C++ standard containers, and know that with luck hash table access is O(1).
If the hash table code (or some balanced tree code, e.g. using std::map - assuming there is an easy order on keys) is more readable, I would prefer it for that readability reason alone.
Otherwise, you might make some guess taking into account the approximate timing for various operations on a PC. BTW, the entire http://norvig.com/21-days.html page is worth reading.
Basically, memory accesses are much slower than everything else in the CPU. The CPU cache is extremely important. A typical memory access with a cache miss, requiring data to be fetched from DRAM modules, is several hundred times slower than an elementary arithmetic operation or machine instruction (e.g. adding two integers in registers).
In practice, it does not matter that much, as long as your data is tiny (e.g. less than a thousand elements), since in that case it is likely to sit in L2 cache.
Searching (linearly) in an array is really fast (since it is very cache friendly), up to several thousand (small) elements.
IIRC, Herb Sutter mentions in some video that even inserting an element inside a vector is practically (but unintuitively) faster, taking into account the time needed to shift the existing elements, than inserting it into some balanced tree (or perhaps some other container, e.g. a hash table), up to a container size of several thousand small elements. This is on a typical tablet, desktop or server microprocessor with a multi-megabyte cache. YMMV.
If you really care that much, you cannot avoid benchmarking.
Notice that 500 pairs of integers probably fit into the L1 cache!
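To make that concrete, here is a small sketch of the two approaches from the question (the PointHash combiner is my own illustration, not a standard one): a linear scan over an unsorted vector versus an unordered_set lookup. For ~500 points both finish essentially instantly, and the vector scan has the friendlier memory access pattern.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

using Point = std::pair<int, int>;

// Illustrative hash combiner for a 2D point (not from any particular library).
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t h = std::hash<int>{}(p.first);
        return h ^ (std::hash<int>{}(p.second) + 0x9e3779b9u + (h << 6) + (h >> 2));
    }
};

// Approach 1: linear scan over an unsorted vector -- O(n) per lookup,
// but very cache friendly for a few hundred elements.
bool insert_if_new(std::vector<Point>& pts, const Point& p) {
    if (std::find(pts.begin(), pts.end(), p) != pts.end()) return false;
    pts.push_back(p);
    return true;
}

// Approach 2: hash set -- expected O(1) per lookup, but with hashing overhead
// and a less predictable memory access pattern.
bool insert_if_new(std::unordered_set<Point, PointHash>& pts, const Point& p) {
    return pts.insert(p).second;  // .second is false if the point was already there
}
```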
My rule of thumb is to assume the processor can deal with 10^9 operations per second.
In your case there are only 500 entries, so an algorithm up to O(N^2) should be safe. By using a contiguous data structure like a vector you can leverage fast cache hits. Also, the hash function can sometimes be costly in terms of constant factors. However, if you had a data size of 10^6, the safe total complexity might only be O(N); in that case you might need to consider an O(1) hashmap lookup.
You can use Big O complexity to roughly estimate the performance. For a hash table, searching for an element is between O(1) and O(n) in the worst case. That means that in the best case your access time is independent of the number of elements in your map, but in the worst case it is linearly dependent on the size of your hash table.
A balanced binary tree has a guaranteed search complexity of O(log n). That means that searching for an element always depends on the size of the input, but in the worst case it's faster than a hash table.
You can look up some Big O Complexities at this handy website here: http://bigocheatsheet.com/

Map vs unordered_map -- multithreading

I have the following requirements:
I need a data structure with key-value pairs (keys are integers, if that helps).
I need the following operations:
Iteration (most used)
Insertion (2nd most used)
Searching by key and deletion (least used)
I plan to use multiple locks over the structure for concurrent access.
What is the ideal data structure to use?
Map or an unordered map?
I think unordered map makes sense, because I can insert in O(1) and delete in O(1). But I am not sure about the iteration. How bad is the performance compared to map?
Also, I plan to use multiple locks on blocks instead of the whole structure. Any good implementation example of this?
Thanks
The speed of iterator incrementing is O(1) for both containers, although you might get somewhat better cache locality from std::unordered_map.
Apart from the slower O(log N) find/insert/erase functionality of std::map, one other difference is that std::map provides bidirectional iterators, whereas the faster (amortized O(1) element access) std::unordered_map only provides forward iterators.
The excellent book C++ Concurrency in Action: Practical Multithreading by Anthony Williams provides a code example of a multithreaded unordered_map with a lock per entry. This book is highly recommended if you are doing serious multithreaded coding.
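The book's listing is longer, but the basic lock-per-block idea looks roughly like this sketch (class and member names are mine, not taken from the book): each bucket carries its own mutex, so threads working on different buckets do not contend, while a full iteration would still have to lock every bucket (or use one coarse lock).

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <utility>
#include <vector>

// A minimal lock-striped map sketch: one mutex per bucket, so writers to
// different buckets do not block each other.
class StripedMap {
    struct Bucket {
        std::shared_mutex mtx;
        std::list<std::pair<int, int>> entries;  // (key, value)
    };
    std::vector<Bucket> buckets_;

    Bucket& bucket_for(int key) {
        return buckets_[std::hash<int>{}(key) % buckets_.size()];
    }

public:
    explicit StripedMap(std::size_t num_buckets = 64) : buckets_(num_buckets) {}

    void insert_or_assign(int key, int value) {
        Bucket& b = bucket_for(key);
        std::unique_lock lock(b.mtx);            // exclusive lock on one bucket only
        for (auto& kv : b.entries)
            if (kv.first == key) { kv.second = value; return; }
        b.entries.emplace_back(key, value);
    }

    std::optional<int> find(int key) {
        Bucket& b = bucket_for(key);
        std::shared_lock lock(b.mtx);            // readers of this bucket can share
        for (const auto& kv : b.entries)
            if (kv.first == key) return kv.second;
        return std::nullopt;
    }
};
```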
Iteration is not a problem in an unordered_map. It is a little less efficient than a vector, but not largely so.
As always, you will need to benchmark for YOUR use-cases, and compare with other container types if it's a critical part of your code.
Not sure what you mean by "multiple locks on blocks instead of the whole structure" - any container updates will need to be locked for the whole container...
Why not simply use an existing concurrent_unordered_map, which you can find in both TBB and ConcRT?
Have you thought about trying a std::deque? The reasoning is as follows:
Iteration is fast - data should be more or less packed close (unlike lists)
Insertion (at either end) should be quick - existing data in a deque is never reallocated or moved when you push at the ends
Searching by key and deletion are slow (but that's the uncommon use case).
If the last two cases are common, a std::list may be used. Also consider testing a std::vector, since it is more cache efficient.
Iteration in an unordered_map may be slow due to iterating over a large number of unused buckets in the hash table. Insertions will be quick until collision levels become intolerable, at which point the whole data structure will need to be laid out again.
Maps have relatively fast iteration, except that data elements may be far apart in memory. Insertion can be slow due to the re-balancing of the red-black tree that it requires.
The main use case for unordered_map is fast lookup (O(1)). A normal map has fairly fast lookup (O(log n)) but much better iteration performance.
If you have hard real-time requirements, I would recommend map over unordered_map. std::map has guaranteed performance 100% of the time, but the std::unordered_map may do a rehash and completely ruin real-time performance in some critical corner case. In general, I prefer red-black trees (std::map) over hashtables (std::unordered_map) if I need absolute guarantees on worst-case performance.

What are the criteria for choosing a sorting algorithm?

I was reading about sorting methods, including bubble sort, selection sort, merge sort, heap sort, bucket sort, etc. They also come with time complexities, which help us know which sort is efficient. So I had a basic question: given some data, how do we choose a sorting method? Time complexity is one parameter that helps us decide on a sorting method, but are there other parameters to consider?
Just trying to figure out sorting for better understanding.
I also have some questions about heap sort:
Where do we use heap sort?
What is the biggest advantage of heap sort (apart from its O(n log n) time complexity)?
What is the disadvantage of heap sort?
What is the build time for a heap? (I heard O(n), but I'm not sure.)
Is there any scenario where we have to use heap sort, or where heap sort is the better option (other than a priority queue)?
Before applying heap sort to data, what properties of the data should we look at?
The two main theoretical features of sorting algorithms are time complexity and space complexity.
In general, time complexity lets us know how the performance of the algorithm changes as the size of the data set increases. Things to consider:
How much data are you expecting to sort? This will help you know whether you need to look for an algorithm with a very low time complexity.
How sorted will your data be already? Will it be partly sorted? Randomly sorted? This can affect the time complexity of the sorting algorithm. Most algorithms will have worst and best cases - you want to make sure you're not using an algorithm on a worst-case data set.
Time complexity is not the same as running time. Remember that time complexity only describes how the performance of an algorithm varies as the size of the data set increases. An algorithm that always does one pass over all the input would be O(n) - its performance is linearly correlated with the size of the input. But an algorithm that always does two passes over the data set is also O(n) - the correlation is still linear, even if the constant (and actual running time) is different.
Similarly, space complexity describes how much space an algorithm needs to run. For example, a simple sort such as insertion sort needs an additional fixed amount of space to store the value of the element currently being inserted. This is an auxiliary space complexity of O(1) - it doesn't change with the size of the input. However, merge sort creates extra arrays in memory while it runs, with an auxiliary space complexity of O(n). This means the amount of extra space it requires is linearly correlated with the size of the input.
Of course, algorithm design is often a trade-off between time and space - algorithms with a low space complexity may require more time, and algorithms with a low time complexity may require more space.
For more information, you may find this tutorial useful.
To answer your updated question, you may find the wikipedia page on Heap Sort useful.
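On the build-time question: the standard library's heap primitives make the costs easy to see. std::make_heap is specified to build the heap in O(n), and std::sort_heap then pops n times at O(log n) each, giving the familiar O(n log n) total. A minimal sketch:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{5, 1, 9, 3, 7, 2, 8};

    std::make_heap(v.begin(), v.end());  // builds a max-heap in O(n)
    std::sort_heap(v.begin(), v.end());  // n pops, O(log n) each -> O(n log n), in place

    for (int x : v) std::cout << x << ' ';  // prints 1 2 3 5 7 8 9
    std::cout << '\n';
    return 0;
}
```

(This in-place, constant-extra-memory behaviour is also heap sort's usual selling point next to merge sort's O(n) auxiliary space.)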
If you mean criteria for what type of sort to choose, here are some other items to consider.
The amount of data you have: Do you have ten, one hundred, a thousand, or millions of items to be sorted?
Complexity of the algorithm: The more complex it is, the more testing will need to be done to make sure it is correct. For small amounts of data, a bubble sort or quick sort is easy to code and test, versus other sorts which may be overkill for the amount of data you have to sort.
How much time will it take to sort: If you have a large set, bubble/quick sort will take a lot of time, but if you have a lot of time, that may not be an issue. However, using a more complex algorithm will cut down the time to sort, but at the cost of more effort in coding and testing, which may be worth it if sorting goes from long (hours/days) to a shorter amount of time.
The data itself: Is the data nearly the same for every item? For some sorts that can be a degenerate (worst) case, so if you know something about the composition of the data, it may help in determining which algorithm to choose.
The amount of resources available: Do you have lots of memory in which to store all the items, or do you need to spill items to disk? If everything cannot fit in memory, merge sort may be best, whereas others may be better if you can work with everything in memory.

Memory Allocation in std::map

I am doing a report on the various C++ dictionary implementations (map, dictionary, vectors etc).
The results for insertions using a std::map illustrate that the performance is O(log n). There are also consistent spikes in the performance. I am not 100% sure what's causing them; I think they are caused by memory allocation, but I have been unsuccessful in finding any literature / documentation to prove this.
Can anyone clear this matter up or point me in the right direction?
Cheers.
You are right: it is O(log n) complexity. But this is due to the sorted nature of map (normally binary tree based).
Also see http://www.sgi.com/tech/stl/UniqueSortedAssociativeContainer.html; there is a note on insert. Its worst case is O(log n), and it is amortized O(1) if you can hint where to do the insert.
Maps are normally based on binary trees and need to be balanced to keep good performance. The load spikes you are observing probably correspond to this balancing process.
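For what it's worth, the hinted insert mentioned above is available directly on std::map: when elements arrive in sorted order and the hint is correct, each insertion is amortized O(1) rather than O(log n). A minimal sketch:

```cpp
#include <iostream>
#include <map>

int main() {
    std::map<int, int> m;

    // Keys arrive already sorted, so the end iterator is always a correct hint:
    // each hinted insert is then amortized O(1) instead of the usual O(log n).
    for (int k = 0; k < 1000; ++k)
        m.emplace_hint(m.end(), k, k * k);

    std::cout << m.size() << " elements inserted\n";
    return 0;
}
```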
The empirical approach isn't strictly necessary when it comes to STL. There's no point in experimenting when the standard clearly dictates the minimal complexity of operations such as std::map insertion.
I urge you to read the standard so you're aware of the minimal complexity guarantees before continuing with experiments. Of course, there might be bugs in whatever STL implementation you happen to be testing; but the popular STLs are pretty well-debugged creatures and very widely used, so I'd doubt it.
If I remember correctly, std::map is a balanced red-black tree. Some of the spikes could be caused when the std::map determines that the underlying tree needs balancing. Also, when a new node is allocated, the OS could contribute to some spikes during the allocation portion.