Binary search of range in std::map slower than map::find() search of whole map - c++

Background: I'm new to C++. I have a std::map and am trying to search for elements by key.
Problem: Performance. The map::find() function slows down when the map gets big.
Preferred approach: I often know roughly where in the map the element should be; I can provide a [first,last) range to search in. This range is always small w.r.t. the number of elements in the map. I'm interested in writing a short binary search utility function with boundary hinting.
Attempt: I stole the below function from https://en.cppreference.com/w/cpp/algorithm/lower_bound and did some rough benchmarks. This function seems to be much slower than map::find() for maps large and small, regardless of the size or position of the range hint provided. I replaced the comparison statements (it->first < value) with a comparison of random ints and the slowdown appeared to resolve, so I think the slowdown may be caused by the dereferencing of it->first.
Question: Is the dereferencing the issue? Or is there some kind of unnecessary copy/move action going on? I think I remember reading that maps don't store their element nodes sequentially in memory, so am I just getting a bunch of cache misses? What is the likely cause of the slowdown, and how would I go about fixing it?
#include <iterator>

/* @param first    Iterator pointing to the first element of the map to search.
 * @param distance Number of map elements in the range to search.
 * @param key      Map key to search for. NOTE: Type validation is not a concern just yet.
 */
template<class ForwardIt, class T>
ForwardIt binary_search_map(ForwardIt first, const int distance, const T& key) {
    ForwardIt it = first;
    typename std::iterator_traits<ForwardIt>::difference_type count, step;
    count = distance;
    while (count > 0) {
        it = first;
        step = count / 2;
        std::advance(it, step);  // O(step) for std::map's bidirectional iterators
        if (it->first < key) {
            first = ++it;
            count -= step + 1;
        }
        else if (key < it->first)
            count = step;
        else {
            first = it;
            break;
        }
    }
    return first;
}

There is a reason that std::map::find() exists. std::map is implemented as a balanced binary search tree, so find() already performs a binary search: it starts at the root and needs only O(log N) steps to reach the key.
Your implementation of binary search is much slower because it can't take advantage of that tree.
To reach the middle of your range, std::advance starts at the first node of the range and follows pointers from node to node until it gets to what you consider the middle (map iterators are bidirectional, so advancing k steps costs O(k)). For the next step of the search, it again walks from node to node, following a lot of pointers.
The result: on top of a lot more looping, you get a lot of cache misses, especially when the map is large.
If you want to improve the lookups in your map, I would recommend using a different structure. When ordering isn't important, you could use std::unordered_map. When order is important, you could use a sorted std::vector<std::pair<Key, Value>>. If you have Boost available, this already exists as boost::container::flat_map.
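To make the sorted-vector suggestion concrete, here is a minimal sketch of such a "flat map". `FlatMap`, `insert_or_assign`, and `find_in_range` are hypothetical names for illustration, not Boost's API. Because vector iterators are random access, the range hint from the question actually pays off here: searching an index range [first, last) costs O(log(last - first)) instead of a node-by-node walk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal "flat map": (key, value) pairs in a std::vector,
// kept sorted by key so lookups are binary searches over contiguous memory.
template <class Key, class Value>
class FlatMap {
public:
    using Entry = std::pair<Key, Value>;

    // Insert or overwrite; O(N) in the worst case because later entries shift.
    void insert_or_assign(const Key& key, const Value& value) {
        auto it = std::lower_bound(data_.begin(), data_.end(), key, key_less);
        if (it != data_.end() && it->first == key)
            it->second = value;
        else
            data_.insert(it, Entry(key, value));
    }

    // O(log N) lookup; returns nullptr if the key is absent.
    const Value* find(const Key& key) const {
        return find_in_range(key, 0, data_.size());
    }

    // Same search restricted to the index range [first, last), like the
    // question's range hint: O(log(last - first)) with random-access iterators.
    const Value* find_in_range(const Key& key, std::size_t first, std::size_t last) const {
        last = std::min(last, data_.size());
        first = std::min(first, last);
        auto b = data_.begin() + static_cast<std::ptrdiff_t>(first);
        auto e = data_.begin() + static_cast<std::ptrdiff_t>(last);
        auto it = std::lower_bound(b, e, key, key_less);
        return (it != e && it->first == key) ? &it->second : nullptr;
    }

    std::size_t size() const { return data_.size(); }

private:
    // Heterogeneous comparison used by std::lower_bound: element < key.
    static bool key_less(const Entry& entry, const Key& key) { return entry.first < key; }

    std::vector<Entry> data_;
};
```

Inserting still shifts elements, so this trades slower inserts for faster, cache-friendly lookups; boost::container::flat_map makes the same trade-off.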

Related

Which container is most efficient for multiple insertions / deletions in C++?

I was set a homework challenge as part of an application process (I was rejected, by the way; I wouldn't be writing this otherwise) in which I was to implement the following functions:
// Store a collection of integers
class IntegerCollection {
public:
    // Insert one entry with value x
    void Insert(int x);
    // Erase one entry with value x, if one exists
    void Erase(int x);
    // Erase all entries, x, from <= x < to
    void Erase(int from, int to);
    // Return the count of all entries, x, from <= x < to
    size_t Count(int from, int to) const;
The functions were then put through a bunch of tests, most of which were trivial. The final test was the real challenge as it performed 500,000 single insertions, 500,000 calls to count and 500,000 single deletions.
The member variables of IntegerCollection were not specified and so I had to choose how to store the integers. Naturally, an STL container seemed like a good idea and keeping it sorted seemed an easy way to keep things efficient.
Here is my code for the four functions using a vector:
    // Previous bit of code shown goes here
private:
    std::vector<int> integerCollection;
};
void IntegerCollection::Insert(int x) {
    /* using lower_bound to find the right place for x to be inserted
       keeps the vector sorted and makes life much easier */
    auto it = std::lower_bound(integerCollection.begin(), integerCollection.end(), x);
    integerCollection.insert(it, x);
}
void IntegerCollection::Erase(int x) {
    // find the location of the first element containing x and delete it if it exists
    auto it = std::find(integerCollection.begin(), integerCollection.end(), x);
    if (it != integerCollection.end()) {
        integerCollection.erase(it);
    }
}
void IntegerCollection::Erase(int from, int to) {
    // from >= to is an empty range (and would give fromBound > toBound below)
    if (integerCollection.empty() || from >= to) return;
    // lower_bound points to the first element of integerCollection >= from/to
    auto fromBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), from);
    auto toBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), to);
    /* std::vector::erase deletes entries between the two iterators
       fromBound (included) and toBound (not included) */
    integerCollection.erase(fromBound, toBound);
}
size_t IntegerCollection::Count(int from, int to) const {
    if (integerCollection.empty() || from >= to) return 0;
    int count = 0;
    // lower_bound points to the first element of integerCollection >= from/to
    auto fromBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), from);
    auto toBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), to);
    // increment the iterator until fromBound == toBound (we don't count elements of value = to)
    while (fromBound != toBound) {
        ++count; ++fromBound;
    }
    return count;
}
The company got back to me saying that they wouldn't be moving forward because my choice of container meant the runtime complexity was too high. I also tried using list and deque and compared the runtime. As I expected, I found that list was dreadful and that vector took the edge over deque. So as far as I was concerned I had made the best of a bad situation, but apparently not!
What is the correct container to use in this situation? deque only makes sense if I can guarantee that insertions and deletions happen at the ends of the container, and list hogs memory. Is there something else that I'm completely overlooking?
We cannot know what would make the company happy. If they reject std::vector without concise reasoning, I wouldn't want to work for them anyway. Moreover, we don't really know the precise requirements. Were you asked to provide one reasonably well-performing implementation? Did they expect you to squeeze out the last percent of the provided benchmark by profiling a bunch of different implementations?
The latter is probably too much for a homework challenge as part of an application process. If it is the former, you can choose one of the following:
roll your own. It is unlikely that the interface you were given can be implemented more efficiently than one of the std containers does... unless your requirements are so specific that you can write something that performs well under that specific benchmark.
std::vector for data locality. See e.g. here for Bjarne himself advocating std::vector rather than linked lists.
std::set for ease of implementation. It seems like you want the container sorted and the interface you have to implement fits that of std::set quite well.
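A minimal sketch of the std::set option, assuming duplicates must be kept (Insert and Erase talk about "one entry"), which calls for std::multiset rather than std::set. Note that Count still walks the range with std::distance, because tree iterators are not random access.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Sketch of the question's IntegerCollection on top of std::multiset.
class IntegerCollection {
public:
    // O(log N)
    void Insert(int x) { data_.insert(x); }

    // Erase one entry with value x, if one exists; O(log N).
    // (data_.erase(x) would remove ALL copies of x, hence find + erase(iterator).)
    void Erase(int x) {
        auto it = data_.find(x);
        if (it != data_.end()) data_.erase(it);
    }

    // Erase all entries with from <= x < to; O(log N + number erased).
    void Erase(int from, int to) {
        if (from >= to) return;
        data_.erase(data_.lower_bound(from), data_.lower_bound(to));
    }

    // O(log N + K): std::distance steps through the K tree nodes in the range.
    std::size_t Count(int from, int to) const {
        if (from >= to) return 0;
        return static_cast<std::size_t>(
            std::distance(data_.lower_bound(from), data_.lower_bound(to)));
    }

private:
    std::multiset<int> data_;
};
```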
Let's compare only insertion and erasure, assuming the container needs to stay sorted:
operation   std::set   std::vector
insert      log(N)     N
erase       log(N)     N
Note that the log(N) for the binary_search to find the position to insert/erase in the vector can be neglected compared to the N.
Now consider that the asymptotic complexities listed above completely ignore the non-uniform cost of memory access. In reality, the data can be scattered in memory (std::set), leading to many cache misses, or it can be contiguous, as with std::vector. The log(N) advantage only wins for large N. To get an idea of the difference: with base-2 logarithms, 500000/log(500000) is roughly 26410, while 1000/log(1000) is only about 100.
I would expect std::vector to outperform std::set for reasonably small container sizes, but at some point the log(N) wins over the cache. The exact location of this turning point depends on many factors and can only be reliably determined by profiling and measuring.
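For the std::vector option, part of the question's cost can be removed without changing containers, assuming the same interface: use binary search everywhere, and compute Count as the difference of two iterators, which is O(1) for a vector. A sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: the sorted-vector IntegerCollection with every search done by
// binary search. Insert/Erase stay O(N) because elements are shifted;
// Count becomes O(log N) because vector iterators subtract in O(1).
class IntegerCollection {
public:
    void Insert(int x) {
        // Inserting after any equal elements shifts slightly fewer elements.
        v_.insert(std::upper_bound(v_.begin(), v_.end(), x), x);
    }

    void Erase(int x) {
        // O(log N) search instead of std::find's O(N) scan.
        auto it = std::lower_bound(v_.begin(), v_.end(), x);
        if (it != v_.end() && *it == x) v_.erase(it);
    }

    void Erase(int from, int to) {
        if (from >= to) return;
        v_.erase(std::lower_bound(v_.begin(), v_.end(), from),
                 std::lower_bound(v_.begin(), v_.end(), to));
    }

    std::size_t Count(int from, int to) const {
        if (from >= to) return 0;
        auto lo = std::lower_bound(v_.begin(), v_.end(), from);
        auto hi = std::lower_bound(lo, v_.end(), to);
        return static_cast<std::size_t>(hi - lo);
    }

private:
    std::vector<int> v_;
};
```

The remaining O(N) in Insert and single-element Erase is exactly the shifting cost listed for std::vector in the comparison above.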
Nobody knows which container is MOST efficient for multiple insertions / deletions. That is like asking what the most fuel-efficient car engine design possible is: people are always innovating on car engines and make more efficient ones all the time. However, I would recommend a splay tree. The time required for an insertion or deletion in a splay tree is not constant: some insertions take a long time and some take only a very short time. However, the amortized time per insertion/deletion is guaranteed to be O(log n), where n is the number of items stored in the splay tree. Logarithmic time is extremely efficient, and it should be good enough for your purposes.
The first thing that comes to mind is to hash the integer value so that single lookups can be done in constant time.
The integer value can be hashed to compute an index into an array of bools or bits, used to tell whether the integer value is in the container or not.
Counting and deleting large ranges could be sped up from there by using multiple hash tables, each covering a specific integer range.
If you had 0x10000 hash tables, each storing ints from 0 to 0xFFFF, and were using 32-bit integers, you could then mask and shift out the upper half of the int value and use it as an index to find the correct hash table to insert into / delete from.
IntHashTable containers[0x10000];
u_int32 hashIndex = (u_int32)value / 0x10000;                    // upper 16 bits select the table
u_int32 valueInTable = (u_int32)value - (hashIndex * 0x10000);  // lower 16 bits
containers[hashIndex].insert(valueInTable);
Count, for example, could be implemented as follows if each hash table kept count of the number of elements it contained:
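u_int32 indexStart = (u_int32)startRange / 0x10000;
u_int32 indexEnd = (u_int32)endRange / 0x10000;
int countTotal = 0;
for (u_int32 i = indexStart; i <= indexEnd; ++i) {
    // the first and last tables may only partially overlap the range and need an exact count
    countTotal += containers[i].count();
}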
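A runnable sketch of the bucketing idea, under assumptions that go beyond the answer: each bucket is a std::unordered_map from value to multiplicity (so duplicates can be stored), each bucket keeps a running total, and the sign bit is flipped so that negative ints keep their order. `BucketedCollection` is a hypothetical name. Buckets lying fully inside the range contribute their total directly; only the two edge buckets are scanned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

class BucketedCollection {
public:
    BucketedCollection() : buckets_(0x10000) {}

    void Insert(int x) {
        std::uint32_t k = Key(x);
        Bucket& b = buckets_[k >> 16];  // upper 16 bits select the bucket
        ++b.counts[k];
        ++b.total;
    }

    // Erase one entry with value x, if one exists.
    void Erase(int x) {
        std::uint32_t k = Key(x);
        Bucket& b = buckets_[k >> 16];
        auto it = b.counts.find(k);
        if (it == b.counts.end()) return;
        if (--it->second == 0) b.counts.erase(it);
        --b.total;
    }

    // Count entries with from <= x < to.
    std::size_t Count(int from, int to) const {
        if (from >= to) return 0;
        std::uint32_t lo = Key(from), hi = Key(to) - 1;  // inclusive key range [lo, hi]
        std::uint32_t first = lo >> 16, last = hi >> 16;
        std::size_t result = 0;
        for (std::uint32_t i = first; i <= last; ++i) {
            const Bucket& b = buckets_[i];
            if (i != first && i != last) {
                result += b.total;  // bucket lies entirely inside the range
            } else {
                for (const auto& kv : b.counts)  // edge bucket: filter its entries
                    if (kv.first >= lo && kv.first <= hi) result += kv.second;
            }
        }
        return result;
    }

private:
    struct Bucket {
        std::unordered_map<std::uint32_t, std::size_t> counts;  // key -> multiplicity
        std::size_t total = 0;
    };

    // Flipping the sign bit maps signed int order onto unsigned order.
    static std::uint32_t Key(int x) { return static_cast<std::uint32_t>(x) ^ 0x80000000u; }

    std::vector<Bucket> buckets_;
};
```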
Not sure if using sorting really is a requirement for removing the range. It might be based on position. Anyway, here is a link with some hints which STL container to use.
In which scenario do I use a particular STL container?
Just FYI.
A vector may be a good choice, but it does a lot of reallocation, as you know. I prefer deque instead, as it doesn't require one big chunk of memory to hold all the items. For requirements such as yours, list probably fits better.
Basic solution for this problem might be std::map<int, int>
where key is the integer you are storing and value is the number of occurences.
The problem with this is that you cannot quickly remove/count ranges; in other words, the complexity is linear.
For a quick count you would need to implement your own complete binary tree, where you can compute the number of nodes between two nodes (the upper and lower bound nodes) because you know the size of the tree and how many left and right turns you took to reach the upper and lower bound nodes. Note that we are talking about a complete binary tree; in a general binary tree you cannot do this calculation quickly.
For quick range removal, I do not know how to do better than linear.
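The std::map<int, int> idea can be sketched as follows, mirroring the question's interface. The linear part is the walk over the distinct keys inside the range in Count.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Sketch: key = stored integer, value = number of occurrences.
class IntegerCollection {
public:
    void Insert(int x) { ++counts_[x]; }  // O(log N)

    void Erase(int x) {                    // O(log N)
        auto it = counts_.find(x);
        if (it != counts_.end() && --it->second == 0) counts_.erase(it);
    }

    // O(log N + D), where D is the number of distinct values removed.
    void Erase(int from, int to) {
        if (from >= to) return;
        counts_.erase(counts_.lower_bound(from), counts_.lower_bound(to));
    }

    // O(log N + D): summing the occurrence counts of the distinct keys in range.
    std::size_t Count(int from, int to) const {
        if (from >= to) return 0;
        std::size_t total = 0;
        for (auto it = counts_.lower_bound(from), end = counts_.lower_bound(to); it != end; ++it)
            total += static_cast<std::size_t>(it->second);
        return total;
    }

private:
    std::map<int, int> counts_;
};
```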

Efficient intersection of two sets

I have two sets (or maps) and need to efficiently handle their intersection.
I know that there are two ways of doing this:
iterate over both maps as in std::set_intersection: O(n1+n2)
iterating over one map and finding elements in the other: O(n1*log(n2))
Depending on the sizes either of these two solution is significantly better (have timed it), and I thus need to either switch between these algorithm based on the sizes (which is a bit messy) - or find a solution outperforming both, e.g. using some variant of map.find() taking the previous iterator as a hint (similarly as map.emplace_hint(...)) - but I could not find such a function.
Question: Is it possible to combine the performance characteristics of the two solutions directly using STL - or some compatible library?
Note that the performance requirement makes this different from earlier questions such as
Efficient intersection of sets?
In almost every case std::set_intersection will be the best choice.
The other solution may be better only if the sets contain a very small number of elements.
This is due to how the base-2 logarithm scales:
n = 2, log(n) = 1
n = 4, log(n) = 2
n = 8, log(n) = 3
...
n = 1024, log(n) = 10
O(n1*log(n2)) is significantly more expensive than O(n1 + n2) if the sets have more than 5-10 elements.
There is a reason such a function was added to the STL and implemented that way. It also makes the code more readable.
Selection sort is faster than merge or quick sort for collections with length less than 20 but is rarely used.
For sets that are implemented as binary trees, there actually is an algorithm that combines the benefits of both the procedures you mention. Essentially, you do a merge like std::set_intersection, but while iterating in one tree, you skip any branches that are all less than the current value in the other.
The resulting intersection takes O(min(n1 log n2, n2 log n1, n1 + n2)) time, which is just what you want.
Unfortunately, I'm pretty sure std::set doesn't provide interfaces that could support this operation.
I've done it a few times in the past though, when working on joining inverted indexes and similar things. Usually I make iterators with a skipTo(x) operation that will advance to the next element >= x. To meet my promised complexity it has to be able to skip N elements in log(N) amortized time. Then an intersection looks like this:
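// skipTo(x) is not part of std::set's iterator: it stands for the custom iterator
// operation described above (advance to the next element >= x in amortized O(log N)).
template <class T>
void get_intersection(vector<T> *dest, const set<T>& set1, const set<T>& set2)
{
    auto end1 = set1.end();
    auto end2 = set2.end();
    auto it1 = set1.begin();
    if (it1 == end1)
        return;
    auto it2 = set2.begin();
    if (it2 == end2)
        return;
    for (;;)
    {
        it1.skipTo(*it2);
        if (it1 == end1)
            break;
        if (*it1 == *it2)
        {
            dest->push_back(*it1);
            ++it1;
            if (it1 == end1)  // don't dereference end1 below
                break;
        }
        it2.skipTo(*it1);
        if (it2 == end2)
            break;
        if (*it2 == *it1)
        {
            dest->push_back(*it2);
            ++it2;
            if (it2 == end2)  // don't dereference end2 at the top of the loop
                break;
        }
    }
}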
It easily extends to an arbitrary number of sets using a vector of iterators, and pretty much any ordered collection can be extended to provide the iterators required -- sorted arrays, binary trees, b-trees, skip lists, etc.
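std::set iterators have no skipTo, so this sketch applies the idea to a sorted std::vector, one of the collections listed above. `SkipCursor` and `gallop_intersection` are hypothetical names. skipTo uses exponential ("galloping") search followed by std::lower_bound, so skipping k elements costs O(log k), which is the property the complexity argument relies on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Cursor over a sorted vector with a skipTo(x) that moves to the first element >= x.
template <class T>
class SkipCursor {
public:
    explicit SkipCursor(const std::vector<T>& v) : v_(v) {}
    bool done() const { return pos_ >= v_.size(); }
    const T& value() const { return v_[pos_]; }
    void next() { ++pos_; }

    void skipTo(const T& x) {
        if (done() || !(v_[pos_] < x)) return;  // already at an element >= x
        std::size_t lo = pos_, step = 1;         // invariant: v_[lo] < x
        while (lo + step < v_.size() && v_[lo + step] < x) {
            lo += step;                          // gallop: step sizes 1, 2, 4, ...
            step *= 2;
        }
        // The first element >= x lies in (lo, lo + step], or there is none.
        std::size_t hi = std::min(lo + step, v_.size());
        auto first = v_.begin() + static_cast<std::ptrdiff_t>(lo + 1);
        auto last = v_.begin() + static_cast<std::ptrdiff_t>(hi);
        pos_ = static_cast<std::size_t>(std::lower_bound(first, last, x) - v_.begin());
    }

private:
    const std::vector<T>& v_;
    std::size_t pos_ = 0;
};

template <class T>
std::vector<T> gallop_intersection(const std::vector<T>& a, const std::vector<T>& b) {
    std::vector<T> out;
    SkipCursor<T> c1(a), c2(b);
    while (!c1.done() && !c2.done()) {
        c1.skipTo(c2.value());
        if (c1.done()) break;
        if (!(c2.value() < c1.value())) {  // equal: both cursors are on the same value
            out.push_back(c1.value());
            c1.next();
            c2.next();
            continue;
        }
        c2.skipTo(c1.value());
    }
    return out;
}
```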
I don't know how to do this using the standard library, but if you wrote your own balanced binary search tree, here is how to implement a limited "find with hint". (Depending on your other requirements, a BST reimplementation could also leave out the parent pointers, which could be a performance win over the STL.)
Assume that the hint value is less than the value to be found and that we know the stack of ancestors of the hint node to whose left sub-tree the hint node belongs. First search normally in the right sub-tree of the hint node, pushing nodes onto the stack as warranted (to prepare the hint for next time). If this doesn't work, then while the stack's top node has a value that is less than the query value, pop the stack. Search from the last node popped (if any), pushing as warranted.
I claim that, when using this mechanism to search successively for values in ascending order, (1) each tree edge is traversed at most once, and (2) each find traverses the edges of at most two descending paths. Given 2*n1 descending paths in a binary tree with n2 nodes, the cost of the edges is O(n1 log n2). It's also O(n2), because each edge is traversed once.
With regard to the performance requirement, O(n1 + n2) is a very good complexity in most circumstances, so this is only worth optimizing if you're doing this calculation in a tight loop.
If you really do need it, the combination approach isn't too bad; perhaps something like this:
Pseudocode:
x' = set_with_min_length([x, y])
y' = set_with_max_length([x, y])
if (x'.length * log(y'.length)) <= (x'.length + y'.length):
    return iterate_over_set_and_find_in_other(x', y')   # iterate the smaller x', look up in y'
return std::set_intersection(x, y)
I don't think you'll find an algorithm that will beat either of these complexities but happy to be proven wrong.
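The size-based switch from the pseudocode can be written in C++ roughly like this. `adaptive_intersection` is a hypothetical name, and the n1*log2(n2) <= n1 + n2 test is only the pseudocode's heuristic; the real crossover point depends on constant factors and should be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <iterator>
#include <set>
#include <vector>

template <class T>
std::vector<T> adaptive_intersection(const std::set<T>& x, const std::set<T>& y) {
    const std::set<T>& small = x.size() <= y.size() ? x : y;
    const std::set<T>& large = x.size() <= y.size() ? y : x;
    const double n1 = static_cast<double>(small.size());
    const double n2 = static_cast<double>(large.size());
    std::vector<T> out;
    if (n2 > 1 && n1 * std::log2(n2) <= n1 + n2) {
        // Probe each element of the smaller set in the larger one: O(n1 log n2).
        for (const T& v : small)
            if (large.count(v)) out.push_back(v);
    } else {
        // Linear merge of both sets: O(n1 + n2).
        std::set_intersection(x.begin(), x.end(), y.begin(), y.end(),
                              std::back_inserter(out));
    }
    return out;  // sorted in both branches, since sets iterate in order
}
```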

Sorted data structure for in-order iteration, ordered push, and removal (N elements only from top)

What is considered an optimal data structure for pushing something in order (so inserts at any position, able to find correct position), in-order iteration, and popping N elements off the top (so the N smallest elements, N determined by comparisons with threshold value)? The push and pop need to be particularly fast (run every iteration of a loop), while the in-order full iteration of the data happens at a variable rate but likely an order of magnitude less often. The data can't be purged by the full iteration, it needs to be unchanged. Everything that is pushed will eventually be popped, but since a pop can remove multiple elements there can be more pushes than pops. The scale of data in the structure at any one time could go up to hundreds or low thousands of elements.
I'm currently using a std::deque and binary search to insert elements in ascending order. Profiling shows it taking up the majority of the time, so something has got to change. std::priority_queue doesn't allow iteration, and hacks I've seen to do it won't iterate in order. Even on a limited test (no full iteration!), the std::set class performed worse than my std::deque approach.
None of the classes I'm messing with seem to be built with this use case in mind. I'm not averse to making my own class, if there's a data structure not to be found in STL or boost for some reason.
edit:
There are two major functions right now, push and prune. push uses 65% of the time and prune uses 32%. Most of the time spent in push is due to insertion into the deque (64% out of 65%). Only 1% comes from the binary search to find the position.
template<typename T, size_t Axes>
void Splitter<T, Axes>::SortedData::push(const Data& data) //65% of processing
{
    size_t index = find(data.values[(axis * 2) + 1]);
    this->data.insert(this->data.begin() + index, data); //64% of all processing happens here
}
template<typename T, size_t Axes>
void Splitter<T, Axes>::SortedData::prune(T value) //32% of processing
{
    auto top = data.begin(), end = data.end(), it = top;
    for (; it != end; ++it)
    {
        const Data& entry = *it;  // named so it doesn't shadow the member `data`
        if (entry.values[(axis * 2) + 1] > value) break;
    }
    data.erase(top, it);
}
template<typename T, size_t Axes>
size_t Splitter<T, Axes>::SortedData::find(T value)
{
    // Returns the index of the first element whose key is > value (an upper bound).
    size_t start = 0;
    size_t end = this->data.size();
    if (!end) return 0;
    size_t diff;
    while ((diff = (end - start) >> 1) != 0)
    {
        size_t mid = diff + start;
        if (this->data[mid].values[(axis * 2) + 1] <= value)
        {
            start = mid;
        }
        else
        {
            end = mid;
        }
    }
    return this->data[start].values[(axis * 2) + 1] <= value ? end : start;
}
With your requirements, a hybrid data structure tailored to your needs will probably perform best. As others have said, contiguous memory is very important, but I would not recommend keeping the array sorted at all times. I propose you use 3 buffers (1 std::array and 2 std::vectors):
1 (constant-size) Buffer for the "insertion heap". Needs to fit into the cache.
2 (variable-sized) Buffers (A+B) to maintain and update sorted arrays.
When you push an element, you add it to the insertion heap via std::push_heap. Since the insertion heap has a constant size, it can overflow. When that happens, you std::sort it backwards (in descending order) and std::merge it with the already sorted buffer (A) into the third buffer (B), resizing them as needed. That becomes the new sorted buffer and the old one can be discarded, i.e. you swap A and B for the next bulk operation. When you need the sorted sequence for iteration, you do the same. When you remove elements, you compare the top element of the heap with the last element of the sorted sequence and remove the smaller of the two (which is why you sort backwards: so that you can pop_back instead of pop_front).
For reference, this idea is loosely based on sequence heaps.
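A rough sketch of the three-buffer scheme, with simplifications that are assumptions: the insertion heap is a std::vector capped at `HeapCapacity` instead of a std::array, T must be default-constructible (the merge target is resized), and `HybridSorted`, `prune`, and `for_each_in_order` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

template <class T, std::size_t HeapCapacity = 64>
class HybridSorted {
public:
    void push(const T& value) {
        if (heap_.size() == HeapCapacity) flush();  // heap "overflows": merge it away
        heap_.push_back(value);
        std::push_heap(heap_.begin(), heap_.end(), std::greater<T>());  // min-heap
    }

    bool empty() const { return heap_.empty() && sorted_.empty(); }

    // Smallest element overall; precondition: !empty().
    const T& top() const {
        if (heap_.empty()) return sorted_.back();
        if (sorted_.empty()) return heap_.front();
        return heap_.front() < sorted_.back() ? heap_.front() : sorted_.back();
    }

    // Remove the smaller of the heap top and the sorted buffer's back.
    void pop() {
        if (!heap_.empty() && (sorted_.empty() || heap_.front() < sorted_.back())) {
            std::pop_heap(heap_.begin(), heap_.end(), std::greater<T>());
            heap_.pop_back();
        } else {
            sorted_.pop_back();
        }
    }

    // Pop every element <= threshold (the question's prune).
    void prune(const T& threshold) {
        while (!empty() && !(threshold < top())) pop();
    }

    // Ascending visit; merges the heap into the sorted buffer first.
    template <class F>
    void for_each_in_order(F f) {
        flush();
        for (auto it = sorted_.rbegin(); it != sorted_.rend(); ++it) f(*it);
    }

private:
    // Sort the heap descending and merge it with A (sorted_) into B (scratch_).
    void flush() {
        if (heap_.empty()) return;
        std::sort(heap_.begin(), heap_.end(), std::greater<T>());
        scratch_.resize(sorted_.size() + heap_.size());
        std::merge(sorted_.begin(), sorted_.end(), heap_.begin(), heap_.end(),
                   scratch_.begin(), std::greater<T>());
        sorted_.swap(scratch_);  // B becomes the new A
        heap_.clear();
    }

    std::vector<T> heap_;     // insertion heap, at most HeapCapacity elements
    std::vector<T> sorted_;   // buffer A, descending, so the minimum is at the back
    std::vector<T> scratch_;  // buffer B, merge target
};
```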
Have you tried messing around with std::vector? As weird as it may sound, it could actually be pretty fast because it uses contiguous memory. If I remember correctly, Bjarne Stroustrup was talking about this at Going Native 2012 (http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style but I'm not 100% sure that it's in this video).
You save time with the binary search, but the insertion in random positions of the deque is slow. I would suggest an std::map instead.
From your edit, it sounds like the delay is in copying. Is it a complex object? You could heap-allocate the objects and store pointers in the structure, so each entry is created only once; you'll need to provide a custom comparator that takes pointers, as the object's operator<() wouldn't be called otherwise. (The custom comparator can simply call operator<().)
EDIT:
Your own figures show it's the insertion that takes the time, not the 'sorting'. While some of that insertion time is creating a copy of your object, some (possibly most) is creation of the internal structure that will hold your object, and I don't think that will change between list/map/set/queue etc. If you can predict the likely eventual/maximum size of your data set, can write or find your own sorting algorithm, and the time is being lost in allocating objects, then vector might be the way to go.
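The pointer-storage suggestion can be sketched as below. `Data` is a hypothetical stand-in for the question's type, `PtrLess` and `sorted_insert` are hypothetical names, and std::unique_ptr takes care of the heap allocation. Inserting into the middle of the deque then moves pointers instead of whole objects.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the question's Data: expensive to copy, ordered by key.
struct Data {
    double key;
    std::string payload;
    bool operator<(const Data& other) const { return key < other.key; }
};

// Comparator over pointers that simply forwards to Data::operator<.
struct PtrLess {
    bool operator()(const std::unique_ptr<Data>& a, const std::unique_ptr<Data>& b) const {
        return *a < *b;
    }
};

// Each Data object is allocated once; later shifts only move pointers.
void sorted_insert(std::deque<std::unique_ptr<Data>>& container, std::unique_ptr<Data> item) {
    auto pos = std::upper_bound(container.begin(), container.end(), item, PtrLess{});
    container.insert(pos, std::move(item));
}
```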

How can I use binary heap in the Dijkstra algorithm?

I am writing code for Dijkstra's algorithm. For the part where we are supposed to find the unvisited node with the minimum distance, I am using an array and traversing it fully to find that node.
This part can be replaced by a binary heap, and we can find the node in O(1) time, but we also update the distances of nodes in later iterations. How do I incorporate that into the heap?
In the case of an array, all I have to do is go to index i-1 and update the value of that node, but the same thing can't be done in a binary heap: I would have to do a full search to find the position of the node and then update it.
What is the workaround for this problem?
This is just some information I found while doing this in a class, that I shared with my classmates. I thought I'd make it easier for folks to find it, and I had left this post up so that I could answer it when I found a solution.
Note: I'm assuming for this example that your graph's vertices have an ID to keep track of which is which. This could be a name, a number, whatever, just make sure you change the type in the struct below.
If you have no such means of distinction, then you can use pointers to the vertices and compare their pointed-to addresses.
The problem you are faced with here is the fact that, in Dijkstra's algorithm, we are asked to store the graph's vertices and their keys in this priority queue, then update the keys of the ones left in the queue.
But... Heap data-structures have no way of getting at any particular node that is not the minimum or the last node!
The best we'd be able to do is traverse the heap in O(n) time to find the vertex, then update its key and bubble it up at O(Logn). That makes every key update O(n), and since an update can happen for every single edge, our implementation of Dijkstra becomes O(mn), way worse than the optimal O(mLogn).
Bleh! There has to be a better way!
So, what we need to implement isn't exactly a standard min-heap-based priority queue. We need one more operation than the standard 4 pq operations:
IsEmpty
Add
PopMin
PeekMin
and DecreaseKey
In order to DecreaseKey, we need to:
find a particular vertex inside the Heap
lower its key-value
"heap-up" or "bubble-up" the vertex
Since you were probably going to use an "array-based" heap implementation (I'm assuming it has been implemented sometime in the past 4 months), we need the heap to keep track of each vertex and its index in the array in order for this operation to be possible.
Devising a struct like: (c++)
struct VertLocInHeap
{
    int vertex_id;
    int index_in_heap;
};
would allow you to keep track of it, but storing those in an array would still give you O(n) time for finding the vertex in the heap. No complexity improvement, and it's more complicated than before. >.<
My suggestion (if optimization is the goal here):
Store this info in a Binary Search Tree whose key value is the `vertex_id`
do a binary-search to find the vertex's location in the Heap in O(Logn)
use the index to access the vertex and update its key in O(1)
bubble-up the vertex in O(Logn)
I actually used a std::map declared as:
std::map<int, int> m_locations;
in the heap instead of using the struct. The first parameter (Key) is the vertex_id, and the second parameter (Value) is the index in the heap's array.
Since std::map guarantees O(Logn) searches, this works nicely out-of-the-box. Then whenever you insert or bubble, you just m_locations[vertexID] = newLocationInHeap;
Easy money.
Analysis:
Upside: we now have O(Logn) for finding any given vertex in the p-q. For the bubble-up we do O(Log(n)) swaps, and for each swap we do an O(Log(n)) update in the map of array indexes, resulting in an O(Log^2(n)) operation for bubble-up.
So, we have a Log(n) + Log^2(n) = O(Log^2(n)) operation for updating the key values in the Heap for a single edge. That makes our Dijkstra alg take O(mLog^2(n)). That's pretty close to the theoretical optimum, at least as close as I can get it. Awesome Possum!
Downside: We are storing literally twice as much information in-memory for the heap. Is it a "modern" problem? Not really; my desky can store over 8 billion integers, and many modern computers come with at least 8GB of RAM; however, it is still a factor. If you did this implementation with a graph of 4 billion vertices, which can happen a lot more often than you'd think, then it causes a problem. Also, all those extra reads/writes, which may not affect the complexity in analysis, may still take time on some machines, especially if the information is being stored externally.
I hope this helps someone in the future, because I had a devil of a time finding all this information, then piecing the bits I got from here, there, and everywhere together to form this. I'm blaming the internet and lack of sleep.
The problem I ran into with using any form of heap is that, you need to reorder the nodes in the heap. In order to do that, you would have to keep popping everything from the heap until you found the node you need, then change the weight, and push it back in (along with everything else you popped). Honestly, just using an array would probably be more efficient and easier to code than that.
The way I got around this was I used a Red-Black tree (in C++ it's just the set<> data type of the STL). The data structure contained a pair<> element which had a double (cost) and string (node). Because of the tree structure, it is very efficient to access the minimum element (I believe C++ makes it even more efficient by maintaining a pointer to the minimum element).
Along with the tree, I also kept an array of doubles that contained the distance for each node. So, when I needed to reorder a node in the tree, I simply used the old distance from the dist array, along with the node name, to find it in the set. I would then remove that element from the tree and re-insert it with the new distance. Searching for a node costs O(log n) and inserting a node costs O(log n), so the cost to reorder a node is O(2 * log n) = O(log n). A binary heap also has O(log n) insert and delete, but it doesn't support search, so you would pay for popping every node until you find the one you want, changing its weight, and then pushing all the popped nodes back in. Once the node had been reordered, I would change the distance in the array to reflect the new distance.
I honestly can't think of a way to modify a heap in such a way to allow it to dynamically change the weights of a node, because the whole structure of the heap is based on the weights the nodes maintain.
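A sketch of the std::set approach described above, assuming vertices are numbered 0..n-1 (the answer used string names) and edge weights are non-negative. `dijkstra_set` and `Graph` are hypothetical names. The dist array is what makes it possible to locate a vertex's current (distance, vertex) entry in the set in O(log n).

```cpp
#include <cassert>
#include <limits>
#include <set>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<std::pair<int, double>>>;  // adj[u] = {(v, w), ...}

std::vector<double> dijkstra_set(const Graph& adj, int source) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> dist(adj.size(), INF);
    std::set<std::pair<double, int>> frontier;  // (distance, vertex), ordered by distance
    dist[source] = 0.0;
    frontier.insert({0.0, source});
    while (!frontier.empty()) {
        int u = frontier.begin()->second;  // vertex with the minimum distance
        frontier.erase(frontier.begin());
        for (const auto& edge : adj[u]) {
            int v = edge.first;
            double w = edge.second;
            if (dist[u] + w < dist[v]) {
                // "Decrease key": find the old entry via dist[v], erase, re-insert.
                if (dist[v] != INF) frontier.erase({dist[v], v});
                dist[v] = dist[u] + w;
                frontier.insert({dist[v], v});
            }
        }
    }
    return dist;
}
```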
I would do this using a hash table in addition to the Min-Heap array.
The hash table's keys are the node objects (hashed), and its values are the indices of where those nodes are in the min-heap array.
Then anytime you move something in the min-heap you just need to update the hash table accordingly. Since at most 2 elements will be moved per operation in the min-heap (that is they are exchanged), and our cost per move is O(1) to update the hash table, then we will not have damaged the asymptotic bound of the min-heap operations. For example, minHeapify is O(lgn). We just added 2 O(1) hash table operations per minHeapify operation. Therefore the overall complexity is still O(lgn).
Keep in mind you would need to modify any method that moves your nodes in the min-heap to do this tracking! For example, minHeapify() requires a modification that looks like this using Java:
Node[] nodes;
Map<Node, Integer> indexMap = new HashMap<>();
// Assumes a 1-based heap stored in nodes[1..heapSize]
private void minHeapify(Node[] nodes, int i) {
    int smallest;
    int l = 2*i; // left child index
    int r = 2*i + 1; // right child index
    if(l <= heapSize && nodes[l].getTime() < nodes[i].getTime()) {
        smallest = l;
    }
    else {
        smallest = i;
    }
    if(r <= heapSize && nodes[r].getTime() < nodes[smallest].getTime()) {
        smallest = r;
    }
    if(smallest != i) {
        Node temp = nodes[smallest];
        nodes[smallest] = nodes[i];
        nodes[i] = temp;
        indexMap.put(nodes[smallest], smallest); // Added index tracking in O(1): each node maps to its new slot
        indexMap.put(nodes[i], i); // Added index tracking in O(1)
        minHeapify(nodes, smallest);
    }
}
buildMinHeap, heapExtract should be dependent on minHeapify, so that one is mostly fixed, but you do need the extracted key to be removed from the hash table as well. You'd also need to modify decreaseKey to track these changes as well. Once that's fixed then insert should also be fixed since it should be using the decreaseKey method. That should cover all your bases and you will not have altered the asymptotic bounds of your algorithm and you still get to keep using a heap for your priority queue.
Note that a Fibonacci Min Heap is actually preferred to a standard Min Heap in this implementation, but that's a totally different can of worms.
Another solution is "lazy deletion". Instead of a decrease-key operation, you simply insert the node into the heap again with its new priority. So the heap will contain another copy of the node, but that copy will be higher in the heap than any previous copy. Then, when getting the next minimum node, you simply check whether the node has already been finalized. If it has, skip it and continue (lazy deletion).
This has a little worse performance/higher memory usage due to copies inside the heap. But, it is still limited (to number of connections) and may be faster than other implementations for some problem sizes.
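A sketch of lazy deletion with std::priority_queue (which has no decrease-key), assuming vertices numbered 0..n-1 and non-negative weights; `dijkstra_lazy` and `Graph` are hypothetical names. Stale copies are simply skipped when they are popped.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<std::pair<int, double>>>;  // adj[u] = {(v, w), ...}

std::vector<double> dijkstra_lazy(const Graph& adj, int source) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> dist(adj.size(), INF);
    std::vector<bool> done(adj.size(), false);
    using Entry = std::pair<double, int>;  // (tentative distance, vertex)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;  // min-heap
    dist[source] = 0.0;
    pq.push({0.0, source});
    while (!pq.empty()) {
        int u = pq.top().second;
        pq.pop();
        if (done[u]) continue;  // stale copy: u was already finalized
        done[u] = true;
        for (const auto& edge : adj[u]) {
            int v = edge.first;
            double w = edge.second;
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});  // push another copy instead of decrease-key
            }
        }
    }
    return dist;
}
```

The heap holds at most one entry per successful relaxation plus the source, which is the memory bound the answer refers to.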
This algorithm: http://algs4.cs.princeton.edu/44sp/DijkstraSP.java.html works around this problem by using "indexed heap": http://algs4.cs.princeton.edu/24pq/IndexMinPQ.java.html which essentially maintains the list of mappings from key to array index.
I believe the main difficulty is being able to achieve O(log n) time complexity when we have to update vertex distance. Here are the steps on how you could do that:
For heap implementation, you could use an array.
For indexing, use a Hash Map, with Vertex number as the key and its index in heap as the value.
When we want to update a vertex, search its index in the Hash Map in O(1) time.
Reduce the vertex distance in the heap and then keep traversing up: compare its new distance against its parent's, and if the parent's value is greater, swap the parent and the current vertex. This step also takes O(log n).
Update the vertex's index in Hash Map as you make changes while traversing up the heap.
I think this should work and the overall time complexity would be O((E+V)*log V), just as the theory implies.
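The steps above can be sketched as an indexed binary min-heap. `IndexedMinHeap` is a hypothetical name; the std::unordered_map from vertex to array index is updated on every swap, so finding a vertex is O(1) on average and decrease_key costs O(log n).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

class IndexedMinHeap {
public:
    bool empty() const { return heap_.empty(); }
    bool contains(int vertex) const { return pos_.count(vertex) != 0; }

    void push(int vertex, double dist) {
        heap_.push_back({dist, vertex});
        pos_[vertex] = heap_.size() - 1;
        sift_up(heap_.size() - 1);
    }

    // Precondition: contains(vertex) and new_dist <= its current distance.
    void decrease_key(int vertex, double new_dist) {
        std::size_t i = pos_.at(vertex);  // O(1) average lookup in the hash map
        heap_[i].first = new_dist;
        sift_up(i);                       // O(log n)
    }

    // Removes and returns the (distance, vertex) entry with the smallest distance.
    std::pair<double, int> pop_min() {
        std::pair<double, int> top = heap_.front();
        swap_entries(0, heap_.size() - 1);
        heap_.pop_back();
        pos_.erase(top.second);
        if (!heap_.empty()) sift_down(0);
        return top;
    }

private:
    // Every move in the array goes through here, keeping pos_ in sync.
    void swap_entries(std::size_t a, std::size_t b) {
        std::swap(heap_[a], heap_[b]);
        pos_[heap_[a].second] = a;
        pos_[heap_[b].second] = b;
    }

    void sift_up(std::size_t i) {
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (!(heap_[i].first < heap_[parent].first)) break;
            swap_entries(i, parent);
            i = parent;
        }
    }

    void sift_down(std::size_t i) {
        for (;;) {
            std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap_.size() && heap_[l].first < heap_[smallest].first) smallest = l;
            if (r < heap_.size() && heap_[r].first < heap_[smallest].first) smallest = r;
            if (smallest == i) return;
            swap_entries(i, smallest);
            i = smallest;
        }
    }

    std::vector<std::pair<double, int>> heap_;  // (distance, vertex)
    std::unordered_map<int, std::size_t> pos_;  // vertex -> index in heap_
};
```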
I am using the following approach. Whenever I insert something into the heap, I pass a pointer to an integer (this memory location is owned by me, not the heap) which should contain the position of the element in the array managed by the heap. So if the sequence of elements in the heap is rearranged, the heap is supposed to update the values pointed to by these pointers.
So for the Dijkstra algorithm I am creating a posInHeap array of size N.
Hopefully, the code will make it more clear.
#include <cstddef>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Binary heap (ordered by Comparison, a min-heap by default) that keeps a
// caller-owned size_t up to date with each element's current index in m_v.
template <typename T, class Comparison = std::less<T>> class cTrackingHeap
{
public:
    explicit cTrackingHeap(Comparison c = Comparison()) : m_c(c), m_v() {}
    cTrackingHeap(const cTrackingHeap&) = delete;
    cTrackingHeap& operator=(const cTrackingHeap&) = delete;

    // Replace the value at pos with a smaller one and sift it up.
    void DecreaseVal(std::size_t pos, const T& newValue)
    {
        m_v[pos].first = newValue;
        while (pos > 0)
        {
            std::size_t iPar = (pos - 1) / 2;
            if (m_c(newValue, m_v[iPar].first))   // use the heap's comparator, not operator<
            {
                std::swap(m_v[pos], m_v[iPar]);
                *m_v[pos].second = pos;           // both swapped elements moved
                *m_v[iPar].second = iPar;
                pos = iPar;
            }
            else
                break;
        }
    }

    // Remove the element at pos (Delete(0) pops the minimum).
    void Delete(std::size_t pos)
    {
        *(m_v[pos].second) = std::numeric_limits<std::size_t>::max(); // indicate that the element is no longer in the heap
        m_v[pos] = m_v.back();
        m_v.pop_back();
        if (pos == m_v.size())   // the removed element was the last one
            return;
        *(m_v[pos].second) = pos;
        // The element moved in from the back may need to go down or up.
        bool makingProgress = true;
        while (makingProgress)
        {
            makingProgress = false;
            std::size_t exchangeWith = pos;
            if (2 * pos + 1 < m_v.size() && m_c(m_v[2 * pos + 1].first, m_v[pos].first))
                exchangeWith = 2 * pos + 1;
            if (2 * pos + 2 < m_v.size() && m_c(m_v[2 * pos + 2].first, m_v[exchangeWith].first))
                exchangeWith = 2 * pos + 2;
            if (pos > 0 && m_c(m_v[pos].first, m_v[(pos - 1) / 2].first))
                exchangeWith = (pos - 1) / 2;
            if (exchangeWith != pos)
            {
                makingProgress = true;
                std::swap(m_v[pos], m_v[exchangeWith]);
                *m_v[pos].second = pos;
                *m_v[exchangeWith].second = exchangeWith;
                pos = exchangeWith;
            }
        }
    }

    // Insert value; *posTracker is kept equal to the element's index from now on.
    void Insert(const T& value, std::size_t* posTracker)
    {
        m_v.push_back(std::make_pair(value, posTracker));
        std::size_t pos = m_v.size() - 1;
        *posTracker = pos;
        // Sift up while the new element compares less than its parent.
        while (pos > 0 && m_c(m_v[pos].first, m_v[(pos - 1) / 2].first))
        {
            std::size_t iPar = (pos - 1) / 2;
            std::swap(m_v[pos], m_v[iPar]);
            *m_v[pos].second = pos;
            *m_v[iPar].second = iPar;
            pos = iPar;
        }
    }

    const T& GetMin() const { return m_v[0].first; }   // heap must not be empty
    const T& Get(std::size_t i) const { return m_v[i].first; }
    std::size_t GetSize() const { return m_v.size(); }

private:
    Comparison m_c;
    std::vector<std::pair<T, std::size_t*>> m_v;   // (value, pointer to the caller's position slot)
};

Optimize a small loop of if-else branches in C++

Is it possible to remove the branches in the following loop? All iterators come from the container type std::map<type_name, T>.
record_iterator beginIter = lastLookup_;
record_iterator endIter = lastLookup_;
++endIter;
for (; endIter != end(); ++beginIter, ++endIter) {
    time_type now = beginIter->first;
    if (ts == now) {
        lastLookup_ = beginIter;
        return beginIter;
    } else if (ts > now && ts <= endIter->first) {
        lastLookup_ = beginIter;
        return endIter;
    }
}
The problem this algorithm is trying to solve is optimizing a forward lookup whose target is assumed to be at, or not too far ahead of, the last looked-up location. Ideally, I keep an iterator to the last looked-up location and move forward linearly. But this seems to have the same performance as:
record_iterator it = sliceMap_.find(ts);
if (it != end()) {
    return it;
} else {
    return sliceMap_.upper_bound(ts);
}
I suspect the branch is the problem. Is it possible to remove the branch in this code so I can profile the difference in speed?
There are three big problems with the first approach:
Too many comparisons inside a loop.
Using iterators on a std::map involves using std::map<>::iterator::operator++(), which is not exactly fast. Look at the implementation starting at line 62: http://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-3.4/tree_8cc-source.html .
Walking a std::map with iterators is a linear search, whereas searching a map should be logarithmic.
There's also a problem with your second approach. You are searching twice.
Why not just use:
return sliceMap_.lower_bound(ts);
This should do exactly what you want with one logarithmic search.
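A small check of that claim, as a sketch: findThenUpper and sameAsLowerBound are hypothetical names, with findThenUpper reproducing the question's find()-then-upper_bound() logic. When the key is present, both versions return the matching element; when it is absent, upper_bound(ts) and lower_bound(ts) coincide because no element compares equal to ts.

```cpp
#include <map>

// The question's second approach: an O(log n) find(), plus a second
// O(log n) upper_bound() whenever the key is missing.
template <class Map, class Key>
typename Map::const_iterator findThenUpper(const Map& m, const Key& ts)
{
    typename Map::const_iterator it = m.find(ts);
    if (it != m.end())
        return it;
    return m.upper_bound(ts);
}

// True if the two-search version agrees with a single lower_bound() call.
template <class Map, class Key>
bool sameAsLowerBound(const Map& m, const Key& ts)
{
    return findThenUpper(m, ts) == m.lower_bound(ts);
}
```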
As some people said, the first method doesn't make much sense, since you are doing a linear search on an ordered container. I realize the location is supposed to be near lastLookup_.
About the second method, I think a simple optimization would be to eliminate the second lookup. You are doing one in record_iterator it = sliceMap_.find(ts); and another in return sliceMap_.upper_bound(ts);
EDITED:
Try doing it this way:
record_iterator it = sliceMap_.lower_bound(ts);
return it;
Here lower_bound() finds the first element whose key does not compare less than ts. That includes an element equal to ts, which upper_bound() would skip.
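If lookups really do tend to land at or just after the previous position, one option is to keep the remembered iterator but try only a constant number of checks around it before falling back to a single lower_bound(). This does not make the code branch-free; it just bounds the linear part to one step. The sketch is hypothetical (HintedLookup and lookup are made-up names), assumes the map uses the default std::less ordering, and assumes the element the hint points to is not erased between calls (insertions into a std::map do not invalidate iterators, but erasing that element would):

```cpp
#include <iterator>
#include <map>

// Hypothetical helper: remembers the previous result and tries O(1) checks
// near it before falling back to one O(log n) lower_bound().
template <class Key, class Value>
class HintedLookup
{
public:
    using Map = std::map<Key, Value>;
    using const_iterator = typename Map::const_iterator;

    explicit HintedLookup(const Map& m) : map_(m), last_(m.begin()) {}

    // Always returns the same iterator as map_.lower_bound(ts).
    const_iterator lookup(const Key& ts)
    {
        if (last_ != map_.end() && !(last_->first < ts))
        {
            // ts <= last_->first: last_ is the answer if its predecessor is < ts.
            if (last_ == map_.begin() || std::prev(last_)->first < ts)
                return last_;
        }
        else if (last_ != map_.end())
        {
            // last_->first < ts: the answer is the next element if ts <= its key.
            const_iterator after = std::next(last_);
            if (after == map_.end() || !(after->first < ts))
                return last_ = after;
        }
        return last_ = map_.lower_bound(ts);   // hint missed: one logarithmic search
    }

private:
    const Map& map_;
    const_iterator last_;
};
```

Whether this beats a plain lower_bound() depends on the real access pattern, so it is worth profiling both.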