Boost heap handle to top element - c++

I have an algorithm which essentially takes elements of type T, pushes them to a priority_queue, and then takes the top element of the queue, modifies it, updates the heap, takes the (possibly new) top element, and so on.
I'd like to try the Boost heap library to replace std::priority_queue to avoid having to update by a pop followed by a push of the same element. However, I cannot seem to find a way to access a handle to the top of the queue, to update it.
Specifically, there seem to be only two ways to get a handle to an element:
s_handle_from_iterator(it). However, the iterator returned by begin() is not necessarily the top of the heap, and this method does not accept an ordered_iterator, so it seems I'm out of luck?
push(elem) returns the handle. However, using that seems to mean I have to store the handle somewhere, probably in the element itself, which seems like a waste of space (and of time, when handling the elements).
Is there a better way?
Note that for a slightly different use case (iterating the whole heap in order) this answer seems to suggest that there really isn't a better way...
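For reference, here is the pop-then-push update the question wants to avoid, shown with std::priority_queue: there is no handle to the top element, so "modify the top" costs one pop plus one push (int stands in for T here; the helper name is mine).

```cpp
#include <queue>

// The status-quo update pattern: no handle to the top, so an update
// is a pop of the old value followed by a push of the modified value.
inline void update_top(std::priority_queue<int>& pq, int new_value) {
    pq.pop();            // discard the current top element...
    pq.push(new_value);  // ...and reinsert its modified value
}
```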

Related

Removing from the beginning of an std::vector in C++

I might be missing something very basic here but here is what I was wondering -
We know removing an element from the beginning of an std::vector ( vector[0] ) in C++ is an O(n) operation because all the other elements have to be shifted one place backwards.
But why isn't it implemented such that the pointer to the first element is moved one position ahead so that now the vector starts from the second element and, in essence, the first element is removed? This would be an O(1) operation.
std::array and C-style arrays are fixed-length, and you can't change their length at all, so I think you have a typo there and mean std::vector instead.
"Why was it done that way?" is a bit of a historical question. Looking ahead, though: if your system library allowed giving unused memory back to the operating system, your pointer-shifting trick would prevent any later reuse of the former first elements' memory.
Also, std::vector comes from systems (still the basis of essentially every operating system) with calls like free and malloc, where you must keep the pointer to the beginning of an allocated region around in order to free it later. Hence you would have to lug around another pointer in the std::vector structure just to be able to free the buffer, and that would only pay off if someone actually deleted from the front. And if you are deleting from the front, chances are you would be better off storing the vector reversed (and deleting from the end), or a vector isn't the right data structure altogether.
It is not impossible for a vector to be implemented like that (it wouldn't be std::vector, though). You would need to keep a pointer to the first element in addition to a pointer to the underlying array (alternatively, some offset can be stored, but however you put it, you need to store more data in the vector).
Consider that this is useful only for one quite specific use case: erasing the first element. Once you have that, you can also benefit when inserting an element at the front while there is free space left; and if there is free space left, even inserting in the first half could benefit by shifting only the first half.
However, all this does not fit with the concept of capacity. With std::vector you know exactly how many elements you can add before a reallocation occurs: capacity() - size(). With your proposal this would no longer hold; erasing the first element would affect capacity in an odd way. It would complicate the interface and usage of vectors for all use cases.
Further, erasing elements anywhere else would still not be O(1). In total it would incur a cost and add complexity for any use of the vector, while bringing an advantage only in a very narrow use case.
If you do find yourself in the situation that you need to erase the front element very often, then you can either store the vector in reverse, and erasing the last element is already O(1), or use a different container.
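The reversed-storage trick mentioned above can be sketched as a small wrapper. The class name is illustrative, not standard; note the honest trade-off that appending now pays the O(n) shift instead.

```cpp
#include <vector>

// Store the elements back-to-front so that "erase the front"
// becomes an O(1) pop_back(). Appending becomes O(n) instead.
template <typename T>
class FrontErasable {
    std::vector<T> rev_;  // the logical front element lives at rev_.back()

public:
    void push_back(const T& v) { rev_.insert(rev_.begin(), v); }  // now O(n)
    const T& front() const { return rev_.back(); }
    void pop_front() { rev_.pop_back(); }  // O(1)
    bool empty() const { return rev_.empty(); }
};
```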

Most efficient way for popping max node from a queue implemented using a linked list? C++

If you have a fifo queue implemented using a linked list, what would be the most efficient way to pop a node with the highest value?
Mergesort would be O(n log n).
Scanning through the list would be O(n).
Can anyone suggest more efficient ways of doing this?
The queue must retain the fifo ordering that operates in the usual manner with enqueue and dequeue, but has an extra method, such as popMax, which pops and returns the node with the highest value.
No code is needed, just some ideas! Thanks!
Is popMax frequent enough that changing it from O(N) to O(log N) justifies the extra storage (two pointers plus an index per node), the extra complexity, and changing enqueue and dequeue from O(1) to O(log N)?
In the many times I have solved this problem (for different reasons and different employers) the answer to the above has pretty consistently been "Yes". But it might be "no" for you. So first make that decision.
Any improvement on the O(N) needs to be able to remove from the middle of the primary sequence. If the primary sequence would otherwise have been a forward-only linked list, it now needs links in both directions: one extra pointer.
A heap of pointers costs another extra pointer (per node, but not in the node). But then dequeue needs to be able to remove from the middle of the heap, which requires an index within each node as a back pointer to its position in the heap.
If it is worth all that, you can easily find (free online) source code for a templated priority queue/heap, and it should be obvious how to make the heap's objects be node* and the less function given to the heap compare the values inside the nodes pointed to.
Next you change that heap source code (the reason you don't simply use std::priority_queue) such that each time it positions an "object" (meaning a node*) in the queue, it does some kind of callback to notify the object of its new index.
You also need to expose some internals of the heap code. There is a point in any decent version of heap code in which the code deals with a hole (missing element) at index x within the heap, by checking whether the last element of the heap could be correctly moved there (and if so doing that) or if not moving the correct child of the hole into the hole and repeating for the new hole where that child was. Typically that code is not exposed to external callers with x as an input. But it easily can be exposed. Your dequeue function needs that in order to remove from the heap the same element being removed from the list.
For less extra storage, but likely more execution time (though still O(log N) per operation), you could have a heap of nodes instead of a heap of node*. That is the reason, when coding this kind of heap, you should code the notify callback generically (similar to less). Then your doubly linked list holds indexes instead of pointers (so growth is robust), and the notify function updates the predecessor's forward index and the successor's back index. To avoid lots of special casing, you need a full circle of double links (perhaps including a dummy node), rather than just end to end.
I haven't worked through the details yet myself (since I've never redone any of this in the post C++11 world) but I think a more elegant alternative to the notify function discussed above would be to wrap the object (that will be in the heap) in a wrapper that allows it to be moved but doesn't allow it to be copied. The action to be done by notify would instead be done during the move. That makes std::priority_queue even closer to what you need, but so far as I understand, it still doesn't expose that key internal point in the code for filling a hole at an arbitrary location.
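A related idea can be sketched with standard containers instead of a hand-modified heap: a std::list keeps the FIFO order, and a multiset of list iterators, ordered by the pointed-to value, plays the role of the heap-plus-back-pointers described above. All three operations are O(log N). The class and method names here are my own, not from the question.

```cpp
#include <iterator>
#include <list>
#include <set>

class MaxQueue {
    std::list<int> fifo_;  // front = oldest element

    struct ByValue {
        bool operator()(std::list<int>::iterator a,
                        std::list<int>::iterator b) const {
            return *a < *b;  // order iterators by the values they point at
        }
    };
    std::multiset<std::list<int>::iterator, ByValue> index_;

public:
    void enqueue(int v) {
        fifo_.push_back(v);
        index_.insert(std::prev(fifo_.end()));
    }

    int dequeue() {  // pop the oldest; assumes the queue is non-empty
        auto it = fifo_.begin();
        auto range = index_.equal_range(it);  // entries with an equal value
        for (auto r = range.first; r != range.second; ++r)
            if (*r == it) { index_.erase(r); break; }  // drop this exact node
        int v = *it;
        fifo_.erase(it);
        return v;
    }

    int popMax() {  // pop the largest; assumes the queue is non-empty
        auto top = std::prev(index_.end());  // largest value sorts last
        auto it = *top;
        int v = *it;
        index_.erase(top);
        fifo_.erase(it);  // list iterators stay valid, so this is O(1)
        return v;
    }
};
```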

Shrinking a std::priority_queue

Given a std::priority_queue to which elements are being added faster than they are being removed by the usual process of repeatedly popping the best element, so that the program is going to run out of memory unless something is done,
Is there any way to throw away the worst half of the elements, while leaving the best half to be processed one at a time as normal?
There isn't a direct way, but a binary heap doesn't really support that operation anyway.
But it's not hard to indirectly do so:
Create a temporary empty priority queue
Swap the main and temporary queues
Enter a loop that pops from the temporary and pushes to the main
Stop when you're happy with the number of copied elements
Destroy the temporary queue.
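The steps above can be sketched as a small helper (the function name is mine; assumes a max-priority_queue where "best" means largest):

```cpp
#include <queue>
#include <utility>

// Keep only the best `keep` elements: swap everything into a temporary
// queue, then copy the best elements back until the quota is reached.
template <typename T>
void shrink_to_best(std::priority_queue<T>& pq, std::size_t keep) {
    std::priority_queue<T> tmp;
    std::swap(pq, tmp);          // tmp now holds everything, pq is empty
    while (!tmp.empty() && pq.size() < keep) {
        pq.push(tmp.top());      // best elements come off tmp first
        tmp.pop();
    }
}                                // tmp (the worst elements) is destroyed here
```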
Clearly not, since the interface to a std::priority_queue is so extremely limited. You could implement your own priority queue that will let you do this using make_heap, push_heap and pop_heap (this is how std::priority_queue is implemented) and implementing your own function to remove the worst half of the elements.
The std::priority_queue is a binary heap and as such only partially ordered. The data structure gives you no way to locate the best half of the elements other than extracting them.
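The roll-your-own variant can be sketched with the standard heap algorithms: keep the heap's storage in a vector, partition out the worst half with nth_element, then re-heapify. The function name is illustrative; this assumes a max-heap of the default ordering.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Drop the worst half of a max-heap stored in `heap`, then restore
// the heap property over the surviving (best) elements.
template <typename T>
void drop_worst_half(std::vector<T>& heap) {
    std::size_t keep = heap.size() / 2;
    // Partition so the `keep` largest elements occupy the front.
    std::nth_element(heap.begin(), heap.begin() + keep, heap.end(),
                     std::greater<T>());
    heap.resize(keep);                        // discard the worst half
    std::make_heap(heap.begin(), heap.end()); // O(keep) re-heapify
}
```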

Is there an equivalent of vector::reserve() for an std::list?

I have a class that looks like this:
typedef std::list<char*> PtrList;
class Foo
{
public:
void DoStuff();
private:
PtrList m_list;
PtrList::iterator m_it;
};
The function DoStuff() basically adds elements to m_list or erases elements from it, finds an iterator to some special element in it and stores it in m_it. It is important to note that each value of m_it is used in every following call of DoStuff().
So what's the problem?
Everything works, except that profiling shows that the operator new is invoked too much due to list::push_back() called from DoStuff().
To increase performance I want to preallocate memory for m_list in the initialization of Foo as I would do if it were an std::vector. The problem is that this would introduce new problems such as:
Less efficient insert and erase of elements.
m_it becomes invalid as soon as the vector is changed from one call to DoStuff() to the next. EDIT: Alan Stokes suggested to use an index instead of an iterator, solving this issue.
My solution: the simplest solution I could think of is to implement a pool of objects that also has a linked-list functionality. This way I get a linked list and can preallocate memory for it.
Am I missing something or is it really the simplest solution? I'd rather not "re-invent the wheel", and use a standard solution instead, if it exists.
Any thoughts, workarounds or enlightening comments would be appreciated!
I think you are using the wrong container.
If you want fast push_back, don't automatically assume that you need a linked list; a linked list is a slow container, basically only suitable for reordering.
A better container is std::deque. A deque is basically an array of arrays: it allocates a block of memory and fills it as you push back, and when it runs out it allocates another block. This means it allocates very infrequently, and unlike std::vector with reserve, you don't have to know the size of the container ahead of time for it to be efficient.
You can use the splice function in std::list to implement a pool. Add a new member variable PtrList m_Pool. When you want to add a new object and the pool is not empty, assign the value to the first element in the pool and then splice it into the list. To erase an element, splice it from the list to the pool.
But if you don't care about the order of the elements, then a deque can be much faster. If you want to erase an element in the middle, copy the last element onto the element you want to delete, then erase the last element.
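The splice-based pool described above can be sketched like this. Nodes are recycled between m_list and m_pool, so once the pool is warm, adding and removing never touch operator new; the method names are illustrative, not from the question.

```cpp
#include <list>

typedef std::list<char*> PtrList;

class Foo {
public:
    void add(char* p) {
        if (!m_pool.empty()) {
            m_pool.front() = p;  // reuse the pooled node's storage
            m_list.splice(m_list.end(), m_pool, m_pool.begin());  // O(1)
        } else {
            m_list.push_back(p);  // pool empty: this one allocates
        }
    }

    void remove(PtrList::iterator it) {
        m_pool.splice(m_pool.end(), m_list, it);  // O(1), back to the pool
    }

    PtrList m_list;
    PtrList m_pool;
};
```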
My advice is the same as 111111's, try switching to deque before you write any significant code.
However, to directly answer your question: you could use std::list with a custom allocator. It's a bit fiddly, and I'm not going to work through all the details here, but the gist of it is that you write a class that represents the memory allocation strategy for list nodes. The nodes allocated by list will be a small implementation-defined amount larger than char*, but they will all be the same size, which means you can write an optimized allocator just for that size (a pool of memory blocks rather than a pool of objects), and you can add functions to it that let you reserve whatever space you want in the allocator, at the time you want. Then the list can allocate/free quickly. This saves you needing to re-implement any of the actual list functionality.
If you were (for some reason) going to implement a pool of objects with list functionality, then you could start with boost::intrusive. That might also be useful when writing your own allocator, for keeping track of your list of free blocks.
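A minimal sketch of the custom-allocator idea, under the assumption that std::list only ever requests single nodes: freed node-sized blocks go onto a free list and are handed back on the next allocation. The class name and details are illustrative; blocks are only returned to the system at program exit, and a real version would reserve chunks up front.

```cpp
#include <cstddef>
#include <list>
#include <new>
#include <vector>

template <typename T>
struct NodePoolAlloc {
    using value_type = T;

    NodePoolAlloc() = default;
    template <typename U>
    NodePoolAlloc(const NodePoolAlloc<U>&) {}

    T* allocate(std::size_t n) {
        if (n == 1 && !free_list().empty()) {
            void* p = free_list().back();  // recycle a previously freed node
            free_list().pop_back();
            return static_cast<T*>(p);
        }
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t n) {
        if (n == 1)
            free_list().push_back(p);  // keep the block for reuse
        else
            ::operator delete(p);
    }

private:
    static std::vector<void*>& free_list() {
        static std::vector<void*> fl;  // one pool per (node) type
        return fl;
    }
};

template <typename T, typename U>
bool operator==(const NodePoolAlloc<T>&, const NodePoolAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const NodePoolAlloc<T>&, const NodePoolAlloc<U>&) { return false; }
```

Usage is just `std::list<char*, NodePoolAlloc<char*>>`; the list rebinds the allocator to its node type, so the pool naturally ends up holding node-sized blocks.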
List and vector are completely different in the way they manage objects.
Vector constructs elements in place in an allocated buffer of a given capacity. A new allocation happens when the capacity is exhausted.
List allocates elements one by one, each into an individually allocated space.
Vector elements shift when something is inserted / removed; hence, vector indexes and element addresses are not stable.
List elements are re-linked when something is inserted / removed; hence, list iterators and element addresses are stable.
A way to make a list behave similarly to a vector is to replace the default allocator (which allocates from the system every time it is invoked) with one that allocates objects in larger chunks, handing out sub-chunks to the list when invoked.
This is not something the standard library provides by default.
You could potentially use list::get_allocator().allocate(). As far as I know, the default behaviour is to acquire memory as and when needed, due to the non-contiguous nature of lists (hence the lack of reserve()), but no major drawbacks to using the allocator method occur to me immediately. Provided you have a non-critical section in your program, at the start or wherever, you can at least choose to take the damage at that point.

C++ smart linear container

Let me explain my problem together with the background, so it is easier to understand why I'm asking for this specific type of thing. I'm developing an instant messenger. Most of the architecture is outlined by my teacher, but implementation details may vary. There is an "Engine" class, EventManager, which registers clients. To identify them and to easily remove them, I use a map (with client IDs) or a set of pointers. So far, so good.

But then this EventManager uses poll() in its main loop (or select(), but that's nowhere near as comfortable to use as poll(), since you have to rebuild the array each time, which is slow and not-so-nice, I guess; and I can restrict myself to a UNIX environment, if you ask). poll() needs an array of struct pollfd, and every time a client comes or goes, this array needs to be rebuilt. Either I use a dynamic array by hand and allocate memory every time (baaaaaad), or I use a vector, which would handle a new client's struct pollfd insertion pretty well at the end of the container, or a deque, which would insert and remove anywhere pretty well. Now my two questions are:
If I choose vector, will it automatically shrink and move elements in the middle of itself instead of full reallocation? and
That would anyway copy a lot if it's near the beginning, so I'd like to use deque. Does that have an array interface (like you would use with vector: &myVector[0]) or is it strictly non-contiguous?
If you remove something from the middle of a vector it will move all the following elements one position towards the beginning. It will not reallocate. You don't have to consider reallocations at all because they are amortized to give O(1) time per insertion.
deque is not much better than vector. Removing from the beginning or end is efficient, but not from the middle. If you remove from an arbitrary position, it will hopefully be twice as fast as a vector, but not faster; and since it's a more complicated structure, it'll probably be even slower. deque doesn't guarantee contiguous storage, so although indexing is allowed and done in O(1) time, you still can't reliably convert it to a pointer.
Anyway it smells like premature optimization. Use vector. Since the order of clients is not significant, you can speed up the erasure of clients by swapping the element that you want to remove with the last element in the vector and calling pop_back() after that.
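The swap-and-pop erase suggested above fits in a few lines (the function name is mine):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// O(1) erase when element order doesn't matter: overwrite the victim
// with the last element, then drop the last element.
template <typename T>
void unordered_erase(std::vector<T>& v, std::size_t i) {
    if (i + 1 != v.size())
        v[i] = std::move(v.back());  // move the last element into the hole
    v.pop_back();
}
```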