Difference between priority queue and a heap - c++

It seems that a priority queue is just a heap with normal queue operations like insert, delete, top, etc. Is this the correct way to interpret a priority queue? I know you can build priority queues in different ways but if I were to build a priority queue from a heap is it necessary to create a priority queue class and give instructions for building a heap and the queue operations or is it not really necessary to build the class?
What I mean is if I have a function to build a heap and functions to do operations like insert and delete, do I need to put all these functions in a class or can I just use the instructions by calling them in main.
I guess my question is whether having a collection of functions is equivalent to storing them in some class and using them through a class or just using the functions themselves.
What I have below is all the methods for a priority queue implementation. Is this sufficient to call it an implementation or do I need to put it in a designated priority queue class?
#ifndef MAX_PRIORITYQ_H
#define MAX_PRIORITYQ_H
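#include <iostream>
#include <deque>
#include <limits>    // std::numeric_limits, used in max_heap_insert
#include <stdexcept> // std::out_of_range, used in heap_extract_max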
#include "print.h"
#include "random.h"
int parent(int i)
{
return (i - 1) / 2;
}
int left(int i)
{
// 0-based heap: the children of i are 2*i + 1 and 2*i + 2, matching parent() above.
return 2*i + 1;
}
int right(int i)
{
return 2*i + 2;
}
void max_heapify(std::deque<int> &A, int i, int heapsize)
{
int largest;
int l = left(i);
int r = right(i);
if(l <= heapsize && A[l] > A[i])
largest = l;
else
largest = i;
if(r <= heapsize && A[r] > A[largest])
largest = r;
if(largest != i) {
exchange(A, i, largest);
max_heapify(A, largest, heapsize);
//int j = max_heapify(A, largest, heapsize);
//return j;
}
//return i;
}
void build_max_heap(std::deque<int> &A)
{
int heapsize = A.size() - 1;
for(int i = (A.size() - 1) / 2; i >= 0; i--)
max_heapify(A, i, heapsize);
}
int heap_maximum(std::deque<int> &A)
{
return A[0];
}
int heap_extract_max(std::deque<int> &A, int heapsize)
{
if(heapsize < 0)
throw std::out_of_range("heap underflow");
int max = A.front();
//std::cout << "heapsize : " << heapsize << std::endl;
// Move the last element to the root, shrink the deque, then sift down.
A[0] = A.back();
A.pop_back();
max_heapify(A, 0, static_cast<int>(A.size()) - 1);
//int i = max_heapify(A, 0, heapsize);
//A.erase(A.begin() + i);
return max;
}
void heap_increase_key(std::deque<int> &A, int i, int key)
{
if(key < A[i])
std::cerr << "New key is smaller than current key" << std::endl;
else {
A[i] = key;
while(i > 0 && A[parent(i)] < A[i]) { // i > 0: index 1 must still be compared with the root
exchange(A, i, parent(i));
i = parent(i);
}
}
}
void max_heap_insert(std::deque<int> &A, int key)
{
// Grow the deque first; writing to A[A.size()] would be out of bounds.
A.push_back(std::numeric_limits<int>::min());
heap_increase_key(A, static_cast<int>(A.size()) - 1, key);
}
#endif // MAX_PRIORITYQ_H

A priority queue is an abstract datatype. It is a shorthand way of describing a particular interface and behavior, and says nothing about the underlying implementation.
A heap is a data structure. It is a name for a particular way of storing data that makes certain operations very efficient.
It just so happens that a heap is a very good data structure for implementing a priority queue, because the operations that the heap data structure makes efficient are exactly the operations that the priority queue interface needs.

Having a class with exactly the interface you need (just insert and pop-max?) has its advantages.
You can exchange the implementation (list instead of heap, for example) later.
Someone reading the code that uses the queue doesn't need to understand the more difficult interface of the heap data structure.
"I guess my question is whether having a collection of functions is equivalent to storing them in some class and using them through a class or just using the functions themselves."
It's mostly equivalent if you just think in terms of "how does my program behave". But it's not equivalent in terms of "how easy is my program to understand by a human reader".
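As a rough illustration of the point about a narrow interface, here is a minimal sketch of a wrapper class. The name MaxPriorityQueue and its members are hypothetical (they are not from the question); the class only assumes the standard std::push_heap and std::pop_heap algorithms on a std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper: callers see only push/top/pop, never the heap layout.
class MaxPriorityQueue {
public:
    void push(int key) {
        data_.push_back(key);
        std::push_heap(data_.begin(), data_.end());  // restore the max-heap property
    }
    int top() const {
        assert(!data_.empty());                      // precondition: queue is not empty
        return data_.front();
    }
    int pop() {
        int max = top();
        std::pop_heap(data_.begin(), data_.end());   // moves the maximum to the back
        data_.pop_back();
        return max;
    }
    bool empty() const { return data_.empty(); }

private:
    std::vector<int> data_;  // implementation detail; could later become a sorted list
};
```

Code that uses such a class only calls push, top and pop, so the storage behind it could be replaced without touching the callers.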

The term priority queue refers to a general data structure that orders its elements by priority. There are multiple ways to achieve that, e.g., various ordered tree structures (a splay tree works reasonably well) as well as various heaps, such as d-heaps or Fibonacci heaps. Conceptually, a heap is a tree structure where the weight of every node is not less than the weight of any node in the subtree rooted at that node.

The C++ Standard Template Library provides the make_heap, push_heap
and pop_heap algorithms for heaps (usually implemented as binary
heaps), which operate on arbitrary random access iterators. It treats
the iterators as a reference to an array, and uses the array-to-heap
conversion. It also provides the container adaptor priority_queue,
which wraps these facilities in a container-like class. However, there
is no standard support for the decrease/increase-key operation.
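To make the quoted description concrete, here is a small sketch that drains the same data once with the free heap algorithms and once with the container adaptor. The two function names are made up for this example; only std::make_heap, std::pop_heap and std::priority_queue come from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

// Drain a vector in descending order using the free heap algorithms.
std::vector<int> drain_with_heap_algorithms(std::vector<int> v) {
    std::make_heap(v.begin(), v.end());      // arrange v as a max-heap in place
    std::vector<int> out;
    while (!v.empty()) {
        std::pop_heap(v.begin(), v.end());   // largest element moves to v.back()
        out.push_back(v.back());
        v.pop_back();
    }
    return out;
}

// The same result through the container adaptor.
std::vector<int> drain_with_priority_queue(const std::vector<int>& v) {
    std::priority_queue<int> pq(v.begin(), v.end());
    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}
```

Both functions should return the elements from largest to smallest; the adaptor simply hides the iterator bookkeeping.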
priority_queue refers to an abstract data type defined entirely by the operations that may be performed on it. In the C++ STL, priority_queue is thus one of the sequence adaptors, that is, adaptors of basic containers (vector, list and deque are basic because they cannot be built from each other without loss of efficiency), defined in the <queue> header (<bits/stl_queue.h> in my case, actually). As Bjarne Stroustrup says of adaptors:
container adapter provides a restricted interface to a container. In
particular, adapters do not provide iterators; they are intended to be
used only through their specialized interfaces.
On my implementation priority_queue is described as
/**
* @brief A standard container automatically sorting its contents.
*
* @ingroup sequences
*
* This is not a true container, but an @e adaptor. It holds
* another container, and provides a wrapper interface to that
* container. The wrapper is what enforces priority-based sorting
* and %queue behavior. Very few of the standard container/sequence
* interface requirements are met (e.g., iterators).
*
* The second template parameter defines the type of the underlying
* sequence/container. It defaults to std::vector, but it can be
* any type that supports @c front(), @c push_back, @c pop_back,
* and random-access iterators, such as std::deque or an
* appropriate user-defined type.
*
* The third template parameter supplies the means of making
* priority comparisons. It defaults to @c less<value_type> but
* can be anything defining a strict weak ordering.
*
* Members not found in "normal" containers are @c container_type,
* which is a typedef for the second Sequence parameter, and @c
* push, @c pop, and @c top, which are standard %queue operations.
* @note No equality/comparison operators are provided for
* %priority_queue.
* @note Sorting of the elements takes place as they are added to,
* and removed from, the %priority_queue using the
* %priority_queue's member functions. If you access the elements
* by other means, and change their data such that the sorting
* order would be different, the %priority_queue will not re-sort
* the elements for you. (How could it know to do so?)
*/
template:
template<typename _Tp, typename _Sequence = vector<_Tp>,
typename _Compare = less<typename _Sequence::value_type> >
class priority_queue
{
In contrast, a heap describes how its elements are fetched and stored in memory. It is a (tree-based) data structure (others are, e.g., array, hash table, struct, union, set...) that in addition satisfies the heap property: all nodes are either [greater than or equal to] or [less than or equal to] each of their children, according to a comparison predicate defined for the heap.
So in my heap header I find no heap container, but rather a set of algorithms
/**
* @defgroup heap_algorithms Heap Algorithms
* @ingroup sorting_algorithms
*/
like:
__is_heap_until
__is_heap
__push_heap
__adjust_heap
__pop_heap
make_heap
sort_heap
all of them (excluding __is_heap, commented as "This function is an extension, not part of the C++ standard") described as
* @ingroup heap_algorithms
*
* This operation... (what it does)

Not really. The "priority" in the name stems from a priority value for the entries in the queue, defining their ... of course: priority. There are many ways to implement such a PQ, however.

A priority queue is an abstract data structure that can be implemented in many ways (unsorted array, sorted array, heap). It is like an interface: it gives you the signature that a heap has to provide:
class PriorityQueue {
top() → element
peek() → element
insert(element, priority)
remove(element)
update(element, newPriority)
size() → int
}
A heap is a concrete implementation of the priority queue using an array (it can conceptually be represented as a particular kind of binary tree) to hold elements and specific algorithms to enforce invariants. Invariants are internal properties that always hold true throughout the life of the data structure.

Related

How to fill a stl::map by iterating to maximum value?

I have an std::map, and I use the following method for filling up to the maximum value of the supplied data type. In this case, if K is int then the maximum value is 2,147,483,647. I want my map to have 2,147,483,647 keys with the same value.
The below loop is very inefficient. Is there any method to reduce the time consumption?
for (auto i = keyBegin; i!=numeric_limits<K>::max(); i++) {
m_map[i] = val;
}
The problem with the code above is that you're inserting 2 billion numbers, all at the end of the map. But operator[] has no idea that you'll be inserting a new item there!
std::map::insert(hint_before, value) is what you need. You've got a perfect hint - all values will be inserted directly before m_map.end()
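A minimal sketch of the hinted insertion described above. The helper name fill_ascending is hypothetical, and it assumes every new key is greater than all keys already in the map, so that end() is the correct hint.

```cpp
#include <cassert>
#include <map>

// Fill m with the keys [first, last), all mapped to val.
// Assumption: every new key is greater than all keys already in m,
// which makes m.end() the right hint for each insertion.
template <class K, class V>
void fill_ascending(std::map<K, V>& m, K first, K last, const V& val) {
    for (K k = first; k != last; ++k) {
        // With a correct hint the insertion is amortized constant time
        // instead of a logarithmic search from the root.
        m.insert(m.end(), typename std::map<K, V>::value_type(k, val));
    }
}
```

For example, fill_ascending(m, 0, 1000, 7) on an empty std::map<int, int> inserts the keys 0 through 999.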
To supplement the existing answers, this is really not a good use of std::map.
Maps are designed for quick lookup in a collection of keys & values, where the keys are "sparse". They're generally implemented as trees, requiring lots of dynamic allocation, tree rebalancing, and the sacrifice of cache locality. This is worth it for the general map use case.
But your keys are far from sparse! You literally have a value for every possible number in the key's type's range. This is what arrays are for.
Use an array and you will benefit from cache, you will benefit from constant-time lookups, and you will not need any dynamic allocation inside the container. You will of course need to dynamically allocate the container itself because it is huge, so, you're looking for std::vector.
And that's only if you really need to precalculate all these values. If you don't necessarily need them all multiple times, consider generating them on-demand instead. Because, regardless how much RAM a contemporary PC can provide you, this feels a bit like an abuse of the technology.
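The array-based alternative could look roughly like the sketch below. DenseTable and its members are hypothetical names, and the value type is fixed to int only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense replacement for the map: key k lives at index k - key_begin.
class DenseTable {
public:
    DenseTable(int key_begin, std::size_t count, int val)
        : key_begin_(key_begin), values_(count, val) {}  // one allocation, every slot = val

    int& operator[](int key) {
        assert(key >= key_begin_);  // keys below the first key are not stored
        // Subtract in a wider type so a large key range cannot overflow int.
        long long offset = static_cast<long long>(key) - key_begin_;
        return values_[static_cast<std::size_t>(offset)];
    }

    std::size_t size() const { return values_.size(); }

private:
    int key_begin_;
    std::vector<int> values_;
};
```

Lookup is a single index computation, and the constructor fills all slots without any per-element allocation.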
To supplement MSalters's answer, which suggests using the map::insert(hint, {key, value}) construct, I suggest using a non-default allocator. A specialized allocator can further speed up the insertion twofold. Consider the following trivial allocator:
#include <array>
#include <cstddef>
#include <list>
#include <new>
#include <type_traits>
template <class T>
class chunk_allocator
{
private:
struct node
{
union
{
node *next;
char element[sizeof(T)];
};
};
public:
using value_type = T;
using size_type = std::size_t;
using difference_type = std::ptrdiff_t;
using is_always_equal = std::false_type;
using propagate_on_container_move_assignment = std::true_type;
T*allocate(std::size_t n)
{
if (n > 1)
return reinterpret_cast<T*>(::operator new(sizeof(T) * n));
if (!m_free_head) populate();
node * head = m_free_head;
m_free_head = head->next;
using node_ptr = node*;
head->next.~node_ptr();
return reinterpret_cast<T*>(&head->element);
}
void deallocate(T* p, std::size_t n) {
if (!p)
return;
if (n > 1) {
::operator delete((void*)p);
return;
}
node * new_head = new ((void*)p) node;
new_head->next = m_free_head;
m_free_head = new_head; // the freed node becomes the new head of the free list
}
private:
static constexpr unsigned NODES_IN_CHUNK = 1000;
void populate()
{
if (m_free_head) return;
m_chunks.emplace_back();
for (node & entry : m_chunks.back()) {
entry.next = m_free_head;
m_free_head = &entry;
}
}
std::list<std::array<node, NODES_IN_CHUNK>> m_chunks;
node * m_free_head = nullptr;
};
template< class T1, class T2 >
bool operator==( const chunk_allocator<T1>& a, const chunk_allocator<T2>& b )
{
return (void*)&a == (void*)&b;
}
template< class T1, class T2 >
bool operator!=( const chunk_allocator<T1>& a, const chunk_allocator<T2>& b )
{
return !(a == b);
}
And its use:
std::map<int, int, std::less<int>,
chunk_allocator<std::pair<const int, int >>> m_map;
Working with 100,000,000 elements takes (on Windows with Cygwin):
std::allocator: Insertion: 7.987, destruction: 7.238
chunk_allocator: Insertion: 2.745, destruction: 1.357
On Linux the differences are not that big, but still 2x improvements are possible.
Bonus points - the chunk_allocator takes less memory, since it does not use operator new for individual map nodes. Each call to operator new has to maintain memory management bookkeeping, which chunk_allocator does not have to.

Does constant time access imply contiguous memory at some point?

As the title says, I was wondering if constant-time/O(1) access to a container does imply that memory is necessarily contiguous at some point.
When I say contiguous I mean if pointers can be compared with relational operators at some point without invoking undefined behavior.
Take e.g. std::deque: it does not guarantee that all its elements are stored contiguously (i.e., in the same memory array), but is it correct to say that, since std::deque's iterators satisfy the requirements of a Random Access Iterator, memory will be contiguous at some point independently of the implementation?
I am new to C++ so in case what I said above does not make sense: suppose I was going to implement random access iterators in C. Can
typedef struct random_access_iterator {
void *pointer; /*pointer to element */
void *block; /* pointer to the memory array
* so in case of comparing iterators
* where pointer does not point to addresses in the same array
* it can be used to comparisons instead*/
/* implement other stuff */
} RandomIter;
be used to generically express a similar mechanism to that of C++ random access iterators (considering that even if pointer do not, block will always
point to addresses in the same memory array in iterators of the same container)?
EDIT: just to clarify, constant-time here is used to denote constant-time random access
No. Consider a fixed-size linked list like the following:
struct DumbCounterexample
{
struct node
{
std::vector<int> items;
std::unique_ptr<node> next;
};
std::unique_ptr<node> first;
size_t size;
static constexpr size_t NODE_COUNT = 10;
DumbCounterexample()
: first{new node},
size{0}
{
node* to_fill = first.get();
for (size_t i = 0; i < NODE_COUNT - 1; ++i) {
to_fill->next.reset(new node);
to_fill = to_fill->next.get();
}
}
int& operator[](size_t i)
{
size_t node_num = i % NODE_COUNT;
size_t item_num = i / NODE_COUNT;
node* target_node = first.get();
for (size_t n = 0; n < node_num; ++n) {
target_node = target_node->next.get();
}
return target_node->items[item_num];
}
void push_back(int i)
{
size_t node_num = size % NODE_COUNT;
node* target_node = first.get();
for (size_t n = 0; n < node_num; ++n) {
target_node = target_node->next.get();
}
target_node->items.push_back(i);
++size;
}
};
Lookup time is constant. It does not depend on the number of elements stored in the container (only the constant NODE_COUNT).
Now, this is a strange data structure, and I can't think of any legitimate reason to use it, but it does serve as a counterexample to the claim that there need be a single contiguous block of memory that would be shared by all iterators to elements in the container (i.e. the block pointer in your example random_access_iterator struct).

Indirect priority queue implementation

I am reading about indirect priority queues in Robert Sedgewick's Algorithms in C++.
The implementation below maintains pq as an array of indices into some client array. For example, if the client defines operator< for arguments of type Index, then, when fixUp compares pq[j] with pq[k], it is comparing data.grade[pq[j]] with data.grade[pq[k]], as desired. We assume that Index is a wrapper class whose object can index arrays, so that we can keep the heap position corresponding to index value k in qp[k], which allows us to implement "change priority" and "remove". We maintain the invariant pq[qp[k]]=qp[pq[k]]=k for all k in the heap.
template <class Index>
class PQ
{
private:
int N; Index* pq; int* qp;
void exch(Index i, Index j)
{
int t;
t = qp[i]; qp[i] = qp[j]; qp[j] = t;
pq[qp[i]] = i; pq[qp[j]] = j;
}
void fixUp(Index a[], int k);
void fixDown(Index a[], int k, int N);
public:
PQ(int maxN)
{
pq = new Index[maxN+1];
qp = new int[maxN+1]; N = 0;
}
int empty() const { return N == 0; }
void insert(Index v) { pq[++N] = v; qp[v] = N; fixUp(pq, N); }
Index getmax()
{
exch(pq[1], pq[N]);
fixDown(pq, 1, N-1);
return pq[N--];
}
void change(Index k)
{
fixUp(pq, qp[k]);
fixDown(pq, qp[k], N);
}
};
The main disadvantage of using indirection in this way is the extra space used. The size of the index arrays has to be the size of the data array, when the maximum size of the priority queue could be much less. Another approach to building a priority queue on top of existing data in an array is to have the client program make records consisting of a key with its array index as associated information, or to use an index key with a client-supplied overloaded operator<. Then, if the implementation uses a linked-allocation representation, the space used by the priority queue would be proportional to the maximum number of elements on the queue at any one time. Such approaches would be preferred over Program 9.12 if space must be conserved and if the priority queue involves only a small fraction of the data array.
What does the author mean by the following?
We assume that Index is a wrapper class whose object can index arrays
As an improvement, the author suggests:
Another approach to building a priority queue on top of existing data in an array is to have the client program make records consisting of a key with its array index as associated information, or to use an index key with a client-supplied overloaded operator<. Then, if the implementation uses a linked-allocation representation
What does the author mean by the two approaches he suggests here?
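One possible reading of the first suggestion (client-made records consisting of a key plus its array index) is sketched below. This is an interpretation, not Sedgewick's code: Record, by_descending_grade, grades and subset are hypothetical names, and std::priority_queue stands in for the linked-allocation representation mentioned in the quote.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// A record pairs a key with the position of the item in the client's array.
using Record = std::pair<double, std::size_t>;  // (key, index into the client array)

// Return the indices listed in `subset`, ordered by descending grade.
// The queue holds only subset.size() records, not one slot per array element.
std::vector<std::size_t> by_descending_grade(const std::vector<double>& grades,
                                             const std::vector<std::size_t>& subset) {
    std::priority_queue<Record> pq;  // std::pair compares the key first, then the index
    for (std::size_t i : subset) {
        assert(i < grades.size());   // every index must refer to an existing item
        pq.push(Record(grades[i], i));
    }
    std::vector<std::size_t> out;
    while (!pq.empty()) {
        out.push_back(pq.top().second);  // the index tells the client which item this is
        pq.pop();
    }
    return out;
}
```

The space used here grows with the number of queued records rather than with the size of the data array, which seems to be the trade-off the quoted passage describes.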

Set insert doing a weird number of comparisons

I am unable to explain the number of comparisons that std::set does while inserting a new element. Here is an example:
For this code
#include <iostream>
#include <set>
using namespace std;
struct A {
int i = 0;
bool operator()(int a, int b)
{
++i;
return a < b;
}
};
int main()
{
A a;
set<int, A> s1(a);
s1.insert(1);
cout << s1.key_comp().i << endl;
s1.insert(2);
cout << s1.key_comp().i << endl;
}
The output is
0
3
Why does inserting a second element require 3 comparisons? o_O
This is a side effect of using a red-black tree to implement std::set, which requires more comparisons initially compared to a standard binary tree.
I don't know the particulars, as they will depend on your std::set implementation; however, determining the equality of two items requires two comparisons, as it is based on the fact that not (x < y) and not (y < x) implies x == y.
Depending on how the tree is optimized, you might thus be paying a first comparison to determine whether it should go left or right, and then two comparisons to check whether it's equal or not.
The Standard has no requirement except that the number of comparisons be O(log N) where N is the number of items already in the set. Constant factors are a quality of implementation issue.
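The two-comparison equivalence test mentioned above can be sketched in isolation. CountingLess mirrors struct A from the question, and equivalent is a hypothetical helper; how many such checks a given std::set performs during insert still depends on the implementation.

```cpp
#include <cassert>

// Comparator that counts how many times it is called (same idea as struct A).
struct CountingLess {
    int calls = 0;
    bool operator()(int a, int b) {
        ++calls;
        return a < b;
    }
};

// Equivalence as ordered containers define it: neither element precedes the other.
// The second comparison is skipped when the first one already returns true.
bool equivalent(int x, int y, CountingLess& less) {
    return !less(x, y) && !less(y, x);
}
```

Confirming that two keys are equivalent costs two calls to the comparator, which would be consistent with one comparison for the descent plus two for the duplicate check.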

Understanding boost::disjoint_sets

I need to use boost::disjoint_sets, but the documentation is unclear to me. Can someone please explain what each template parameter means, and perhaps give a small example code for creating a disjoint_sets?
As per the request, I am using disjoint_sets to implement Tarjan's off-line least common ancestors algorithm, i.e - the value type should be vertex_descriptor.
What I can understand from the documentation :
A disjoint-set structure needs to associate a rank and a parent (in the forest tree) with each element. Since you might want to work with any kind of data, you may, for example, not always want to use a map for the parent: with integers an array is sufficient. You also need a rank for each element (the rank is needed for union-find).
You'll need two "properties" :
one to associate an integer to each element (first template argument), the rank
one to associate an element to an other one (second template argument), the fathers
On an example :
std::vector<int> rank (100);
std::vector<int> parent (100);
boost::disjoint_sets<int*,int*> ds(&rank[0], &parent[0]);
Arrays are used here (&rank[0], &parent[0]), so the type in the template is int*.
For a more complex example (using maps) you can look at Ugo's answer.
You are just giving the algorithm the two structures (rank/parent) it needs to store its data.
disjoint_sets<Rank, Parent, FindCompress>
Rank PropertyMap used to store the size of a set (element -> std::size_t). See union by rank
Parent PropertyMap used to store the parent of an element (element -> element). See Path compression
FindCompress Optional argument defining the find method. Defaults to find_with_full_path_compression. See here (the default should be what you need).
Example:
template <typename Rank, typename Parent>
void algo(Rank& r, Parent& p, std::vector<Element>& elements)
{
boost::disjoint_sets<Rank,Parent> dsets(r, p);
for (std::vector<Element>::iterator e = elements.begin();
e != elements.end(); e++)
dsets.make_set(*e);
...
}
int main()
{
std::vector<Element> elements;
elements.push_back(Element(...));
...
typedef std::map<Element,std::size_t> rank_t; // => order on Element
typedef std::map<Element,Element> parent_t;
rank_t rank_map;
parent_t parent_map;
boost::associative_property_map<rank_t> rank_pmap(rank_map);
boost::associative_property_map<parent_t> parent_pmap(parent_map);
algo(rank_pmap, parent_pmap, elements);
}
Note that "The Boost Property Map Library contains a few adaptors that convert commonly used data-structures that implement a mapping operation, such as builtin arrays (pointers), iterators, and std::map, to have the property map interface"
This list of these adaptors (like boost::associative_property_map) can be found here.
For those of you who can't afford the overhead of std::map (or can't use it because you don't have default constructor in your class), but whose data is not as simple as int, I wrote a guide to a solution using std::vector, which is kind of optimal when you know the total number of elements beforehand.
The guide includes a fully-working sample code that you can download and test on your own.
The solution mentioned there assumes you have control of the class' code so that in particular you can add some attributes. If this is still not possible, you can always add a wrapper around it:
class Wrapper {
UntouchableClass const& mInstance;
size_t dsID;
size_t dsRank;
size_t dsParent;
};
Moreover, if you know the number of elements to be small, there's no need for size_t, in which case you can add a template parameter for the UnsignedInt type and decide at runtime to instantiate it with uint8_t, uint16_t, uint32_t or uint64_t, which you can obtain with <cstdint> in C++11 or with boost::cstdint otherwise.
template <typename UnsignedInt>
class Wrapper {
UntouchableClass const& mInstance;
UnsignedInt dsID;
UnsignedInt dsRank;
UnsignedInt dsParent;
};
Here's the link again in case you missed it: http://janoma.cl/post/using-disjoint-sets-with-a-vector/
I wrote a simple implementation a while ago. Have a look.
struct DisjointSet {
vector<int> parent;
vector<int> size;
DisjointSet(int maxSize) {
parent.resize(maxSize);
size.resize(maxSize);
for (int i = 0; i < maxSize; i++) {
parent[i] = i;
size[i] = 1;
}
}
int find_set(int v) {
if (v == parent[v])
return v;
return parent[v] = find_set(parent[v]);
}
void union_set(int a, int b) {
a = find_set(a);
b = find_set(b);
if (a != b) {
if (size[a] < size[b])
swap(a, b);
parent[b] = a;
size[a] += size[b];
}
}
};
And the usage goes like this. It's simple. Isn't it?
void solve() {
int n;
cin >> n;
DisjointSet S(n); // Initializing with maximum Size
S.union_set(1, 2);
S.union_set(3, 7);
int parent = S.find_set(1); // root of 1
}
Loic's answer looks good to me, but I needed to initialize the parent so that each element had itself as parent, so I used the iota function to generate an increasing sequence starting from 0.
I used Boost, included bits/stdc++.h, and wrote using namespace std for simplicity.
#include <bits/stdc++.h>
#include <boost/pending/disjoint_sets.hpp>
#include <boost/unordered/unordered_set.hpp>
using namespace std;
int main() {
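array<int, 100> rank{}; // value-initialize so every rank starts at 0
array<int, 100> parent;
iota(parent.begin(), parent.end(), 0);
// data() returns int*, which matches the template arguments on every implementation
boost::disjoint_sets<int*, int*> ds(rank.data(), parent.data());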
ds.union_set(1, 2);
ds.union_set(1, 3);
ds.union_set(1, 4);
cout << ds.find_set(1) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(2) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(3) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(4) << endl; // 1 or 2 or 3 or 4
cout << ds.find_set(5) << endl; // 5
cout << ds.find_set(6) << endl; // 6
}
I changed std::vector to std::array because pushing elements to a vector will make it realloc its data, which makes the references the disjoint sets object contains become invalid.
As far as I know, it's not guaranteed that the parent will be a specific number, so that's why I wrote 1 or 2 or 3 or 4 (it can be any of these). Maybe the documentation explains with more detail which number will be chosen as leader of the set (I haven't studied it).
In my case, the output is:
2
2
2
2
5
6
Seems simple, it can probably be improved to make it more robust (somehow).
Note: std::iota fills the range [first, last) with sequentially increasing values, starting with value and repetitively evaluating ++value.
More: https://en.cppreference.com/w/cpp/algorithm/iota