Separate chaining in hashing - C++

I am reading about hashing in Robert Sedgewick's book Algorithms in C++.
We might be using a header node to streamline the code for insertion
into an ordered list, but we might not want to use M header nodes for
individual lists in separate chaining. Indeed, we could even eliminate
the M links to the lists by having the first nodes in the lists
comprise the table.
class ST
{
    struct node
    {
        Item item;
        node* next;
        node(Item x, node* t)
        { item = x; next = t; }
    };
    typedef node *link;
private:
    link* heads;
    int N, M;
    Item searchR(link t, Key v)
    {
        if (t == 0) return nullItem;
        if (t->item.key() == v) return t->item;
        return searchR(t->next, v);
    }
public:
    ST(int maxN)
    {
        N = 0; M = maxN/5;
        heads = new link[M];
        for (int i = 0; i < M; i++) heads[i] = 0;
    }
    Item search(Key v)
    { return searchR(heads[hash(v, M)], v); }
    void insert(Item item)
    {
        int i = hash(item.key(), M);
        heads[i] = new node(item, heads[i]); N++;
    }
};
My two questions on the above text:
1. What does the author mean by "We could even eliminate the M links to the lists by having the first nodes in the lists comprise the table"? How can we modify the above code for this?
2. What does "we might not want to use M header nodes for individual lists in separate chaining" mean?

"We could even eliminate the M links to the lists by having the first nodes in the lists comprise the table."
Consider Node* x[n] vs Node x[n]. The former needs an extra pointer, an on-insertion heap allocation for the head Node of every non-empty bucket, and an extra indirection for every hash table operation. The latter eliminates the n pointers, but requires that any unused element can be put in some discernible not-in-use state (tracking of which may or may not require extra memory), and if sizeof(Node) is greater than sizeof(Node*), it may be more wasteful of memory anyway.
The difference in memory use can also affect cache efficiency: if the table has a high element-to-bucket ratio, then a Node[] packs the Node data into fewer contiguous memory pages, and iterating (in unsorted order) is very cache friendly, whereas a Node*[] will jump to separate memory allocations that might be scattered all over the address space (or, on the other hand, might actually be quite close together in some usefully correlated way, e.g. if both access patterns and dynamic memory allocation addresses correlate with the chronological order of object creation).
"How can we modify the above code for this?"
First, a note on your existing code: heads[i] = new node(item, heads[i]); prepends a new node unconditionally, without first checking whether the key is already present, so inserting the same key twice leaves duplicate nodes in the chain (the old chain itself is preserved, since it is passed as the new node's next pointer).
The design change discussed needs:
link* heads;
...changed to...
node* head;
You'd initialise it like this:
head = new node[M];
This needs an extra node constructor (if Item has an equivalent default constructor, you can leave out its initialisation below):
node() : item(nullItem), next(nullptr) { }
Then there are some knock-on changes to the rest of your code that are easy to work through. Basically, you're getting rid of a layer of pointers.
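For illustration, the whole class might then look something like this. This is a minimal sketch, not the book's code; it assumes nullItem's key can be compared, so that a default-constructed node doubles as the "not-in-use" marker:
class ST
{
    struct node
    {
        Item item;
        node* next;
        node(Item x, node* t) { item = x; next = t; }
        node() : item(nullItem), next(0) { }     // the "empty" state for unused buckets
    };
    node* head;   // the first nodes of the lists themselves form the table
    int N, M;
    Item searchR(node* t, Key v)
    {
        if (t == 0) return nullItem;
        if (t->item.key() == v) return t->item;
        return searchR(t->next, v);
    }
public:
    ST(int maxN)
    {
        N = 0; M = maxN/5;
        head = new node[M];   // M nodes, default-constructed to the empty state
    }
    Item search(Key v)
    { return searchR(&head[hash(v, M)], v); }
    void insert(Item item)
    {
        int i = hash(item.key(), M);
        if (head[i].item.key() == nullItem.key())        // assumed "unused" test
            head[i].item = item;                         // the in-table node holds the item
        else
            head[i].next = new node(item, head[i].next); // chain behind the fixed first node
        N++;
    }
};
One wrinkle worth noticing: since the first node now lives inside the array, new items in a non-empty bucket have to be chained behind it rather than prepended in front of it.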
"we might not want to use M header nodes for individual lists in separate chaining." What does this statement mean.
I didn't write it so can't say authoritatively, but it appears to be saying that when designing the list code, a decision might have been made to have an initial Node even in an empty list, as this simplifies code for several list operations. While the extra data-less Node might seem a reasonable price when contemplating "usual" uses of a list, hash tables are unusual in that you want most of the lists chained of the buckets to have 0 or 1 element, and exponentially fewer should be longer and longer. So, such a list implementation is poorly suited to use in a hash table.
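For illustration only (this is my reading of the remark, not the book's code): with per-list header nodes, the table would have to be set up as
node* heads[M];
for (int i = 0; i < M; i++)
    heads[i] = new node(nullItem, 0);   // M data-less header nodes, mostly wasted
i.e. M extra allocations that carry no data, whereas the code above stores plain null pointers, and the variant sketched earlier stores the first real nodes themselves.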

This is a strange question about space complexity. Can someone provide any insights?

I was solving this question when this approach clicked:
Given a singly linked list and an integer x, your task is to complete the function deleteAllOccurances(), which deletes all occurrences of a key x present in the linked list. The function takes two arguments: the head of the linked list and an integer x. The function should return the head of the modified linked list.
I am not sure what the space complexity of my code is.
I think that since I am only using 1 extra node of space, and am simultaneously creating new nodes and deleting old ones, it should be O(1).
Node* deleteAllOccurances(Node *head, int x)
{
    Node *new_head = new Node(-1);
    Node *tail = new_head;
    Node *temp = head;
    Node *q;
    while(temp != NULL) {
        if(temp->data != x) {
            tail->next = new Node(temp->data);
            tail = tail->next;
        }
        q = temp;
        temp = temp->next;  // advance before freeing the old node
        delete q;
    }
    tail->next = NULL;
    return new_head->next;
}
Well, kind of.
It depends on whether you are considering total allocations as a net change (in which case you're right).
But if you are thinking about the amount of times you hit the heap for new allocations, then it's using more space and a ton of computation. (A given C++ compiler and runtime is not obliged to guarantee immediately reusing space freed in the heap, just that it's available for reuse.)
As a C++ programmer for decades, what you're doing is mildly horrifying because you're doing a lot of new allocation. That results in thrashing the heap allocation structures.
Also, the way you're doing this is pushing stuff which doesn't match to the end of the list so you are shuffling the contents down.
Hint - you should not need to create any new Nodes.
Yes: since how much space you have allocated at any single time doesn't depend on the arguments (e.g. the length of the list or how many occurrences of x are in the list), the space complexity of the function is O(1).
The practical point of space complexity is to see how much memory your algorithm will require. You never require more than 1 node of memory (plus the local variables), and O(1) reflects that.
Measuring complexity in part depends on what you consider to be your variables. In terms of the number of nodes in the list, your algorithm is O(1) in space usage. However, this might not be the best perspective in this case.
Another variable in this situation is the size of a node. Often this aspect is ignored by complexity analysis, but I think it has value in this case. While your algorithm's space requirement does not depend on the number of nodes, it does depend on the size of a node. The more data in the node, the more space you need. Let s be the size of a single node; it would be fair to say that your algorithm's size requirement is O(s).
The size requirement of the more common algorithm for this task is O(1) even when accounting for both the number of nodes and the size of each node. (It has no need to create any nodes, no need to copy data.) I would not recommend your algorithm over that one.
To avoid being all negative, I would view your approach as two independent changes to the traditional one. One change is the introduction of the dummy node new_head. This change is useful (and in fact is in use), even though your implementation leaks memory. It is only marginally less efficient than not using a dummy head, and it simplifies the logic for removing nodes from the front of the list. This is good as long as your node size is not overly large.
The other change is the switch to copying nodes instead of moving them. This is the cringe-worthy change as it gratuitously adds work to the programmer, the compiler, and the execution. Asymptotic analysis (big-O) might not pick up on this addition, but it is there with no beneficial gains. You've trashed a key benefit of linked lists and gotten nothing in return.
Let's look at dropping the second change. You would need to add one line, specifically initializing new_head->next to head, but this is balanced out by removing the need to set tail->next to nullptr at the end. Another addition is an else clause so that the lines currently run every iteration are not necessarily run every iteration. Beyond that are code removal and some name changes: drop the temp pointer (use tail->next instead) and drop the creation of new nodes in the loop. Taken together, these changes strictly reduce the work being done (and the memory needs) compared to your code.
To address the memory leak, I've used a local dummy node instead of dynamically allocating it. That removes the last use of new, which in turn removes most of the objections raised in the question's comments.
Node* deleteAllOccurances(Node *head, int x)
{
    Node new_head{-1};        //<-- Avoid dynamic allocation
    new_head.next = head;     //<-- added line
    Node *tail = &new_head;
    while(tail->next != nullptr) {
        if(tail->next->data != x) {
            tail = tail->next;
        }
        else {                //<-- make the rest of the loop conditional
            Node *q = tail->next;
            tail->next = tail->next->next;
            delete q;
        }
    }
    return new_head.next;
}
This version removes the "cringe factor" as there is a benefit to the one node being created, and new is not being used. This version is clean enough to subject to complexity analysis without everyone asking "why???".

Merge two lists into one, O(1) time complexity

I would like to ask how it is possible to merge two unsorted lists into one unsorted list in constant time in C, since we need a while loop to get all the elements.
ex:
List1: 2 5 1 4 3
List2: 5 9 4 2 5 7 8
List3: elements of the two lists,don't care about order
Don't judge me, I'm a beginner.
This depends on the data structure in memory and whether you can modify the existing lists.
If you can modify the existing lists, then you can ask List1 for its last element (which is O(1) if the list header has a pointer to the end of the list), and then it's simply a matter of List1->last->next = List2->head. Afterwards, iterating over List1 will iterate over all elements.
If you must not change List1, then you have to copy the list; this is tricky to do in O(1) but still possible if you keep all elements in a single memory area (i.e. you don't use pointers to nodes; instead you keep all nodes in an array). In this case, you allocate memory for the nodes of both lists and then populate the result with two calls to memcpy(). Granted, memcpy() isn't really O(1), but with current CPUs (which can copy gigabytes per second), you usually won't notice the difference.
tl;dr: go and read about linked list data structures. All this stuff is perfectly common.
The minimal requirement for a genuinely O(1) append is that you have mutable linked lists with constant-time access to the tail.
So, the simplest possible linked list is:
struct ListNode {
    struct ListNode *next;
    int data; /* or void*, or whatever */
};
typedef struct ListNode *SinglyLinkedList;
i.e., you just hold a pointer to the first element of your list. In this case, getting to the tail takes linear time (O(n)), so you can't do what you want. However, if instead we use
struct ListHeadTail {
    struct ListNode *head;
    struct ListNode *tail;
    /* could keep length here as well, if you want it */
};
then inserting to the list is slightly harder, but you can easily do a constant-time append:
struct ListHeadTail append(struct ListHeadTail *first,
                           struct ListHeadTail *second) {
    struct ListHeadTail result;
    /* special cases first, where either first or second is empty */
    if (first->head == NULL) {
        result = *second;
        second->head = second->tail = NULL;
    } else if (second->head == NULL) {
        result = *first;
        first->head = first->tail = NULL;
    } else {
        result.head = first->head;
        result.tail = second->tail;
        first->tail->next = second->head;
        first->head = first->tail = NULL;
        second->head = second->tail = NULL;
    }
    return result;
}
Other common structures are doubly-linked lists - again, often with a sentinel node rather than having head->prev == tail->next == NULL.
If what you desire is for something to behave as a concatenated list, you can indeed do so by creating a ConcatenatedList class that is constructed from two existing List classes. If you are working with pure C as opposed to C++, the notion of a class may need to be fudged a bit, but structurally, the idea is the same.
The ConcatenatedList class can have two attributes: a header attribute, which is nothing more than a reference to the first List, and a tail attribute, which is nothing more than a reference to the second List.
You then mimic the methods of the basic List class with ConcatenatedList. To access the kth element of the ConcatenatedList, you try something like this:
if (k < header->size()) {
    return header->getElement(k);
}
return tail->getElement(k - header->size());
The rest of the coding should be straightforward from there.
Using this approach, the concatenation process will be O(1).
It is worth noting that the answer presented by (not) Useless is also valid, but it assumes a LinkedList structure for your lists, whereas this approach does not have that requirement. It is possible, however, that after repeated concatenations, this approach will begin to look like a LinkedList implementation in terms of performance.
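For what it's worth, here is a minimal C++ sketch of that idea; the names and the vector-backed List stand-in are mine, purely for illustration:
#include <vector>

struct List {                    // stand-in for whatever the existing list class is
    std::vector<int> items;
    int size() const { return (int)items.size(); }
    int getElement(int k) const { return items[k]; }
};

class ConcatenatedList {         // construction is O(1): it just stores two references
    List *header;                // reference to the first list
    List *tail;                  // reference to the second list
public:
    ConcatenatedList(List *first, List *second) : header(first), tail(second) {}
    int size() const { return header->size() + tail->size(); }
    int getElement(int k) const {
        if (k < header->size())
            return header->getElement(k);
        return tail->getElement(k - header->size());
    }
};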

How can I use binary heap in the Dijkstra algorithm?

I am writing code for Dijkstra's algorithm. For the part where we are supposed to find the node with minimum distance from the node currently being processed, I am using an array and traversing it fully to figure out the node.
This part can be replaced by a binary heap, which lets us find the minimum node in O(1) time, but we also update the distances of nodes in later iterations. How will I incorporate the heap with that?
In the case of an array, all I have to do is go to the (i-1)th index and update the value of that node, but the same thing can't be done in a binary heap: I would have to do a full search to figure out the position of the node and then update it.
What is a workaround for this problem?
This is just some information I found while doing this in a class, that I shared with my classmates. I thought I'd make it easier for folks to find it, and I had left this post up so that I could answer it when I found a solution.
Note: I'm assuming for this example that your graph's vertices have an ID to keep track of which is which. This could be a name, a number, whatever, just make sure you change the type in the struct below.
If you have no such means of distinction, then you can use pointers to the vertices and compare their pointed-to addresses.
The problem you are faced with here is the fact that, in Dijkstra's algorithm, we are asked to store the graph's vertices and their keys in this priority queue, then update the keys of the ones left in the queue.
But... Heap data-structures have no way of getting at any particular node that is not the minimum or the last node!
The best we'd be able to do is traverse the heap in O(n) time to find it, then update its key and bubble it up, at O(Logn). That makes updating a vertex O(n) for every single edge, making our implementation of Dijkstra O(mn), way worse than the optimal O(mLogn).
Bleh! There has to be a better way!
So, what we need to implement isn't exactly a standard min-heap-based priority queue. We need one more operation than the standard 4 pq operations:
IsEmpty
Add
PopMin
PeekMin
and DecreaseKey
In order to DecreaseKey, we need to:
find a particular vertex inside the Heap
lower its key-value
"heap-up" or "bubble-up" the vertex
Essentially, since you were (I'm assuming it has been implemented sometime in the past 4 months) probably going to use an "array-based" heap implementation,
this means that we need the heap to keep track of each vertex and its index in the array in order for this operation to be possible.
Devising a struct like: (C++)
struct VertLocInHeap
{
    int vertex_id;
    int index_in_heap;
};
would allow you to keep track of it, but storing those in an array would still give you O(n) time for finding the vertex in the heap. No complexity improvement, and it's more complicated than before. >.<
My suggestion (if optimization is the goal here):
Store this info in a Binary Search Tree whose key value is the `vertex_id`
do a binary-search to find the vertex's location in the Heap in O(Logn)
use the index to access the vertex and update its key in O(1)
bubble-up the vertex in O(Logn)
I actually used a std::map declared as:
std::map<int, int> m_locations;
in the heap instead of using the struct. The first template parameter (the key) is the vertex_id, and the second (the value) is the index in the heap's array.
Since std::map guarantees O(Logn) searches, this works nicely out-of-the-box. Then whenever you insert or bubble, you just m_locations[vertexID] = newLocationInHeap;
Easy money.
Analysis:
Upside: we now have O(Logn) for finding any given vertex in the p-q. For the bubble-up we do O(Log(n)) movements, and for each swap we do an O(Log(n)) search in the map of array indexes, resulting in an O(Log^2(n)) operation for bubble-up.
So, we have a Log(n) + Log^2(n) = O(Log^2(n)) operation for updating the key values in the Heap for a single edge. That makes our Dijkstra alg take O(mLog^2(n)). That's pretty close to the theoretical optimum, at least as close as I can get it. Awesome Possum!
Downside: We are storing literally twice as much information in-memory for the heap. Is it a "modern" problem? Not really; my desktop can store over 8 billion integers, and many modern computers come with at least 8GB of RAM; however, it is still a factor. If you did this implementation with a graph of 4 billion vertices, which can happen a lot more often than you'd think, then it causes a problem. Also, all those extra reads/writes, which may not affect the complexity in analysis, may still take time on some machines, especially if the information is being stored externally.
I hope this helps someone in the future, because I had a devil of a time finding all this information, then piecing the bits I got from here, there, and everywhere together to form this. I'm blaming the internet and lack of sleep.
The problem I ran into with using any form of heap is that you need to reorder the nodes in the heap. In order to do that, you would have to keep popping everything from the heap until you found the node you need, then change the weight, and push it back in (along with everything else you popped). Honestly, just using an array would probably be more efficient and easier to code than that.
The way I got around this was I used a Red-Black tree (in C++ it's just the set<> data type of the STL). The data structure contained a pair<> element which had a double (cost) and string (node). Because of the tree structure, it is very efficient to access the minimum element (I believe C++ makes it even more efficient by maintaining a pointer to the minimum element).
Along with the tree, I also kept an array of doubles that contained the distance for a given node. So, when I needed to reorder a node in the tree, I simply used the old distance from the dist array along with the node name to find it in the set. I would then remove that element from the tree and re-insert it with the new distance. Searching for a node is O(log n) and inserting a node is O(log n), so the cost to reorder a node is O(2 * log n) = O(log n). A binary heap also has O(log n) for both insert and delete, but it doesn't support search, so you would pay the cost of deleting all of the nodes until you find the one you want, changing its weight, and then inserting everything back. Once the node had been reordered, I would then change the distance in the array to reflect the new distance.
I honestly can't think of a way to modify a heap in such a way to allow it to dynamically change the weights of a node, because the whole structure of the heap is based on the weights the nodes maintain.
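To make that concrete, here is a small sketch of the reorder step; this is my code, not the answerer's exact implementation, and it uses int vertex ids rather than strings for brevity:
#include <set>
#include <utility>
#include <vector>

// dist[v] holds the current tentative distance of vertex v; the set orders
// (distance, vertex) pairs, so *pq.begin() is always the minimum element.
void reorder(std::set<std::pair<double, int>> &pq,
             std::vector<double> &dist, int v, double newDist)
{
    pq.erase(std::make_pair(dist[v], v));   // locate and remove via the old distance: O(log n)
    dist[v] = newDist;
    pq.insert(std::make_pair(newDist, v));  // re-insert with the new distance: O(log n)
}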
I would do this using a hash table in addition to the Min-Heap array.
The hash table's keys are the node objects (hash-coded), and its values are the indices of where those nodes are in the min-heap array.
Then anytime you move something in the min-heap you just need to update the hash table accordingly. Since at most 2 elements will be moved per operation in the min-heap (that is they are exchanged), and our cost per move is O(1) to update the hash table, then we will not have damaged the asymptotic bound of the min-heap operations. For example, minHeapify is O(lgn). We just added 2 O(1) hash table operations per minHeapify operation. Therefore the overall complexity is still O(lgn).
Keep in mind you would need to modify any method that moves your nodes in the min-heap to do this tracking! For example, minHeapify() requires a modification that looks like this using Java:
// assumes: import java.util.Map; import java.util.HashMap;
// heapSize is a field of the enclosing heap class
Node[] nodes;
Map<Node, Integer> indexMap = new HashMap<>();

private void minHeapify(Node[] nodes, int i) {
    int smallest;
    int l = 2*i;     // left child index
    int r = 2*i + 1; // right child index
    if(l <= heapSize && nodes[l].getTime() < nodes[i].getTime()) {
        smallest = l;
    }
    else {
        smallest = i;
    }
    if(r <= heapSize && nodes[r].getTime() < nodes[smallest].getTime()) {
        smallest = r;
    }
    if(smallest != i) {
        Node temp = nodes[smallest];
        nodes[smallest] = nodes[i];
        nodes[i] = temp;
        indexMap.put(nodes[i], i);               // Added index tracking in O(1)
        indexMap.put(nodes[smallest], smallest); // Added index tracking in O(1)
        minHeapify(nodes, smallest);
    }
}
buildMinHeap and heapExtract should be dependent on minHeapify, so those are mostly fixed, but you do need the extracted key to be removed from the hash table as well. You'd also need to modify decreaseKey to track these changes. Once that's fixed, insert should also be fixed, since it should be using the decreaseKey method. That covers all your bases, you will not have altered the asymptotic bounds of your algorithm, and you still get to keep using a heap for your priority queue.
Note that a Fibonacci Min Heap is actually preferred to a standard Min Heap in this implementation, but that's a totally different can of worms.
Another solution is "lazy deletion". Instead of a decrease-key operation, you simply insert the node into the heap once again with its new priority. So there will be another copy of the node in the heap, but that copy will be higher in the heap than any previous one. Then, when extracting the next minimum node, you simply check whether the node has already been accepted. If it has, just skip it and continue (lazy deletion).
This gives slightly worse performance/higher memory usage due to the copies inside the heap, but it is still bounded (by the number of edges) and may be faster than other implementations for some problem sizes.
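A sketch of how that looks in practice; this is my code, for illustration, and it assumes an adjacency list of (weight, vertex) pairs:
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

typedef std::pair<double, int> Entry;   // (distance, vertex) in the queue

// adj[u] holds (weight, v) pairs for every edge u -> v
std::vector<double> dijkstra(const std::vector<std::vector<Entry>> &adj, int src)
{
    std::vector<double> dist(adj.size(), std::numeric_limits<double>::infinity());
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    dist[src] = 0;
    pq.push(Entry(0, src));
    while (!pq.empty()) {
        double d = pq.top().first;
        int u = pq.top().second;
        pq.pop();
        if (d > dist[u]) continue;          // stale copy: skip it (lazy deletion)
        for (size_t k = 0; k < adj[u].size(); k++) {
            double w = adj[u][k].first;     // edge weight
            int v = adj[u][k].second;       // edge target
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push(Entry(dist[v], v)); // insert again instead of decrease-key
            }
        }
    }
    return dist;
}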
This algorithm: http://algs4.cs.princeton.edu/44sp/DijkstraSP.java.html works around this problem by using "indexed heap": http://algs4.cs.princeton.edu/24pq/IndexMinPQ.java.html which essentially maintains the list of mappings from key to array index.
I believe the main difficulty is being able to achieve O(log n) time complexity when we have to update vertex distance. Here are the steps on how you could do that:
For heap implementation, you could use an array.
For indexing, use a Hash Map, with Vertex number as the key and its index in heap as the value.
When we want to update a vertex, search its index in the Hash Map in O(1) time.
Reduce the vertex distance in the heap and then keep traversing up: check its new distance against its parent's; if the parent's value is greater, swap the parent and the current vertex. This step also takes O(log n).
Update the vertex's index in Hash Map as you make changes while traversing up the heap.
I think this should work and the overall time complexity would be O((E+V)*log V), just as the theory implies.
I am using the following approach. Whenever I insert something into the heap, I pass a pointer to an integer (this memory location is owned by me, not the heap) which should contain the position of the element in the array managed by the heap. So if the sequence of elements in the heap is rearranged, it is supposed to update the values pointed to by these pointers.
So for the Dijkstra algorithm I am creating a posInHeap array of size N.
Hopefully, the code will make it clearer.
#include <functional>
#include <limits>
#include <utility>
#include <vector>
using namespace std;

template <typename T, class Comparison = std::less<T>> class cTrackingHeap
{
public:
    cTrackingHeap(Comparison c) : m_c(c), m_v() {}
    cTrackingHeap(const cTrackingHeap&) = delete;
    cTrackingHeap& operator=(const cTrackingHeap&) = delete;

    void DecreaseVal(size_t pos, const T& newValue)
    {
        m_v[pos].first = newValue;
        while (pos > 0)
        {
            size_t iPar = (pos - 1) / 2;
            if (newValue < m_v[iPar].first)
            {
                swap(m_v[pos], m_v[iPar]);
                *m_v[pos].second = pos;
                *m_v[iPar].second = iPar;
                pos = iPar;
            }
            else
                break;
        }
    }

    void Delete(size_t pos)
    {
        *(m_v[pos].second) = numeric_limits<size_t>::max(); // indicate that the element is no longer in the heap
        m_v[pos] = m_v.back();
        m_v.resize(m_v.size() - 1);
        if (pos == m_v.size())
            return;
        *(m_v[pos].second) = pos;
        bool makingProgress = true;
        while (makingProgress)
        {
            makingProgress = false;
            size_t exchangeWith = pos;
            if (2 * pos + 1 < m_v.size() && m_c(m_v[2 * pos + 1].first, m_v[pos].first))
                exchangeWith = 2 * pos + 1;
            if (2 * pos + 2 < m_v.size() && m_c(m_v[2 * pos + 2].first, m_v[exchangeWith].first))
                exchangeWith = 2 * pos + 2;
            if (pos > 0 && m_c(m_v[pos].first, m_v[(pos - 1) / 2].first))
                exchangeWith = (pos - 1) / 2;
            if (exchangeWith != pos)
            {
                makingProgress = true;
                swap(m_v[pos], m_v[exchangeWith]);
                *m_v[pos].second = pos;
                *m_v[exchangeWith].second = exchangeWith;
                pos = exchangeWith;
            }
        }
    }

    void Insert(const T& value, size_t* posTracker)
    {
        m_v.push_back(make_pair(value, posTracker));
        *posTracker = m_v.size() - 1;
        size_t pos = m_v.size() - 1;
        bool makingProgress = true;
        while (makingProgress)
        {
            makingProgress = false;
            if (pos > 0 && m_c(m_v[pos].first, m_v[(pos - 1) / 2].first))
            {
                makingProgress = true;
                swap(m_v[pos], m_v[(pos - 1) / 2]);
                *m_v[pos].second = pos;
                *m_v[(pos - 1) / 2].second = (pos - 1) / 2;
                pos = (pos - 1) / 2;
            }
        }
    }

    const T& GetMin() const { return m_v[0].first; }
    const T& Get(size_t i) const { return m_v[i].first; }
    size_t GetSize() const { return m_v.size(); }

private:
    Comparison m_c;
    vector<pair<T, size_t*>> m_v;
};

Hashing to calculate frequencies - can it be improved?

I'm currently working on building a hash table in order to calculate frequencies, with attention to the running time of the data structure: O(1) insertion, O(n) worst-case lookup time, etc.
I've asked a few people about the difference between std::map and a hash table, and I've received an answer along these lines:
"std::map stores elements in a binary tree, giving O(log n) operations, whereas the hash table you implement will be O(n) in the worst case."
Thus I've decided to implement a hash table using an array of linked lists (for separate chaining). In the code below each node holds two values, the key (the word) and the value (the frequency). It works as follows: when a word is added and its bucket is empty, it is inserted directly as the first element of the linked list with a frequency of 0. If it is already in the list (which unfortunately takes O(n) time to search), its frequency is incremented by 1. If it is not found, it is simply added to the beginning of the list.
I know there are a lot of flaws in the implementation, so I would like to ask the experienced people here: in order to calculate frequencies efficiently, how can this implementation be improved?
Code I've written so far:
#include <iostream>
#include <stdio.h>
#include <string>
using namespace std;

struct Node {
    string word;
    int frequency;
    Node *next;
};

class linkedList
{
private:
    friend class hashTable;
    Node *firstPtr;
    Node *lastPtr;
    int size;
public:
    linkedList()
    {
        firstPtr = lastPtr = NULL;
        size = 0;
    }
    void insert(string word, int frequency)
    {
        Node* newNode = new Node;
        newNode->word = word;
        newNode->frequency = frequency;
        newNode->next = NULL;   // was left uninitialized, which broke print()
        if (firstPtr == NULL)
            firstPtr = lastPtr = newNode;
        else {
            newNode->next = firstPtr;
            firstPtr = newNode;
        }
        size++;
    }
    int sizeOfList()
    {
        return size;
    }
    void print()
    {
        if (firstPtr != NULL)
        {
            Node *temp = firstPtr;
            while (temp != NULL)
            {
                cout << temp->word << " " << temp->frequency << endl;
                temp = temp->next;
            }
        }
        else
            printf("%s", "List is empty");
    }
};

class hashTable
{
private:
    linkedList* arr;
    int index, sizeOfTable;
public:
    hashTable(int size) // Forced initializer
    {
        sizeOfTable = size;
        arr = new linkedList[sizeOfTable];
    }
    int hash(string key)
    {
        int hashVal = 0;
        for (size_t i = 0; i < key.length(); i++)
            hashVal = 37 * hashVal + key[i];
        hashVal = hashVal % sizeOfTable;
        if (hashVal < 0)
            hashVal += sizeOfTable;
        return hashVal;
    }
    void insert(string key)
    {
        index = hash(key);
        if (arr[index].sizeOfList() < 1)
            arr[index].insert(key, 0);
        else {
            // Search for the key throughout the linked list.
            // If found, increment its value by 1;
            // else, add the node to the beginning.
            for (Node *temp = arr[index].firstPtr; temp != NULL; temp = temp->next)
                if (temp->word == key) {
                    temp->frequency++;
                    return;
                }
            arr[index].insert(key, 0);
        }
    }
};
Do you care about the worst case? If no, use an std::unordered_map (it handles collisions and you don't want a multimap) or a trie/critbit tree (depending on the keys, it may be more compact than a hash, which may lead to better caching behavior). If yes, use an std::set or a trie.
If you want, e.g., online top-k statistics, keep a priority queue in addition to the dictionary. Each dictionary value contains the number of occurrences and whether the word belongs to the queue. The queue duplicates the top-k frequency/word pairs but keyed by frequency. Whenever you scan another word, check whether it's both (1) not already in the queue and (2) more frequent than the least element in the queue. If so, extract the least queue element and insert the one you just scanned.
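As a rough sketch of that kind of bookkeeping (my code, slightly simplified from the scheme above: an ordered set plays the role of the queue, so the least frequent entry is cheap to find and evict, and membership is checked in the set directly rather than with a flag):
#include <set>
#include <string>
#include <unordered_map>
#include <utility>

struct TopK {
    int k;
    std::unordered_map<std::string, int> count;  // word -> number of occurrences
    std::set<std::pair<int, std::string>> top;   // (occurrences, word), at most k entries

    void scan(const std::string &word) {
        int old = count[word]++;
        int now = old + 1;
        if (top.erase(std::make_pair(old, word))) {
            top.insert(std::make_pair(now, word));   // already in the top-k: re-key it
        } else if ((int)top.size() < k) {
            top.insert(std::make_pair(now, word));   // room left in the top-k
        } else if (now > top.begin()->first) {       // beats the least frequent entry
            top.erase(top.begin());
            top.insert(std::make_pair(now, word));
        }
    }
};

// usage: TopK tk{3}; then tk.scan(word) for every word read; tk.top holds the current top-3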
You can implement your own data structures if you like, but the programmers who work on STL implementations tend to be pretty sharp. I would make sure that's where the bottleneck is first.
1- The time complexity for search in std::map and std::set is O(log(n)). The average time complexity for std::unordered_map and std::unordered_set is O(1), degrading to O(n) in the worst case. However, the constant cost of hashing can be quite large, and for small n it can exceed log(n). I always keep this fact in mind.
2- If you want to use std::unordered_map, you need to make sure that std::hash is defined for your type. Otherwise you should define it, as sketched below.
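For example, here is a sketch of providing std::hash for your own key type; the Key struct is made up for illustration:
#include <cstddef>
#include <functional>
#include <string>

struct Key {
    std::string word;
    int section;
    bool operator==(const Key &o) const
    { return word == o.word && section == o.section; }
};

namespace std {
    template <> struct hash<Key> {
        size_t operator()(const Key &k) const {
            // combine the members' hashes; any reasonable mixing will do
            return hash<string>()(k.word) ^ (hash<int>()(k.section) << 1);
        }
    };
}

// after this, std::unordered_map<Key, int> works out of the box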

doubly linked list implementation

Which one would be more efficient?
I want to keep a list of items, but I'm required to be able to sort the list
by id,
by name
by course credits
by the user
Would it be best to add items to the list in order by id and then sort by the others as needed, or just add items without order and sort in whatever order is needed, whenever the user needs it?
If you're really required to keep the list sorted -- as opposed to using other data structures to give sorted access to the list -- then you could simply make a list whose elements have different pointers for different sort criteria.
In other words, instead of keeping just previous and next pointers, have previousById, nextById, previousByName, nextByName, previousByCredits and nextByCredits. Likewise, you would have three head and/or tail pointers instead of just one.
Please note that this approach has the drawback of being inflexible when it comes to implementing additional sort criteria. I'm assuming that you're trying to solve a homework-type problem, which is why I tried to tailor the answer to what seem to be the homework requirements.
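For illustration, such a node might look like this (the field names are mine):
#include <string>

struct Student {
    int id;
    std::string name;
    int credits;

    // one pair of links per sort order
    Student *prevById,      *nextById;
    Student *prevByName,    *nextByName;
    Student *prevByCredits, *nextByCredits;
};

// three separate list heads, one per ordering
Student *headById = nullptr;
Student *headByName = nullptr;
Student *headByCredits = nullptr;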
You can use three maps (or hashmaps):
One mapping the id to the item, one mapping the name to an item reference (or pointer), and one mapping course credits to an item reference again.
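A minimal sketch of that layout (the Item struct and names are mine; multimaps because names and credits need not be unique, and it assumes each id is inserted only once):
#include <map>
#include <string>

struct Item { int id; std::string name; int credits; };

std::map<int, Item> byId;                  // owns the items, iterates in id order
std::multimap<std::string, Item*> byName;  // iterates in name order
std::multimap<int, Item*> byCredits;       // iterates in credits order

void add(const Item &item) {
    Item &stored = (byId[item.id] = item); // insert by id (assumes a fresh id)
    byName.insert(std::make_pair(stored.name, &stored));       // std::map nodes are stable,
    byCredits.insert(std::make_pair(stored.credits, &stored)); // so these pointers stay valid
}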
It would be more efficient to sort it in whichever order you know will be used the most; for example, if you know you're going to be retrieving by id most often, keep it sorted by id. Otherwise pick one of the others, though id would be the easiest if it is just an integer field.
So, to do that, on insert you would find the position where newid is greater than previousid but less than nextid, then allocate a new node with new and set the pointers appropriately.
Keeping the linked list sorted in some way is better than just keeping it unsorted. You're adding some time to how long an insert takes, but it's negligible compared to how long it would take to sort the whole list in that particular order whenever it's needed.
The most efficient approach would be to store the nodes as they are, and keep 4 different indexes up to date. This way, when one order is required, you just pick the right index, and that's all. The cost is O(log N) per insertion, and O(1) per element for traversal.
Of course, keeping 4 indexes at once, with perhaps different requirements on uniqueness, and in the face of possible exceptions, is relatively difficult, but then, there's a Boost library for this: Boost MultiIndex
One example is to generate a set that can be sorted either by ID or by Name.
Since you can add as many indexes as you wish, it should get you going :)
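A sketch of what such a container could look like with Boost.MultiIndex (the member names are mine; see the library's tutorial for the real details):
#include <string>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>

namespace bmi = boost::multi_index;

struct Student { int id; std::string name; int credits; };

typedef bmi::multi_index_container<
    Student,
    bmi::indexed_by<
        bmi::ordered_unique< bmi::member<Student, int, &Student::id> >,
        bmi::ordered_non_unique< bmi::member<Student, std::string, &Student::name> >,
        bmi::ordered_non_unique< bmi::member<Student, int, &Student::credits> >
    >
> StudentSet;

// students.get<0>() iterates by id, get<1>() by name, get<2>() by credits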
Keep your linked list objects in the linked list, in random order. To sort the list by any key, use this pseudocode (note that PermanentSort below returns the new head of the list):
#include <cstddef>
#include <string>
using namespace std;

struct LinkedList {
    string name;
    LinkedList *prev;
    LinkedList *next;
};

void FillArray(LinkedList *first, LinkedList ***output, size_t &size) {
    // function creates an array of pointers to every LinkedList object
    LinkedList *now;
    size_t i; // you may use int instead of size_t
    // check how many objects there are in the linked list
    size = 0;
    now = first;
    while (now != NULL) {
        size++;
        now = now->next;
    }
    // if the linked list is empty
    if (size == 0) {
        *output = NULL;
        return;
    }
    // create the array
    *output = new LinkedList*[size];
    // fill the array
    i = 0;
    now = first;
    while (now != NULL) {
        (*output)[i++] = now;
        now = now->next;
    }
}

void SortByName(LinkedList **arrayOfPointers, size_t size) {
    // your function to sort the array of pointers by name here
}

void TemporarySort(LinkedList *first, LinkedList ***output, size_t &size) {
    // this function will create the array of pointers to your linked list,
    // sort this array, and return the sorted array. However, the linked
    // list will stay as it is. It's good for example when your linked list
    // is sorted by ID, but you need to print it sorted by names only once.
    FillArray(first, output, size);
    SortByName(*output, size);
}

LinkedList* PermanentSort(LinkedList *first) {
    // This function sorts the linked list and saves the new order
    // permanently. It returns the new head, since the old head is
    // usually no longer the first element after sorting.
    LinkedList **sorted;
    size_t size;
    TemporarySort(first, &sorted, size);
    if (size == 0)
        return first;
    sorted[0]->prev = NULL;
    for (size_t i = 1; i < size; i++) {
        sorted[i-1]->next = sorted[i];
        sorted[i]->prev = sorted[i-1];
    }
    sorted[size-1]->next = NULL;
    first = sorted[0];
    delete[] sorted; // the helper array is no longer needed
    return first;
}
I hope I actually did help you. If you don't understand any line of the code, simply put a comment on this "answer".