This is a strange question about space complexity. Can someone provide any insights? - c++

I was solving this question when this approach clicked in -
Given a single linked list and an integer x. Your task is to complete the function deleteAllOccurances() which deletes all occurrences of a key x present in the linked list. The function takes two arguments: the head of the linked list and an integer x. The function should return the head of the modified linked list.
I am not sure what the space complexity of my code is.
I think that since I am only using one extra Node's worth of space, creating new nodes and deleting old ones as I go, it should be O(1).
Node* deleteAllOccurances(Node *head,int x)
{
Node *new_head = new Node(-1);
Node *tail = new_head;
Node *temp = head;
Node *q;
while(temp != NULL) {
if(temp->data != x) {
tail->next = new Node(temp->data);
tail = tail->next;
}
q = temp;
delete q;
temp = temp->next;
}
tail->next = NULL;
return new_head->next;
}

Well, kind of.
It depends on whether you are considering total allocations as a net change (in which case you're right).
But if you are thinking about the amount of times you hit the heap for new allocations, then it's using more space and a ton of computation. (A given C++ compiler and runtime is not obliged to guarantee immediately reusing space freed in the heap, just that it's available for reuse.)
As someone who has programmed in C++ for decades, I find what you're doing mildly horrifying because of all the new allocation, which thrashes the heap allocation structures.
Also, the way you're doing this is pushing stuff which doesn't match to the end of the list so you are shuffling the contents down.
Hint - you should not need to create any new Nodes.

Yes. Since the amount of space you have allocated at any one time doesn't depend on the arguments (e.g. the length of the list or how many values of x are in the list), the space complexity of the function is O(1).
The practical point of space complexity is to see how much memory your algorithm will require. You never require more than 1 node of memory (plus the local variables) and O(1) reflects that.

Measuring complexity in part depends on what you consider to be your variables. In terms of the number of nodes in the list, your algorithm is O(1) in space usage. However, this might not be the best perspective in this case.
Another variable in this situation is the size of a node. Often this aspect is ignored by complexity analysis, but I think it has value in this case. While your algorithm's space requirement does not depend on the number of nodes, it does depend on the size of a node. The more data in the node, the more space you need. Let s be the size of a single node; it would be fair to say that your algorithm's size requirement is O(s).
The size requirement of the more common algorithm for this task is O(1) even when accounting for both the number of nodes and the size of each node. (It has no need to create any nodes, no need to copy data.) I would not recommend your algorithm over that one.
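For comparison, here is a minimal sketch of one in-place form of that task. It is only an illustration: the Node layout is an assumption (the question does not show it), and the name deleteAllOccurrencesInPlace is made up for this example. It walks a pointer-to-pointer along the list, so no node is created or copied.

```cpp
// Assumed node layout; the question does not show the real definition.
struct Node {
    int data;
    Node* next;
    explicit Node(int d) : data(d), next(nullptr) {}
};

// Hypothetical name. Unlinks and frees every node whose data equals x,
// reusing all other nodes as they are.
Node* deleteAllOccurrencesInPlace(Node* head, int x) {
    Node** link = &head;              // the pointer that leads to the node under inspection
    while (*link != nullptr) {
        if ((*link)->data == x) {
            Node* doomed = *link;
            *link = doomed->next;     // bypass the node, then free it
            delete doomed;
        } else {
            link = &(*link)->next;    // keep the node and move on
        }
    }
    return head;
}
```

The only extra storage is the two local pointers, regardless of list length or node size.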
To avoid being all negative, I would view your approach as two independent changes to the traditional one. One change is the introduction of the dummy node new_head. This change is useful (and in fact is in use), even though your implementation leaks memory. It is only marginally less efficient than not using a dummy head, and it simplifies the logic for removing nodes from the front of the list. This is good as long as your node size is not overly large.
The other change is the switch to copying nodes instead of moving them. This is the cringe-worthy change as it gratuitously adds work to the programmer, the compiler, and the execution. Asymptotic analysis (big-O) might not pick up on this addition, but it is there with no beneficial gains. You've trashed a key benefit of linked lists and gotten nothing in return.
Let's look at dropping the second change. You would need to add one line, specifically initializing new_head->next to head, but this is balanced out by removing the need to set tail->next to nullptr at the end. Another addition is an else clause so that the lines currently run every iteration are not necessarily run every iteration. Beyond that are code removal and some name changes: drop the temp pointer (use tail->next instead) and drop the creation of new nodes in the loop. Taken together, these changes strictly reduce the work being done (and the memory needs) compared to your code.
To address the memory leak, I've used a local dummy node instead of dynamically allocating it. That removes the last use of new, which in turn removes most of the objections raised in the question's comments.
Node* deleteAllOccurances(Node *head, int x)
{
Node new_head{-1}; //<-- Avoid dynamic allocation
new_head.next = head; //<-- added line
Node *tail = &new_head;
while(tail->next != nullptr) {
if(tail->next->data != x) {
tail = tail->next;
}
else { //<-- make the rest of the loop conditional
Node *q = tail->next;
tail->next = tail->next->next;
delete q;
}
}
return new_head.next;
}
This version removes the "cringe factor" as there is a benefit to the one node being created, and new is not being used. This version is clean enough to subject to complexity analysis without everyone asking "why???".

Related

Are the Performances of these two programs the same?

Say for example you have a linked list 1->2->3->4->5->6->NULL and you want to calculate the total of the even indices of that linked list (assuming that the 1st index starts with 1 and the size of the linked list is even)
First Approach
int total = 0;
int count = 0;
Node *ptr = head;
while(ptr != NULL)
{
if(count % 2 == 0)
{
total += ptr->data;
}
count++;
ptr = ptr->next;
}
Second Approach
int total = 0;
Node *ptr = head;
while(ptr != NULL)
{
total += ptr->data;
ptr = ptr->next->next;
}
So after I did these two approaches do they have the same performance?
I read your question again and will answer that probably the second method is slightly faster.
Now, the comments section immediately highlighted that it's also more dangerous. You have actually specified that the assumption is the number of nodes in the list is even. If that is a guaranteed and enforceable precondition, then it's technically okay to do this.
Even a smart optimizing compiler has no way of knowing about this precondition of even list-length, so the very best it could likely achieve is to recognize that count is only used for controlling whether total is updated and so the loop could be unrolled as follows:
// Possible automatic compiler optimization of First Approach
while (ptr)
{
total += ptr->data;
ptr = ptr->next;
// Skip over every second node
if (ptr) ptr = ptr->next;
}
In basic terms, what we now have is one more pointer test (branch) per loop iteration than your Second Approach has. This results in more instructions (specifically a branching instruction) and so the code will technically be (slightly) slower.
Of course, the actual impact of this is likely to be very small. Your main bottleneck is pointer indirection and fetches from memory, rather than the pointer test itself. If the memory used by each node is not mostly contiguous, you'll run into caching problems on large lists (which in practice can slow things down by a large factor, on the order of 100 in bad cases).
What I mean to indicate by all the above is that the benefit of your special optimization, based on the precondition of even list-length, suffers from diminishing returns.
Given that it is inherently unsafe unless very well-documented in the code and/or protected by a list "evenness" test (if you store the node count somewhere), I would recommend coding defensively by using your First Approach, or my equivalent and (arguably) tidier version of it.

separate chaining in hashing

I am reading about hashing in Robert Sedgewick's book Algorithms in C++:
We might be using a header node to streamline the code for insertion
into an ordered list, but we might not want to use M header nodes for
individual lists in separate chaining. Indeed, we could even eliminate
the M links to the lists by having the first nodes in the lists
comprise the table.
class ST
{
struct node
{
Item item;
node* next;
node(Item x, node* t)
{ item = x; next = t; }
};
typedef node *link;
private:
link* heads;
int N, M;
Item searchR(link t, Key v)
{
if (t == 0) return nullItem;
if (t->item.key() == v) return t->item;
return searchR(t->next, v);
}
public:
ST(int maxN)
{
N = 0; M = maxN/5;
heads = new link[M];
for (int i = 0; i < M; i++) heads[i] = 0;
}
Item search(Key v)
{ return searchR(heads[hash(v, M)], v); }
void insert(Item item)
{ int i = hash(item.key(), M);
heads[i] = new node(item, heads[i]); N++; }
};
I have two questions about the text above:
What does the author mean by "We could even eliminate the M links to the lists by having the first nodes in the lists comprise the table"? How can we modify the above code for this?
What does the statement "we might not want to use M header nodes for individual lists in separate chaining" mean?
"We could even eliminate the M links to the lists by having the first nodes in the lists comprise the table."
Consider Node* x[n] vs Node x[n]. The former needs an extra pointer per bucket, a memory allocation on insertion for the head Node of every non-empty bucket, and an extra indirection for every hash table operation. The latter eliminates the n pointers but requires that unused elements can be put in some discernible not-in-use state (tracking which may or may not require extra memory), and if sizeof(Node) is greater than sizeof(Node*), it may be more wasteful of memory anyway. The difference in memory use can also affect cache efficiency: if the table has a high element-to-bucket ratio then a Node[] gets the Node data into fewer contiguous memory pages, and if you're iterating (in unsorted order) then it's very cache efficient, whereas a Node*[] will jump to separate memory allocations that might be all over the place (or, on the other hand, might actually be quite close together in some useful way, e.g. if both access patterns and dynamic memory allocation addresses correlate with the chronological order of object creation).
How can we modify above code for this?
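First, note what the existing code does: heads[i] = new node(item, heads[i]); does not discard whatever is already in the bucket. The new node's next is set to the old heads[i], so the item is prepended to that bucket's list and heads[i] then points at the new front. Once the table holds nodes rather than pointers, insertion instead has to check whether the slot is empty before deciding whether to store the item in the slot or chain it.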
The design change discussed needs:
link* heads;
...changed to...
node* head;
You'd initialise it like this:
head = new node[M];
Which needs an extra node constructor (if item has an equivalent default constructor, you can leave out its initialisation below)
node() : item(nullItem), next(nullptr) { }
Then there are some knock-on changes to the rest of your code that are easy to work through. Basically, you're getting rid of a layer of pointers.
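To make the knock-on changes concrete, here is a rough sketch of a table whose first nodes live in the array itself. It is not Sedgewick's code: the class name IntTable, the used flag, the modulo hash, and the restriction to non-negative int keys (with the item reduced to the key) are all assumptions made to keep the example self-contained.

```cpp
// Hypothetical simplified table: keys are non-negative ints and the item is just the key.
class IntTable {
    struct node {
        int key;
        node* next;
        bool used;                       // assumption: a flag marks whether a table slot holds an item
        node() : key(0), next(nullptr), used(false) {}
        node(int k, node* t) : key(k), next(t), used(true) {}
    };
    node* head;                          // array of M nodes, not M pointers
    int M;
    int hash(int k) const { return k % M; }   // assumes k >= 0
public:
    explicit IntTable(int buckets) : head(new node[buckets]), M(buckets) {}
    ~IntTable() {
        for (int i = 0; i < M; i++) {    // free only the chained nodes; slots belong to the array
            node* t = head[i].next;
            while (t) { node* n = t->next; delete t; t = n; }
        }
        delete[] head;
    }
    IntTable(const IntTable&) = delete;
    IntTable& operator=(const IntTable&) = delete;

    void insert(int k) {
        node& slot = head[hash(k)];
        if (!slot.used) {                // empty bucket: store in the table itself
            slot.key = k;
            slot.used = true;
        } else {                         // occupied: chain a heap node behind the slot
            slot.next = new node(k, slot.next);
        }
    }
    bool contains(int k) const {
        const node* t = &head[hash(k)];
        if (!t->used) return false;      // unused slot means an empty bucket
        for (; t != nullptr; t = t->next)
            if (t->key == k) return true;
        return false;
    }
};
```

A bucket with exactly one item costs no heap allocation here, but every empty bucket still occupies a full node, which is the trade-off described above.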
"we might not want to use M header nodes for individual lists in separate chaining." What does this statement mean.
I didn't write it so can't say authoritatively, but it appears to be saying that when designing the list code, a decision might have been made to have an initial Node even in an empty list, as this simplifies code for several list operations. While the extra data-less Node might seem a reasonable price when contemplating "usual" uses of a list, hash tables are unusual in that you want most of the lists chained off the buckets to have 0 or 1 element, and exponentially fewer should be longer and longer. So, such a list implementation is poorly suited to use in a hash table.

Insertion Sort to Sort Nodes in a LinkedList

I'm trying to use insertion sort to sort the nodes of a LinkedList. I've adjusted the code many times but I can't quite seem to get it right; I keep getting different results, none of which are sorted.
Here's the code:
Node* sort_list(Node* head)
{
Node* node_ptr = NULL;
for(Node* i = head->next; i->next != NULL; i = i->next){
if (i->key < head->key) {
node_ptr = i;
head = head->next;
}
}
return node_ptr;
}
This is a homework problem, so instead of outright writing the code, I will first point out where you went wrong.
In an insertion-sort-like algorithm, there obviously needs to be some kind of swapping between elements that are out of place (that is, that need to be inserted). Hence, start by thinking about how you can swap two elements of the list. Pay special attention to the cases where one of them is the head or the tail.
Your implemented code doesn't have any trace of pointer swaps, so this is where you went wrong.
Next, you must think about the cases in which we need to sort. Here it is rather simple: if the current element and the next are in sorted order (assuming ascending order, current < next), then nothing needs to be done except to make the next one the current.
Then you can infer that a violation of this condition is when you need to swap the elements. After the swap (with proper attention to where the pointers were and where they will be afterwards), repeat the process until you hit the null wall.
P.S : This is a possible duplicate of another SO question.

Non iterative equivalent for reversing a linked list

I am reading about list traversals in the Algorithms book by Robert Sedgewick. The function definitions are shown below. The book mentions that traverse and remove can have iterative counterparts, but traverseR cannot. My question is: why can't traverseR have an iterative counterpart? Is it that if the recursive call is not at the end of the function (unlike in traverse), then we cannot have an iterative version? Is my understanding right?
Thanks for your time and help.
void traverse(link h, void visit(link))
{
if (h == 0) return;
visit(h);
traverse(h->next, visit);
}
void traverseR(link h, void visit(link))
{
if (h == 0) return;
traverseR(h->next, visit);
visit(h);
}
void remove(link& x, Item v)
{
while (x != 0 && x->item == v)
{ link t = x; x = x->next; delete t; }
if (x != 0) remove(x->next, v);
}
traverseR uses the call stack to store pointers to all the nodes of the list, so that they can be accessed in reverse order as the call stack unwinds.
In order to do this without a call stack (i.e. non-recursively), you'll need some other stack-like data structure to store these pointers in.
The other functions simply work on the current node and move on, with no need to store anything for use after the recursive function call returns. This means that the tail recursion can be replaced with a loop (either by modifying the code or, depending on the compiler, letting it determine that that's possible and make the transformation itself).
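As a rough illustration of that transformation, here are loop forms of traverse and remove. The node definition is an assumption (the book's Item is reduced to int), and the names traverseIter and removeIter are made up for this sketch.

```cpp
// Assumed minimal definitions; the book's Item is simplified to int here.
struct node { int item; node* next; };
typedef node* link;

// Loop form of traverse: the recursive call was the last action,
// so it becomes "advance h and repeat".
void traverseIter(link h, void visit(link)) {
    while (h != 0) {
        visit(h);
        h = h->next;
    }
}

// Loop form of remove: walks the chain of next pointers, unlinking matches.
void removeIter(link& x, int v) {
    link* p = &x;                      // the link that leads to the node under inspection
    while (*p != 0) {
        if ((*p)->item == v) { link t = *p; *p = t->next; delete t; }
        else p = &(*p)->next;
    }
}
```

Neither loop needs to remember anything about earlier nodes, which is exactly why no stack is required.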
Assuming that the list is single-linked, it is not possible to visit it iteratively in the backward order because there's no pointer from a node to a previous node.
What the recursive implementation of traverseR essentially does is that it implicitly reverses the list and visits it in the forward order.
You could write an iterative version of traverseR using a stack: in a loop, iterate from one node to the next, pushing the nodes on the stack. When you get to the end of the list, pop and visit the nodes in another loop.
But this is basically what the recursive version does.
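A minimal sketch of that two-loop idea, assuming a simplified node type with an int item and using std::stack as the explicit stack (the name traverseRStack is made up for this example):

```cpp
#include <stack>

// Assumed minimal definitions; the book's Item is simplified to int here.
struct node { int item; node* next; };
typedef node* link;

void traverseRStack(link h, void visit(link)) {
    std::stack<link> pending;
    for (link p = h; p != 0; p = p->next)   // first loop: remember every node
        pending.push(p);
    while (!pending.empty()) {              // second loop: visit in reverse order
        visit(pending.top());
        pending.pop();
    }
}
```

The explicit stack holds one pointer per node, so the extra memory is still O(n), just moved from the call stack to the heap.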
It is possible to traverse a singly linked list in reverse order with only O(1) extra space -- i.e., without a stack of previously visited nodes. It is, however, a little tricky, and not at all thread safe.
The trick to this is to traverse the list from beginning to end, reversing it in place as you do so, then traverse it back to the beginning, reversing it again on the way back through.
Since it is a linked list, reversing it in place is fairly straightforward: as you get to a node, save the current value of its next pointer, and overwrite that with the address of the previous node in the list (see the code for more detail):
void traverseR(node *list, void (*visit)(node *)) {
    node *prev = nullptr;
    node *curr = list;
    // Traverse forwards, reversing the list in place as we go.
    while (curr) {
        node *next = curr->next;
        curr->next = prev;
        prev = curr;
        curr = next;
    }
    // prev is now the head of the fully reversed list (or nullptr if the list was empty).
    curr = prev;
    prev = nullptr;
    // Traverse the reversed list, visiting each node and reversing again,
    // which restores the original links.
    while (curr) {
        visit(curr);
        node *next = curr->next;
        curr->next = prev;
        prev = curr;
        curr = next;
    }
}
Like almost anything dealing with linked lists, I feel obliged to add that (at least IMO) they should almost always be treated as a purely intellectual exercise. Using them in real code is usually a net loss. You typically end up with code that's slow, fragile, and hard to understand, as well as typically wasting quite a bit of memory (unless the data you store in each node is pretty big, the pointer can often use as much space as the data itself).
My question why traverseR cannot have iterative counter part? Is it that if recursive call is not end of function i.e., like in traverse then we cannot have iterative, Is my understanding right?
Correct. The functions traverse and remove end with a call to themselves. They are tail recursive functions. The call in traverseR to itself is not at the end of the function; traverseR is not tail recursive.
Recursion in general has an expense of creating and later destroying stack frames. This expense can be completely avoided with tail recursive functions by changing the recursion into iteration. Most compilers recognize tail recursive functions and convert the recursion to iteration.
It is possible to write an iterative version of traverseR, depending on what you mean by iterative. If you are limited to a single traversal through the list, it is not possible. But if you can sacrifice a lot of processing time, it can be done (the version below rescans from the head for every node it visits, so it takes O(n^2) time). It does use less memory, in the classic speed vs. memory trade-off.
void traverseRI(link h, void visit(link))
{
if (h == 0) return;
link last = 0;
while (last != h)
{
link test = h;
while (test->next != last)
{
test = test->next;
}
visit(test);
last = test;
}
}

Cyclical Linked List Algorithm

I have been asked recently in a job interview to develop an algorithm that can determine whether a linked list is cyclical. As it's a linked list, we don't know its size. It's a doubly-linked list with each node having 'next' and 'previous' pointers. A node can be connected to any other node or it can be connected to itself.
The only solution that I came up at that time was to pick a node and check it with all the nodes of the linked list. The interviewer obviously didn't like the idea as it is not an optimal solution. What would be a better approach?
What you are looking for is a cycle-finding algorithm. The algorithm Joel refers to is called either the 'tortoise and hare' algorithm or Floyd's cycle finding algorithm. I prefer the second because it sounds like it would make a good D&D spell.
Wikipedia overview of cycle finding algorithms, with sample code
The general solution is to have 2 pointers moving at different rates. They will eventually be equal if some portion of the list is circular. Something along the lines of this:
function boolean hasLoop(Node startNode){
Node slowNode = startNode;
Node fastNode1 = startNode;
Node fastNode2 = startNode;
while (slowNode && (fastNode1 = fastNode2.next()) && (fastNode2 = fastNode1.next())){
if (slowNode == fastNode1 || slowNode == fastNode2)
return true;
slowNode = slowNode.next();
}
return false;
}
Blatantly stolen from here: http://ostermiller.org/find_loop_singly_linked_list.html
Keep a hash of pointer values. Every time you visit a node, hash its pointer and store it. If you ever visit one that already has been stored you know that your list is circular.
This is an O(n) algorithm if your hash table operations take constant time, though it also needs O(n) extra memory for the stored pointers.
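A small sketch of the idea, assuming a bare Node with next and prev pointers and using std::unordered_set as the hash of pointer values (the name hasCycleHashed is made up for this example):

```cpp
#include <unordered_set>

// Assumed node layout for illustration; only next is followed here.
struct Node { Node* next; Node* prev; };

bool hasCycleHashed(const Node* start) {
    std::unordered_set<const Node*> seen;
    for (const Node* p = start; p != nullptr; p = p->next) {
        if (!seen.insert(p).second)   // insert reports false when p is already stored
            return true;              // we have been at this node before
    }
    return false;                     // reached the end of the list
}
```

Because every visited node is remembered, this terminates even when the cycle does not include the starting node.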
Another option is that, since the list is doubly linked, you can traverse the list and check that, for every node, the previous pointer of its next node points back at it, and that you never arrive back at the head. The idea here is that a loop must either encompass the entire list or look something like this:
- -*- \
\ \
\---
At Node * there are 2 incoming links only one of which can be the previous.
Something like:
bool hasCycle(Node* head){
    if( head == nullptr ) return false; // an empty list has no cycle
    if( head->next == head ) return true;
    Node* current = head->next;
    while( current != nullptr && current->next != nullptr ) {
        if( current == head || current->next->prev != current )
            return true;
        current = current->next;
    }
    return false; // since I've reached the end there can't be a cycle.
}
You can handle a general complete circular list like this: Loop through the linked list via the first element until you reach the end of the list or until you get back to the first element.
But if you want to handle the case where a portion of the list is circular then you need to also move ahead your first pointer periodically.
Start with two pointers pointing at the same element. Walk one pointer through the list, following the next pointers. The other walks the list following the previous pointers. If the two pointers meet, then the list is circular. If you find an element with a previous or next pointer set to NULL, then you know the list is not circular.
[Edit the question and subject has been reworded to clarify that we're checking for cycles in a doubly linked list, not checking if a doubly linked list is merely circular, so parts of this post may be irrelevant.]
Its a doubly link list with each node
having 'next' and 'previous' pointers.
Doubly-linked lists are commonly implemented with the head and tail of the list pointing to NULL to indicate where they end.
[Edit] As pointed out, this only checks if the list is circular as a whole, not if it has cycles in it, but that was the wording of the original question.
If the list is circular, tail->next == head and/or head->prev == tail. If you don't have access to both the tail and head node and only have one of those but not both, then it should suffice to simply check if head->prev != NULL or tail->next != NULL.
If this isn't a sufficient answer because we're only given some random node [and looking for cycles anywhere in the list], then all you have to do is take this random node and keep traversing the list until you reach a node that matches (in which case it is circular) or a null pointer (in which case it's not).
However, this is essentially the same thing as the answer you already provided which the interviewer didn't like. I'm quite certain that without some magical hack, there is no way to detect a cycle in a linked list, provided a random node, without a linear complexity algorithm.
[Edit] My mind has switched gears now with the focus on detecting cycles in a list as opposed to determining if a linked list is circular.
If we have a case like:
1<->2<->3<->[2]
The only way I can see that we can detect cycles is to keep track of all the elements we traversed so far and look for any match along the way.
Of course this could be cheap. If we're allowed to modify the list nodes, we could keep a simple traversed flag in each node that we set as we go. If we encounter a node with this flag already set, then we've found a cycle. However, this wouldn't work well for parallelism.
There is a solution proposed here [which I stole from another answer] called "Floyd's Cycle-Finding Algorithm". Let's take a look at it (modified to make it a little easier for me to read).
function boolean hasLoop(Node startNode)
{
Node fastNode2 = startNode;
Node fastNode1 = startNode;
Node slowNode = startNode;
while ( slowNode && (fastNode1 = fastNode2.next()) && (fastNode2 = fastNode1.next()) )
{
if (slowNode == fastNode1 || slowNode == fastNode2)
return true;
slowNode = slowNode.next();
}
return false;
}
It basically involves using 3 iterators instead of 1. Consider a case like 1->2->3->4->5->6->[2]:
First, the slow iterator starts at [1], the first fast iterator moves to [2] and the second to [3], giving [1, 2, 3]. We stop when the slow iterator matches either of the two fast iterators.
We proceed with [2, 4, 5] (the first fast iterator moves to the node after the second fast iterator, and the second fast iterator then moves to the node after the first). Then [3, 6, 2], and finally [4, 3, 4].
Yay, we've found a match, and have thus determined that the list contains a cycle in 4 iterations.
Assuming that someone says "Here is a pointer to a member of a list. Is it a member of a circular list?" then you could walk all reachable members in one direction and look for a link that leads back to the node you were given. If you decide to go in the next direction, you look for a next pointer equal to the pointer you were first given. If you choose to go in the prev direction, you look for a prev pointer equal to it. If you reach a NULL pointer in either direction then you have found the end and know that it is not circular.
You could extend this by going in both directions at the same time and seeing if you bump into yourself, but it gets more complicated and it really doesn't save you anything. Even if you implemented this with 2 threads on a multi-core machine you'd be dealing with shared volatile memory comparisons, which would kill performance.
Alternately, if you can mark each node in the list you could try to determine if there was a cycle by looking for your mark while you searched for the end. If you found your mark in a node you would know that you had been there before. If you found an end before you found one of your marks you would know it wasn't circular. This would not work if another thread were trying to do this at the same time, though, because you would get your marks mixed up, but the other implementation wouldn't work if other threads were reordering the list at the same time as the test.
What you need is Floyd's cycle-finding algorithm. You can also think of finding the intersection point of the cycle as homework.
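If "intersection point" is read as the node where the cycle begins, the usual extension of Floyd's algorithm can be sketched as follows. The Node layout and the name findCycleStart are assumptions for this example, not something given in the question.

```cpp
// Assumed node layout for illustration.
struct Node { int value; Node* next; };

// Returns the first node of the cycle, or nullptr if the list is acyclic.
Node* findCycleStart(Node* head) {
    Node* slow = head;
    Node* fast = head;
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;            // one step
        fast = fast->next->next;      // two steps
        if (slow == fast) {
            // Restart one pointer from head; moving both one step at a time,
            // they meet at the node where the cycle begins.
            slow = head;
            while (slow != fast) {
                slow = slow->next;
                fast = fast->next;
            }
            return slow;
        }
    }
    return nullptr;                   // fast ran off the end: no cycle
}
```

For the list 1->2->3->4->5->6->[2] used elsewhere in this thread, the two pointers first meet at node 6, and the second phase then returns node 2.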
Here is a clean approach to test if a linked list has cycles in it (if it's cyclical) based on Floyd's algorithm:
int HasCycle(Node* head)
{
    Node *p1 = head;
    Node *p2 = head;
    // p2 moves two nodes per step, so both p2 and p2->next must be non-null.
    while (p2 && p2->next) {
        p1 = p1->next;
        p2 = p2->next->next;
        if (p1 == p2)
            return 1;
    }
    return 0;
}
The idea is to use two pointers, both starting from head, that advance at different speeds. If they meet each other, that's our clue that there is a cycle in our list; if not, the list is cycle-less.
It is unbelievable how widely complicated solutions can spread.
Here's the absolute minimum required for finding whether a linked list is circular, that is, whether following next from head eventually leads back to head (if the list contains a cycle that does not pass through head, this loop never terminates):
bool is_circular(node* head)
{
node* p = head;
while (p != nullptr) {
p = p->next;
if (p == head)
return true;
}
return false;
}