So I am studying and I have this question: Write the most efficient implementation of the function RemoveDuplication(), which removes any duplication in the list. Assume that the list is sorted but may have duplication. So, if the list is originally
<2, 2, 5, 6, 6, 6, 9>, your function should make it <2, 5, 6, 9>.
The code that I thought of to remove the duplication is below. I wanted to know if there are more efficient ways of removing the duplications in the list.
template <class T>
void DLList<T>::RemoveDuplication()
{
    for (DLLNode<T>* ptr = head; ptr != NULL; ptr = ptr->next)
        while (ptr->val == ptr->next->val)
        {
            ptr->next->next->prev = ptr;
            ptr->next = ptr->next->next;
        }
}
It looks like your code will run in O(n), which is good for this algorithm. It probably cannot get any more efficient, because you have to visit every item in order to delete its duplicates.
If you don't want to delete the duplicate objects, though, but want to return a new list containing only the non-duplicate objects, you could make it slightly faster: O(m), where m is the number of unique values, which is smaller than or equal to n. But I couldn't think of any way to do this.
Recapping: it may be possible to be slightly faster, but it is hard and the improvement is negligible.
PS: Don't forget to delete nodes when you take them out of your list ;)
I think O(n) is ok.
However, the most important thing is that your program will crash :-)
for(DLLNode<T>*ptr = head; ptr!=NULL; ptr=ptr->next)
while (ptr->val == ptr->next->val)
The code is dereferencing ptr->next without checking that it is != NULL.
Hence, the algorithm will crash when it reaches the last element of the list.
And now an optimization question for you: how do you make the program correct without testing both ptr AND ptr->next at each iteration?
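For reference, here is a minimal sketch of a safe version, reusing only the member names from the question (head, next, prev, val). It still tests ptr->next on every inner iteration, so it does not answer the optimization teaser above, and it deletes the unlinked nodes as the PS in the other answer suggests. If your list also keeps a tail pointer, that would need updating as well.
template <class T>
void DLList<T>::RemoveDuplication()
{
    for (DLLNode<T>* ptr = head; ptr != NULL; ptr = ptr->next)
    {
        // Remove every consecutive duplicate of ptr->val.
        while (ptr->next != NULL && ptr->val == ptr->next->val)
        {
            DLLNode<T>* dup = ptr->next;   // node to unlink
            ptr->next = dup->next;
            if (dup->next != NULL)
                dup->next->prev = ptr;     // only when a successor exists
            delete dup;                    // free the removed node
        }
    }
}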
In this piece of code, I am comparing two linked lists and checking whether the items in the linked lists are equal or not.
bool Check(Node *listA, Node *listB) {
    if (listA == NULL && listB == NULL)
        return true;
    if (listA->item == listB->item)
    {
        listA = listA->link;
        listB = listB->link;
        return Check(listA, listB);
    }
    return false;
}
I was wondering what the difference is between this code:
listA = listA->link;
listB = listB->link;
return Check(listA, listB);
and this:
return Check(listA->link, listB->link);
Both pieces of code produce the correct answer, but I can't seem to understand what the difference is.
There is no functional difference; they do exactly the same thing. The only practical difference is that the first form lets you change something in the node before calling Check() if you ever need to. In your case they behave identically, and the second option is cleaner, so I recommend that one.
In general, modifying an IN parameter value makes your function's code and intent less clear, so it's best to avoid it.
Also consider that if you are using a debugger and step back to a prior recursive call, you will not be able to see the correct node that was inspected, since its pointer was already overwritten. Thus it will be more confusing to debug.
Practically speaking, the outcome of both functions will be the same. The second one may be infinitesimally faster due to skipping the two pointless assignment operations, unless that is optimized away.
I was solving this question when this approach clicked:
Given a single linked list and an integer x. Your task is to complete the function deleteAllOccurances() which deletes all occurrences of a key x present in the linked list. The function takes two arguments: the head of the linked list and an integer x. The function should return the head of the modified linked list.
I am not sure what the space complexity of my code is.
I think that since I am only using one extra node of space, and I am creating new nodes while simultaneously deleting the old ones, it should be O(1).
Node* deleteAllOccurances(Node *head, int x)
{
    Node *new_head = new Node(-1);
    Node *tail = new_head;
    Node *temp = head;
    Node *q;
    while (temp != NULL) {
        if (temp->data != x) {
            tail->next = new Node(temp->data);
            tail = tail->next;
        }
        q = temp;
        temp = temp->next;   // advance before freeing the old node
        delete q;
    }
    tail->next = NULL;
    return new_head->next;
}
Well, kind of.
It depends on whether you are considering total allocations as a net change (in which case you're right).
But if you are thinking about the number of times you hit the heap for new allocations, then it's using more space and a ton of computation. (A given C++ compiler and runtime is not obliged to immediately reuse space freed in the heap, only to make it available for reuse.)
As a C++ programmer for decades, I find what you're doing mildly horrifying, because you're doing a lot of new allocation. That results in thrashing the heap's allocation structures.
Also, the way you're doing this is pushing everything that doesn't match to the end of the list, so you are shuffling the contents down.
Hint - you should not need to create any new Nodes.
Yes: since how much space you have allocated at any single time doesn't depend on the arguments (e.g. the length of the list or how many occurrences of x are in the list), the space complexity of the function is O(1).
The practical point of space complexity is to see how much memory your algorithm will require. You never require more than 1 node of memory (plus the local variables) and O(1) reflects that.
Measuring complexity in part depends on what you consider to be your variables. In terms of the number of nodes in the list, your algorithm is O(1) in space usage. However, this might not be the best perspective in this case.
Another variable in this situation is the size of a node. Often this aspect is ignored by complexity analysis, but I think it has value in this case. While your algorithm's space requirement does not depend on the number of nodes, it does depend on the size of a node. The more data in the node, the more space you need. Let s be the size of a single node; it would be fair to say that your algorithm's size requirement is O(s).
The size requirement of the more common algorithm for this task is O(1) even when accounting for both the number of nodes and the size of each node. (It has no need to create any nodes, no need to copy data.) I would not recommend your algorithm over that one.
To avoid being all negative, I would view your approach as two independent changes to the traditional one. One change is the introduction of the dummy node new_head. This change is useful (and in fact is in use), even though your implementation leaks memory. It is only marginally less efficient than not using a dummy head, and it simplifies the logic for removing nodes from the front of the list. This is good as long as your node size is not overly large.
The other change is the switch to copying nodes instead of moving them. This is the cringe-worthy change as it gratuitously adds work to the programmer, the compiler, and the execution. Asymptotic analysis (big-O) might not pick up on this addition, but it is there with no beneficial gains. You've trashed a key benefit of linked lists and gotten nothing in return.
Let's look at dropping the second change. You would need to add one line, specifically initializing new_head->next to head, but this is balanced out by removing the need to set tail->next to nullptr at the end. Another addition is an else clause so that the lines currently run every iteration are not necessarily run every iteration. Beyond that are code removal and some name changes: drop the temp pointer (use tail->next instead) and drop the creation of new nodes in the loop. Taken together, these changes strictly reduce the work being done (and the memory needs) compared to your code.
To address the memory leak, I've used a local dummy node instead of dynamically allocating it. That removes the last use of new, which in turn removes most of the objections raised in the question's comments.
Node* deleteAllOccurances(Node *head, int x)
{
    Node new_head{-1};        //<-- Avoid dynamic allocation
    new_head.next = head;     //<-- added line
    Node *tail = &new_head;
    while (tail->next != nullptr) {
        if (tail->next->data != x) {
            tail = tail->next;
        }
        else {                //<-- make the rest of the loop conditional
            Node *q = tail->next;
            tail->next = tail->next->next;
            delete q;
        }
    }
    return new_head.next;
}
This version removes the "cringe factor" as there is a benefit to the one node being created, and new is not being used. This version is clean enough to subject to complexity analysis without everyone asking "why???".
I'm working with suffix trees. As far as I can tell, I have Ukkonen's algorithm running correctly to build a generalised suffix tree from an arbitrary number of strings. I'm now trying to implement a find_longest_common_substring() method to do exactly that. For this to work, I understand that I need to find the deepest shared edge (with depth in terms of characters, rather than edges) between all strings in the tree, and I've been struggling for a few days to get the traversal right.
Right now I have the following in C++. I'll spare you all my code, but for context, I'm keeping the edges of each node in an unordered_map called outgoing_edges, and each edge has a vector of ints recorded_strings containing integers identifying the added strings. The child field of an edge is the node it is going to, and l and r identify its left and rightmost indices, respectively. Finally, current_string_number is the current number of strings in the tree.
SuffixTree::Edge * SuffixTree::find_deepest_shared_edge(SuffixTree::Node * start, int current_length, int &longest) {
    Edge * deepest_shared_edge = new Edge;
    auto it = start->outgoing_edges.begin();
    while (it != start->outgoing_edges.end()) {
        if (it->second->recorded_strings.size() == current_string_number + 1) {
            int edge_length = it->second->r - it->second->l + 1;
            int path_length = current_length + edge_length;
            find_deepest_shared_edge(it->second->child, path_length, longest);
            if (path_length > longest) {
                longest = path_length;
                deepest_shared_edge = it->second;
            }
        }
        it++;
    }
    return deepest_shared_edge;
}
When trying to debug, as best I can tell, the traversal runs mostly fine, and correctly records the path length and sets longest. However, for reasons I don't quite understand, in the innermost conditional, deepest_shared_edge sometimes seems to get updated to a mistaken edge. I suspect I maybe don't quite understand how it->second is updated throughout the recursion. Yet I'm not quite sure how to go about fixing this.
I'm aware of this similar question, but the approach seems sufficiently different that I'm not quite sure how it applies here.
I'm mainly doing this for fun and learning, so I don't necessarily need working code to replace the above - pseudocode or just an explanation of where I'm confused would be just as good.
Your handling of deepest_shared_edge is wrong. First, the allocation you do at the start of the function is a memory leak, since you never free the memory. Secondly, the result of the recursive call is ignored, so whatever deepest edge it finds is lost (although you update the depth, you don't keep track of the deepest edge).
To fix this, you should either pass deepest_shared_edge as a reference parameter (like you do for longest), or you can initialize it to nullptr, then check the return from your recursive call for nullptr and update it appropriately.
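A minimal sketch of the second option, reusing the signature and member names from the question (nothing else is assumed): initialize deepest_shared_edge to nullptr, keep the result of the recursive call, and only fall back to the current edge when the subtree did not produce something deeper.
SuffixTree::Edge * SuffixTree::find_deepest_shared_edge(SuffixTree::Node * start, int current_length, int &longest) {
    Edge * deepest_shared_edge = nullptr;   // no allocation, so no leak
    for (auto it = start->outgoing_edges.begin(); it != start->outgoing_edges.end(); ++it) {
        if (it->second->recorded_strings.size() == current_string_number + 1) {
            int edge_length = it->second->r - it->second->l + 1;
            int path_length = current_length + edge_length;
            // Keep whatever the subtree found; it only returns non-null
            // when it beat the current value of longest.
            Edge * deeper = find_deepest_shared_edge(it->second->child, path_length, longest);
            if (deeper != nullptr)
                deepest_shared_edge = deeper;
            // Otherwise this edge itself may end the deepest shared path so far.
            if (path_length > longest) {
                longest = path_length;
                deepest_shared_edge = it->second;
            }
        }
    }
    return deepest_shared_edge;
}
The caller then has to handle a nullptr return, which means no edge is shared by all strings.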
I'm writing something that needs to start with a list of numbers, already in order but possibly with gaps, find the first gap, fill in a number in that gap, and return the number it filled in. The numbers are integers in the range [0, inf). I have this, and it works perfectly:
list<int> TestList = {0, 1, 5, 6, 7};
int NewElement;
if (TestList.size() == 0)
{
    NewElement = 0;
    TestList.push_back(NewElement);
}
else
{
    bool Selected = false;
    int Previous = 0;
    for (auto Current = TestList.begin(); Current != TestList.end(); Current++)
    {
        if (*Current > Previous + 1)
        {
            NewElement = Previous + 1;
            TestList.insert(Current, NewElement);
            Selected = true;
            break;
        }
        Previous = *Current;
    }
    if (!Selected)
    {
        NewElement = Previous + 1;
        TestList.insert(TestList.end(), NewElement);
    }
}
But I'm worried about efficiency, since I'm using an equivalent piece of code to allocate uniform block binding locations in OpenGL behind a class wrapper I wrote (but that isn't exactly relevant to the question :) ). Any suggestions to improve the efficiency? I'm not even sure if std::list is the best choice for it.
Some suggestions:
Try other containers and compare. A linked list may have good theoretical properties, but the real-life benefits of contiguous storage, such as in a sorted vector, can be dramatic.
Since your range is already ordered, you can perform a binary search to find the gap: start in the middle, and if the value there is equal to its index, there is no gap in the first half, so you can restrict the search to the other half. Rinse and repeat. (This assumes that there are no repeated numbers in the range, which I suppose is a reasonable restriction given that you have a notion of "gap".)
This is more of a theoretical, separate suggestion. A binary search on a pure linked list cannot be implemented very efficiently, so a different sort of data structure, such as a sorted vector, would be needed to take advantage of this approach; a sketch follows below.
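A minimal sketch of that binary search on a sorted std::vector<int> (the function name is made up for illustration; it assumes the values start at 0 and contain no duplicates):
#include <vector>

int find_first_gap(const std::vector<int>& v)
{
    // Invariant: v[i] == i for every i < lo; the first gap (or the end) lies in [lo, hi).
    std::size_t lo = 0, hi = v.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (v[mid] == static_cast<int>(mid))
            lo = mid + 1;   // no gap up to and including mid
        else
            hi = mid;       // the first gap is at or before mid
    }
    return static_cast<int>(lo);   // first missing value; equals v.size() when there is no gap
}
For {0, 1, 5, 6, 7} this returns 2, and for an empty vector it returns 0, matching the behaviour of the code in the question.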
Some suggestions:
A cursor gap structure (a.k.a. “gap buffer”) based on std::vector is probably the fastest (regardless of system) for this. For the implementation, set the capacity at the outset (known from the largest number) so as to avoid costly dynamic allocations.
Binary search can be employed on a cursor gap structure, followed by a fast block move to reposition the insertion point, but do measure if you go this route: often methods that are bad for a very large number of items, such as linear search, turn out to be best for a small number of items.
Retain the position between calls so you don't have to start from the beginning each time, reducing the total complexity of filling in all the gaps from O(n²) to O(n).
Update: based on the OP's additional commentary that this is a case of allocating and deallocating numbers, where the order (apparently) doesn't matter, the fastest approach is probably a free list, as suggested in a comment by j_random_hacker. Simply put: at the outset, push all available numbers onto a stack (e.g. a std::stack) - the free list. To allocate a number, simply pop it off the stack (taking whatever number is on top); to deallocate a number, simply push it back.
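A minimal sketch of that free list (the class name and the max_bindings bound are illustrative assumptions, not anything from the question):
#include <stack>

class BindingAllocator {
public:
    explicit BindingAllocator(int max_bindings)
    {
        // Push in reverse so the smallest number ends up on top (optional).
        for (int i = max_bindings - 1; i >= 0; --i)
            free_.push(i);
    }

    // Take any free number; assumes the pool is not exhausted.
    int allocate()
    {
        int n = free_.top();
        free_.pop();
        return n;
    }

    // Return a number to the pool.
    void release(int n)
    {
        free_.push(n);
    }

private:
    std::stack<int> free_;
};
Both allocate() and release() are O(1), which is as fast as this can get.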
I'm trying to use the insertion sort method to sort nodes from a linked list. I've adjusted the code many times, but I can't quite seem to get it; I keep getting different kinds of results, none of which are sorted.
Here's the code:
Node* sort_list(Node* head)
{
    Node* node_ptr = NULL;
    for (Node* i = head->next; i->next != NULL; i = i->next) {
        if (i->key < head->key) {
            node_ptr = i;
            head = head->next;
        }
    }
    return node_ptr;
}
This is a homework problem, so instead of outright writing the code, I will first point out where you went wrong.
In an insertion-sort-like algorithm, there obviously needs to be some kind of swapping between elements that are out of place (that is, that need to be inserted). So start by thinking about how you can swap two elements of the list. Pay special attention to the cases where one of them is the head or the tail.
Your code doesn't have any trace of pointer swaps, so this is where you went wrong.
Next, think about when a swap is needed. In this case it is rather simple: if the current element and the next one are already in sorted order (assuming ascending order, current < next), then nothing needs to be done; simply make the next one the current.
You can then infer that a violation of this condition is when you need to swap the elements. After the swap (paying proper attention to where the pointers were and will be afterwards), repeat the process till you hit the null wall.
P.S : This is a possible duplicate of another SO question.