Are the Performances of these two programs the same? - c++

Say, for example, you have a linked list 1->2->3->4->5->6->NULL and you want to total every other node of that list, starting from the head (assuming that the number of nodes in the list is even).
First Approach
int total = 0;
int count = 0;
Node *ptr = head;
while (ptr != NULL)
{
    if (count % 2 == 0)
    {
        total += ptr->data;
    }
    count++;
    ptr = ptr->next;
}
Second Approach
int total = 0;
Node *ptr = head;
while (ptr != NULL)
{
    total += ptr->data;
    ptr = ptr->next->next;
}
So, having written these two approaches, do they have the same performance?

I read your question again, and my answer is that the second method is probably slightly faster.
Now, the comments immediately highlighted that it is also more dangerous. You have specified the assumption that the number of nodes in the list is even. If that is a guaranteed and enforceable precondition, then it is technically okay to do this.
Even a smart optimizing compiler has no way of knowing about this precondition of even list-length, so the very best it could likely achieve is to recognize that count is only used for controlling whether total is updated and so the loop could be unrolled as follows:
// Possible automatic compiler optimization of First Approach
while (ptr)
{
    total += ptr->data;
    ptr = ptr->next;
    // Skip over every second node
    if (ptr) ptr = ptr->next;
}
In basic terms, what we now have is one more pointer test (branch) per loop iteration than your Second Approach has. This results in more instructions (specifically a branching instruction) and so the code will technically be (slightly) slower.
Of course, the actual impact of this is likely to be very small. Your main bottleneck is pointer indirection and fetches from memory, rather than the pointer test itself. If the memory used by the nodes is not mostly contiguous, you'll run into caching problems on large lists (a cache miss that goes all the way to main memory can cost on the order of 100 times more than a cache hit).
The point of all the above is that the benefit of your special optimization, based on the precondition of even list length, suffers from diminishing returns.
Given that it is inherently unsafe unless very well documented in the code and/or protected by a list "evenness" test (if you store the node count somewhere), I would recommend coding defensively, either by using your First Approach or by using my equivalent and (arguably) tidier version of it above.
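If you do want the shape of the Second Approach without relying on the even-length precondition, guarding the double advance costs the same single extra test per iteration as the unrolled loop above. This is my own sketch, not code from the question:

int total = 0;
Node *ptr = head;
while (ptr != NULL)
{
    total += ptr->data;
    // Guard the double step so an odd-length list can't dereference NULL
    ptr = (ptr->next != NULL) ? ptr->next->next : NULL;
}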

Related

Difference between iterating to next node in recursive function vs recursive function call

In this piece of code, I am comparing two linked lists and checking whether the items in the linked lists are equal or not.
bool Check(Node *listA, Node *listB) {
    if (listA == NULL && listB == NULL)
        return true;
    if (listA->item == listB->item)
    {
        listA = listA->link;
        listB = listB->link;
        return Check(listA, listB);
    }
    return false;
}
I was wondering what the difference is between this code:
listA = listA->link;
listB = listB->link;
return Check(listA, listB);
and this:
return Check(listA->link, listB->link);
Both pieces of code produce the correct answer, but I can't seem to understand what the difference is.
Functionally there is no difference; they do exactly the same thing. The only practical difference is that the first form would let you change something in the next node before calling Check() if you needed to. In your case they are exactly equivalent; the second option is cleaner, though, so I recommend that one.
In general, modifying an IN parameter value makes your function's code and intent less clear, so it's best to avoid it.
Also consider that if you are using a debugger and step back to a prior recursive call, you will not be able to see the correct node that was inspected, since its pointer was already overwritten. Thus it will be more confusing to debug.
Practically speaking, the outcome of both functions will be the same. The second one may be infinitesimally faster due to skipping the two redundant assignments, unless those are optimized away.
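As a side note, one way to make the "don't modify IN parameters" advice mechanically enforceable is to declare the pointer parameters themselves const. The sketch below is my own variant (it reuses the question's Node type with its item and link members), not code from the question, and it also guards against lists of different lengths, which the original code would dereference NULL on:

// Top-level const on the parameters makes reassigning listA or listB
// inside the function a compile error.
bool Check(Node* const listA, Node* const listB)
{
    if (listA == NULL && listB == NULL)
        return true;                  // both lists ended together: equal
    if (listA == NULL || listB == NULL)
        return false;                 // different lengths: not equal
    if (listA->item == listB->item)
        return Check(listA->link, listB->link);
    return false;
}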

This is a strange question about space complexity. Can someone provide any insights?

I was solving this question when this approach clicked:
Given a single linked list and an integer x. Your task is to complete the function deleteAllOccurances() which deletes all occurrences of a key x present in the linked list. The function takes two arguments: the head of the linked list and an integer x. The function should return the head of the modified linked list.
I am not sure what is the space complexity of my code.
I think that since I am only using one extra node's worth of space at a time, creating new nodes while simultaneously deleting the old ones, it should be O(1).
Node* deleteAllOccurances(Node *head, int x)
{
    Node *new_head = new Node(-1);
    Node *tail = new_head;
    Node *temp = head;
    Node *q;
    while (temp != NULL) {
        if (temp->data != x) {
            tail->next = new Node(temp->data);
            tail = tail->next;
        }
        q = temp;
        temp = temp->next;  // advance before deleting so freed memory is never read
        delete q;
    }
    tail->next = NULL;
    return new_head->next;
}
Well, kind of.
It depends on whether you are considering total allocations as a net change (in which case you're right).
But if you are thinking about the number of times you hit the heap for new allocations, then it's using more space and a ton of extra computation. (A given C++ compiler and runtime is not obliged to immediately reuse space freed on the heap, only to make it available for reuse.)
Speaking as someone who has written C++ for decades, what you're doing is mildly horrifying, because you're doing a lot of new allocation. That thrashes the heap allocation structures.
Also, the way you're doing this pushes every node whose value doesn't match onto the end of a new list, so you are shuffling the whole contents across.
Hint: you should not need to create any new Node objects.
Yes. Since how much space you have allocated at any single time doesn't depend on the arguments (e.g. the length of the list or how many occurrences of x are in the list), the space complexity of the function is O(1).
The practical point of space complexity is to see how much memory your algorithm will require. You never require more than one node's worth of extra memory (plus the local variables), and O(1) reflects that.
Measuring complexity in part depends on what you consider to be your variables. In terms of the number of nodes in the list, your algorithm is O(1) in space usage. However, this might not be the best perspective in this case.
Another variable in this situation is the size of a node. Often this aspect is ignored by complexity analysis, but I think it has value in this case. While your algorithm's space requirement does not depend on the number of nodes, it does depend on the size of a node. The more data in the node, the more space you need. Let s be the size of a single node; it would be fair to say that your algorithm's size requirement is O(s).
The size requirement of the more common algorithm for this task is O(1) even when accounting for both the number of nodes and the size of each node. (It has no need to create any nodes, no need to copy data.) I would not recommend your algorithm over that one.
To avoid being all negative, I would view your approach as two independent changes to the traditional one. One change is the introduction of the dummy node new_head. This change is useful (and is in fact in common use), even though your implementation leaks it. It is only marginally less efficient than not using a dummy head, and it simplifies the logic for removing nodes from the front of the list. This is good as long as your node size is not overly large.
The other change is the switch to copying nodes instead of moving them. This is the cringe-worthy change, as it gratuitously adds work for the programmer, the compiler, and the execution. Asymptotic analysis (big-O) might not pick up on this addition, but it is there with no beneficial gain. You've trashed a key benefit of linked lists and gotten nothing in return.
Let's look at dropping the second change. You would need to add one line, specifically initializing new_head->next to head, but this is balanced out by removing the need to set tail->next to nullptr at the end. Another addition is an else clause, so that the deletion code that currently runs on every iteration only runs when a node actually needs to be removed. Beyond that it is all code removal and some name changes: drop the temp pointer (use tail->next instead) and drop the creation of new nodes in the loop. Taken together, these changes strictly reduce the work being done (and the memory needed) compared to your code.
To address the memory leak, I've used a local dummy node instead of dynamically allocating it. That removes the last use of new, which in turn removes most of the objections raised in the question's comments.
Node* deleteAllOccurances(Node *head, int x)
{
    Node new_head{-1};        //<-- Avoid dynamic allocation
    new_head.next = head;     //<-- added line
    Node *tail = &new_head;
    while (tail->next != nullptr) {
        if (tail->next->data != x) {
            tail = tail->next;
        }
        else {                //<-- make the rest of the loop conditional
            Node *q = tail->next;
            tail->next = tail->next->next;
            delete q;
        }
    }
    return new_head.next;
}
This version removes the "cringe factor", as there is a real benefit to the one extra node (the local dummy) and new is not being used. It is clean enough to subject to complexity analysis without everyone asking "why???".

Write the most efficient implementation of the function RemoveDuplication()

So I am studying and I have this question: Write the most efficient implementation of the function RemoveDuplication(), which removes any duplication in the list. Assume that the list is sorted but may have duplication. So, if the list is originally
<2, 2, 5, 6, 6, 6, 9>, your function should make it <2, 5, 6, 9>.
The code that I thought of to remove the duplication is below. I wanted to know if there are more efficient ways of removing the duplicates from the list.
template <class T>
void DLList<T>::RemoveDuplication()
{
    for (DLLNode<T> *ptr = head; ptr != NULL; ptr = ptr->next)
        while (ptr->val == ptr->next->val)
        {
            ptr->next->next->prev = ptr;
            ptr->next = ptr->next->next;
        }
}
It looks like your code will run in O(n), which is good for this kind of algorithm. It is probably not going to get any more efficient, because you'll have to visit every item anyway.
If you don't want to delete the duplicate objects in place, though, but want to return a new list containing only the non-duplicate objects, you could in principle make it slightly faster, O(m), where m is the number of unique values (which is smaller than or equal to n). But I couldn't think of any way to do this.
Recapping: it may be possible to be slightly faster, but it is hard and the improvement would be negligible.
P.S. Don't forget to delete nodes when you take them out of your list ;)
I think O(n) is ok.
However, the most important thing is that your program will crash :-)
for(DLLNode<T>*ptr = head; ptr!=NULL; ptr=ptr->next)
while (ptr->val == ptr->next->val)
The code is dereferencing ptr->next without checking that it is != NULL.
Hence, the algorithm will crash when it reaches the last element of the list.
And now an optimization question for you: how do you make the program correct without testing both ptr AND ptr->next at each iteration?
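One possible answer to that riddle, as a sketch (it reuses the question's DLLNode<T> members val, next and prev, and assumes the list only tracks head): check the head once up front, after which the loop only ever needs to test ptr->next, and free the removed nodes as the first answer recommends.

template <class T>
void DLList<T>::RemoveDuplication()
{
    if (head == NULL) return;            // single up-front check

    DLLNode<T> *ptr = head;
    while (ptr->next != NULL)            // only ptr->next is tested per iteration
    {
        if (ptr->val == ptr->next->val)
        {
            DLLNode<T> *dup = ptr->next; // unlink the duplicate...
            ptr->next = dup->next;
            if (dup->next != NULL)       // the last node has no successor to re-link
                dup->next->prev = ptr;
            delete dup;                  // ...and free it
        }
        else
        {
            ptr = ptr->next;             // values differ, so advance
        }
    }
}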

separate chaining in hashing

I am reading about hashing in Robert Sedgewick's book Algorithms in C++:
We might be using a header node to streamline the code for insertion into an ordered list, but we might not want to use M header nodes for individual lists in separate chaining. Indeed, we could even eliminate the M links to the lists by having the first nodes in the lists comprise the table.
class ST
{
    struct node
    {
        Item item;
        node* next;
        node(Item x, node* t)
        { item = x; next = t; }
    };
    typedef node *link;
private:
    link* heads;
    int N, M;
    Item searchR(link t, Key v)
    {
        if (t == 0) return nullItem;
        if (t->item.key() == v) return t->item;
        return searchR(t->next, v);
    }
public:
    ST(int maxN)
    {
        N = 0; M = maxN/5;
        heads = new link[M];
        for (int i = 0; i < M; i++) heads[i] = 0;
    }
    Item search(Key v)
    { return searchR(heads[hash(v, M)], v); }
    void insert(Item item)
    {
        int i = hash(item.key(), M);
        heads[i] = new node(item, heads[i]); N++;
    }
};
My two questions on the above text are about what the author means:
"We could even eliminate the M links to the lists by having the first nodes in the lists comprise the table." What does this mean, and how can we modify the above code to do it?
"We might not want to use M header nodes for individual lists in separate chaining." What does this statement mean?
"We could even eliminate the M links to the lists by having the first nodes in the lists comprise the table."
Consider Node* x[n] versus Node x[n]. The former needs an extra pointer per bucket, a heap allocation for the head Node of every non-empty bucket as items are inserted, and an extra indirection for every hash table operation. The latter eliminates the n pointers, but requires that any unused element can be put into some discernible not-in-use state (tracking of which may or may not require extra memory), and if sizeof(Node) is greater than sizeof(Node*) it may be more wasteful of memory anyway. The difference in memory use can also affect cache efficiency: if the table has a high element-to-bucket ratio, then Node[] packs the Node data into fewer contiguous memory pages, and iterating over it (in unsorted order) is very cache friendly, whereas Node*[] will jump to separate memory allocations that might be scattered all over the place (or, on the other hand, might actually be quite close together in some usefully correlated way, e.g. if both the access pattern and the dynamic memory allocation addresses correlate with the chronological order of object creation).
How can we modify above code for this?
First, a note on the existing code: heads[i] = new node(item, heads[i]); does not actually lose anything, because the old heads[i] is passed to the constructor and becomes the new node's next pointer, so insertion prepends to that bucket's chain. What it does not do is check whether the key is already present, so inserting the same key twice stores two nodes for it.
The design change discussed needs:
link* heads;
...changed to...
node* head;
You'd initialise it like this:
head = new node[M];
Which needs an extra node constructor (if item has an equivalent default constructor, you can leave out its initialisation below)
node() : item(nullItem), next(nullptr) { }
Then there are some knock-on changes to the rest of your code that are easy to work through. Basically, you're getting rid of a layer of pointers.
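To make those knock-on changes concrete, here is one possible sketch of the whole class. The question's Item, Key, hash and nullItem are assumed to exist unchanged (and nullItem is assumed to be accessible from the nested struct, e.g. a global as in the book's code); using nullItem's key as the "not in use" marker for an empty bucket is my own assumption, just one of several reasonable ways to provide the discernible not-in-use state mentioned above.

class ST
{
    struct node
    {
        Item item;
        node* next;
        node() : item(nullItem), next(0) { }        // default state marks an unused bucket
        node(Item x, node* t) { item = x; next = t; }
    };
private:
    node* head;    // M nodes embedded directly in the table, instead of M pointers
    int N, M;
    Item searchR(node* t, Key v)
    {
        if (t == 0) return nullItem;
        if (t->item.key() == v) return t->item;
        return searchR(t->next, v);
    }
public:
    ST(int maxN)
    {
        N = 0; M = maxN/5;
        head = new node[M];                         // default constructor marks every bucket unused
    }
    Item search(Key v)
    { return searchR(&head[hash(v, M)], v); }       // assumes no real key equals nullItem's key
    void insert(Item item)
    {
        node* t = &head[hash(item.key(), M)];
        if (t->item.key() == nullItem.key())        // empty bucket: store the item in place
        {
            t->item = item;
        }
        else                                        // occupied: move the old first item into a
        {                                           // chained node, then put the new item in the slot
            t->next = new node(t->item, t->next);
            t->item = item;
        }
        N++;
    }
};

Search and insert now touch the first node of each bucket directly in one contiguous array, which is exactly the cache behaviour described above.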
"we might not want to use M header nodes for individual lists in separate chaining." What does this statement mean.
I didn't write it, so I can't say authoritatively, but it appears to be saying that, when designing the list code, a decision might have been made to give even an empty list an initial header Node, as this simplifies the code for several list operations. While the extra data-less Node might seem a reasonable price for "usual" uses of a list, hash tables are unusual in that you want most of the lists chained off the buckets to have 0 or 1 elements, with exponentially fewer being longer and longer. So, such a list implementation is poorly suited to use in a hash table.

Efficiently filling in first gap in an ordered list of numbers

I'm writing something that needs to start with a list of numbers, already in order but possibly with gaps, find the first gap, fill in a number in that gap, and return the number it filled in. The numbers are integers in the range [0, inf). I have this, and it works perfectly:
list<int> TestList = {0, 1, 5, 6, 7};
int NewElement;
if (TestList.size() == 0)
{
    NewElement = 0;
    TestList.push_back(NewElement);
}
else
{
    bool Selected = false;
    int Previous = 0;
    for (auto Current = TestList.begin(); Current != TestList.end(); Current++)
    {
        if (*Current > Previous + 1)
        {
            NewElement = Previous + 1;
            TestList.insert(Current, NewElement);
            Selected = true;
            break;
        }
        Previous = *Current;
    }
    if (!Selected)
    {
        NewElement = Previous + 1;
        TestList.insert(TestList.end(), NewElement);
    }
}
But I'm worried about efficiency, since I'm using an equivalent piece of code to allocate uniform block binding locations in OpenGL behind a class wrapper I wrote (but that isn't exactly relevant to the question :) ). Any suggestions to improve the efficiency? I'm not even sure if std::list is the best choice for it.
Some suggestions:
Try other containers and compare. A linked list may have good theoretical properties, but the real-life benefits of contiguous storage, such as in a sorted vector, can be dramatic.
Since your range is already ordered, you can perform a binary search to find the gap: start in the middle, and if the value there is equal to its index (counting from 0), there is no gap in the lower half, so you can restrict the search to the upper half. Rinse and repeat. (This assumes that there are no repeated numbers in the range, which I suppose is a reasonable restriction given that you have a notion of "gap".)
This is more of a theoretical, separate suggestion. A binary search on a pure linked list cannot be implemented very efficiently, so a different sort of data structure would be needed to take advantage of this approach.
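To make the binary-search idea concrete, here is a rough sketch. It assumes the numbers are kept in a sorted std::vector<int> with no duplicates (per the first suggestion), and it exploits the fact that, with values drawn from [0, inf), position i holds the value i exactly when there is no gap before it. Note one behavioural difference from the question's code: this version treats a missing 0 as the first gap.

#include <cstddef>
#include <vector>

// Illustrative helper: finds the first missing non-negative integer,
// inserts it, and returns it, keeping the vector sorted.
int FillFirstGap(std::vector<int>& v)
{
    std::size_t lo = 0, hi = v.size();
    while (lo < hi)
    {
        std::size_t mid = lo + (hi - lo) / 2;
        if (v[mid] == static_cast<int>(mid))
            lo = mid + 1;   // no gap up to and including mid
        else
            hi = mid;       // the first gap is at or before mid
    }
    int missing = static_cast<int>(lo);   // lo == v.size() means "no gap, append"
    v.insert(v.begin() + lo, missing);
    return missing;
}

For {0, 1, 5, 6, 7} this returns 2, and for a gapless {0, 1, 2} it appends and returns 3.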
Some suggestions:
A cursor gap structure (a.k.a. “gap buffer”) based on std::vector is probably the fastest (regardless of system) for this. For the implementation, set the capacity at the outset (known from the largest number) so as to avoid costly dynamic allocations.
Binary search can be employed on a cursor gap structure, with then fast block move to move the insertion point, but do measure if you go this route: often methods that are bad for a very large number of items, such as linear search, turn out to be best for a small number of items.
Retain the position between calls, so you don't have to start from the beginning each time; this reduces the total algorithmic complexity of filling in all the gaps from O(n²) to O(n).
Update: based on the OP's additional comments indicating that this is really a case of allocating and deallocating numbers, where the order (apparently) doesn't matter, the fastest approach is probably a free list, as suggested in a comment by j_random_hacker. Simply put: at the outset, push all available numbers onto a stack (e.g. a std::stack); this is the free list. To allocate a number, simply pop it off the stack (taking whatever number is on top); to deallocate a number, simply push it back.
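A minimal sketch of that free-list idea; the class name, interface and fixed id range are illustrative inventions for this example, not anything from the question or from OpenGL.

#include <stack>

// Hands out unique ids from [0, maxIds) in O(1) per operation.
class IdAllocator
{
public:
    explicit IdAllocator(int maxIds)
    {
        for (int i = maxIds - 1; i >= 0; --i)   // push in reverse so 0 is handed out first
            free_.push(i);
    }
    bool HasFree() const { return !free_.empty(); }
    int Allocate()                              // pop any unused number (check HasFree() first)
    {
        int id = free_.top();
        free_.pop();
        return id;
    }
    void Release(int id)                        // make the number reusable again
    {
        free_.push(id);
    }
private:
    std::stack<int> free_;
};

Allocation no longer depends on ordering or on finding a gap at all, which is why this beats searching any sorted container for this use case.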