Finding corruption in a linked list - c++

I had an interview today for a developer position and was asked an interesting technical question that I did not know the answer to. I will ask it here to see if anyone can provide a solution, for my own curiosity. It is a multi-part question:
1) You are given a singly linked list with 100 elements (each an integer and a pointer to the next node). Find a way to detect a break or corruption halfway through the linked list. You may do anything with the linked list. Note that the check must happen while the list is being iterated, as verification before you know that the list has any issues.
Assume the break is at the 50th element: the integer, or even the pointer to the next node (the 51st element), may hold a garbage value that is not necessarily an invalid address.
2) If there is corruption in the linked list, how would you minimize data loss?

To test for a "corrupted" integer, you would need to know what the range of valid values is. Otherwise, there is no way to determine that the value in any given (signed) integer is invalid. So, assuming you have a validity test for the int, you would always check that value before iterating to the next element.
Testing for a corrupted pointer is trickier. For a start, check the value of the pointer to the next element before you attempt to dereference it, and ensure it is a valid heap address; that will avoid a segmentation fault. The next step is to validate that what the pointer points at is in fact a valid linked list node, which is harder. One option is to dereference the pointer as a list element class/struct and test the validity of its int and "next" pointer; if they are also good, you can be fairly sure the previous node was good too.
On 2): having discovered a corrupted node, if the next pointer is corrupted, set the "next" pointer of the previous node to NULL immediately, marking it as the end of the list, and log the error. If the corruption was only to the integer value and not to the "next" pointer, remove that element from the list and link the previous and following nodes together instead, as there is no need to throw the rest of the list away in that case.
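A minimal sketch of the check-and-repair pass described in this answer. It assumes every node was allocated from one contiguous pool, so that "is this a plausible node address?" can be answered with a range check; standard C++ offers no portable way to ask whether an arbitrary pointer is a valid heap address. The names `value_ok`, `pointer_ok`, `repair`, and the 0..1000 value range are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

struct Node {
    int value;
    Node* next;
};

// Hypothetical validity rule: legal values lie in [0, 1000].
bool value_ok(int v) { return v >= 0 && v <= 1000; }

// A pointer is accepted if it is null or addresses an element of pool[0..count).
bool pointer_ok(const Node* p, const Node* pool, std::size_t count) {
    if (p == nullptr) return true;
    std::less<const Node*> before;  // total order, even for unrelated pointers
    if (before(p, pool) || !before(p, pool + count)) return false;
    std::uintptr_t offset = reinterpret_cast<std::uintptr_t>(p) -
                            reinterpret_cast<std::uintptr_t>(pool);
    return offset % sizeof(Node) == 0;  // must sit on a node boundary
}

// Walks the list, unlinking nodes whose value is bad and truncating at the
// first bad link. Returns the number of repairs made.
int repair(Node** head, const Node* pool, std::size_t count) {
    int repairs = 0;
    std::size_t visited = 0;
    Node** link = head;  // the pointer that leads to the current node
    while (*link != nullptr) {
        // A bad address, or more steps than the pool has nodes (a cycle): cut here.
        if (!pointer_ok(*link, pool, count) || visited == count) {
            *link = nullptr;
            ++repairs;
            break;
        }
        ++visited;
        Node* cur = *link;
        if (value_ok(cur->value)) {
            link = &cur->next;   // node is fine, move on
        } else if (pointer_ok(cur->next, pool, count)) {
            *link = cur->next;   // bad value, good link: unlink only this node
            ++repairs;
        } else {
            *link = nullptr;     // bad value and bad link: end the list here
            ++repairs;
            break;
        }
    }
    return repairs;
}
```

Tracking the link as a `Node**` means the head pointer and every `next` field are patched by the same code path. A corrupted pointer that happens to land on another pool node still passes `pointer_ok`; the step counter only guarantees that the walk terminates.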

For the first part: overload operator new. Whenever a new node is allocated, allocate some additional space before and after the node and put known values there. During traversal, each node can be checked to confirm it still sits between the known values.
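One way this might look, as a sketch only: a class-specific `operator new` that surrounds each node with a guard word. The guard constant, `kPad`, and `guards_intact` are illustrative names, not part of any library. The check is only meaningful for nodes created with `new Node`, and it detects overwrites that spill across the node's edges, not a flipped bit inside the node itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

constexpr std::uint64_t kGuard = 0xDEADBEEFCAFEF00DULL;     // arbitrary known value
constexpr std::size_t kPad = alignof(std::max_align_t);     // keeps the node aligned
static_assert(kPad >= sizeof(kGuard), "front padding must hold the guard");

struct Node {
    int value;
    Node* next;

    // Layout of one allocation: [padding | front guard][Node][back guard]
    static void* operator new(std::size_t size) {
        unsigned char* raw = static_cast<unsigned char*>(
            std::malloc(kPad + size + sizeof(kGuard)));
        if (raw == nullptr) throw std::bad_alloc();
        std::memcpy(raw + kPad - sizeof(kGuard), &kGuard, sizeof(kGuard));
        std::memcpy(raw + kPad + size, &kGuard, sizeof(kGuard));
        return raw + kPad;
    }

    static void operator delete(void* p) noexcept {
        if (p != nullptr) std::free(static_cast<unsigned char*>(p) - kPad);
    }
};

// Only valid for nodes that came from Node::operator new above.
bool guards_intact(const Node* n) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(n);
    std::uint64_t front = 0;
    std::uint64_t back = 0;
    std::memcpy(&front, p - sizeof(kGuard), sizeof(front));
    std::memcpy(&back, p + sizeof(Node), sizeof(back));
    return front == kGuard && back == kGuard;
}
```

A traversal would call `guards_intact` on each node before trusting its fields. If the `next` pointer itself is garbage, reading the guards through it can still fault, so this complements an address check and does not replace one.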

If you know at design time that corruption may become a critical issue, you could add a "magic value" field to the node data structure, which lets you identify whether some data is likely to be a node or not. It would even let you run through memory searching for nodes.
Or duplicate some link information, i.e. store the address of the node after the next node in each node, so that you can recover if one link is broken.
The only problem I see is that you have to avoid segmentation faults.
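A rough sketch of both ideas together: each node carries a magic value and a redundant `next_next` link. All names here are hypothetical. The sketch assumes that dereferencing a suspect pointer does not fault, which is exactly the segmentation-fault problem noted above, and it does not handle cycles.

```cpp
#include <cstdint>

constexpr std::uint32_t kMagic = 0x4E4F4445u;  // arbitrary tag, spells "NODE"

struct Node {
    std::uint32_t magic;
    int value;
    Node* next;
    Node* next_next;  // redundant copy of next->next
};

bool looks_like_node(const Node* n) {
    return n != nullptr && n->magic == kMagic;
}

// Returns the node to visit after cur. If cur->next is broken, falls back on
// the redundant link kept in the previous node and repairs cur->next.
Node* next_or_backup(Node* prev, Node* cur) {
    Node* nxt = cur->next;
    if (nxt == nullptr || looks_like_node(nxt)) return nxt;
    Node* backup = (prev != nullptr) ? prev->next_next : nullptr;
    if (backup != nullptr && !looks_like_node(backup)) backup = nullptr;
    cur->next = backup;  // repaired link, or end of list if no usable backup
    return backup;
}

// Walks the list, repairing single broken links; returns nodes visited.
int traverse_and_repair(Node* head) {
    int visited = 0;
    Node* prev = nullptr;
    Node* cur = looks_like_node(head) ? head : nullptr;
    while (cur != nullptr) {
        ++visited;
        Node* nxt = next_or_backup(prev, cur);
        prev = cur;
        cur = nxt;
    }
    return visited;
}
```

The cost is that every insertion and deletion must also update the `next_next` field of the two preceding nodes, and a broken link in the very first node has no predecessor to recover from.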

If you can do anything to the linked list, what you can do is to calculate the checksum of each element and store it on the element itself. This way you will be able to detect corruption even if it's a single bit error on the element.
To minimize data loss, you could consider also storing each element's next pointer in the previous element. That way, if the current element is corrupted, you can always find the location of the next element from the previous one.
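A small sketch of the per-node checksum, using 32-bit FNV-1a over the value and the next pointer. The choice of hash and the names `seal` and `intact` are assumptions made for illustration; any checksum that covers both fields would serve.

```cpp
#include <cstddef>
#include <cstdint>

struct Node {
    int value;
    Node* next;
    std::uint32_t checksum;  // covers value and next
};

// 32-bit FNV-1a, continued from a running hash value.
std::uint32_t fnv1a(const void* data, std::size_t size, std::uint32_t hash) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= bytes[i];
        hash *= 16777619u;  // FNV prime
    }
    return hash;
}

std::uint32_t compute_checksum(const Node& n) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    hash = fnv1a(&n.value, sizeof n.value, hash);
    hash = fnv1a(&n.next, sizeof n.next, hash);
    return hash;
}

// Call after every legitimate change to value or next.
void seal(Node& n) { n.checksum = compute_checksum(n); }

// Call during traversal before trusting the node.
bool intact(const Node& n) { return n.checksum == compute_checksum(n); }
```

Hashing the fields one by one, instead of the whole struct, keeps padding bytes out of the checksum. Because each FNV-1a step is reversible, a single flipped bit in either field always changes the result.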

This is an easy question, and there are several possible answers. Each trades off robustness against efficiency. Since increased robustness is a prerequisite of the question being asked, the available solutions sacrifice either time (list traversal speed, as well as speed of insertion and deletion of nodes) or space (extra info stored with each node). Now, the problem has been stated as a fixed list of length 100, in which case a linked list is a most inappropriate data structure. Why not make the puzzle a little more challenging and say that the size of the list is not known a priori?

Since the number of elements (100) is known, the 100th node must contain a null pointer. If it does, the list is valid with good probability (this cannot be guaranteed: for example, the 99th node may be corrupt and point to some memory location filled with zeros). Otherwise, there is some problem (this can be reported as a fact).
upd: Also, it might be possible, at every step, to look at whatever structures delete would use if given the pointer, but since using delete itself is not safe in any sense, this is going to be implementation-specific.
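The length check from this answer is short enough to sketch directly. The function name is made up, and the code assumes that following a corrupt pointer does not crash, which is not guaranteed in practice.

```cpp
#include <cstddef>

struct Node {
    int value;
    Node* next;
};

// True if the list has exactly `expected` nodes, i.e. the last expected node
// ends in a null pointer and no null appears before it.
bool has_expected_length(const Node* head, std::size_t expected) {
    const Node* cur = head;
    for (std::size_t i = 0; i < expected; ++i) {
        if (cur == nullptr) return false;  // list ended too early
        cur = cur->next;
    }
    return cur == nullptr;                 // must end exactly here
}
```

For the interview question, the call would be `has_expected_length(head, 100)`. A `false` result proves something is wrong; a `true` result is only probable evidence of a healthy list, for the reason given above.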

Related

How to properly use sentinel nodes?

There will be multiple (closely) related questions instead of a single one. For our comfort, I will number them accordingly.
Based on this Wikipedia article, this question and lectures, I think I already understand the idea behind sentinel nodes and their usage in linked lists. However, a few things are still not clear to me even after reading these materials.
I was given a basic implementation of a doubly linked list (it stores only int values) and the task is to change the implementation so it uses a sentinel node like this:
Illustrative image (not allowed to embed images yet, sorry)
Question 1
I am assuming that the head variable of the list will point to the first real node (the one after sentinel node) and the tail variable will simply point to the last node. Am I correct or should the head point to the sentinel node? I am asking for a best-practice or the most standard approach here.
Question 2
I understand that when searching for a value in the list, I no longer have to check for nullptr since I am using a sentinel node. Since the list basically formed a circle thanks to the sentinel node, I have to terminate it after iterating through the whole list and reaching it. Can I do it by putting the value I am looking for in the sentinel node and use it as a sentinel value of sorts and then check if the result is returned from the sentinel node when the loop ends? Some sources claim that sentinel nodes should not store any values at all. Is my approach correct/reasonably effective?
Question 3
When simply iterating and not searching for a particular value (e.g. counting nodes, outputting the whole list into the console), do I have to check for the sentinel node the same way as I would for a nullptr (to terminate the iterating loop) or is there a different or smarter way of doing this?
Answer 1
Yes, this is a valid position for the sentinel node to take. head and tail can point to the actual beginning and end of the data, but your add and delete functions will need to be aware of the aberrations the sentinel node causes at the list boundaries.
Answer 2
Yes, this is a valid search strategy; it is in fact called the Elephant in Cairo technique.
Answer 3
Yes, the purpose of the sentinel node is to let you know that it is the sentinel node. You could just maintain a constant pointer (or whatever your language of choice supports) to this sentinel node to check whether you have reached it, or stick a flag in the node.
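The three answers can be illustrated with a small circular list, sketched here under the assumption that the sentinel is embedded in the list object and that the caller owns the nodes. The class and member names are invented for the example.

```cpp
#include <cstddef>

struct Node {
    int value;
    Node* prev;
    Node* next;
};

class List {
public:
    List() {
        sentinel_.value = 0;
        sentinel_.prev = &sentinel_;  // an empty list is the sentinel
        sentinel_.next = &sentinel_;  // linked to itself
    }
    List(const List&) = delete;       // nodes point at sentinel_, so no copying
    List& operator=(const List&) = delete;

    // No null checks: the sentinel is both "before first" and "after last".
    void push_back(Node* n) {
        n->prev = sentinel_.prev;
        n->next = &sentinel_;
        sentinel_.prev->next = n;
        sentinel_.prev = n;
    }

    // Question 2: plant x in the sentinel so the loop needs only one test.
    Node* find(int x) {
        sentinel_.value = x;
        Node* cur = sentinel_.next;
        while (cur->value != x) cur = cur->next;
        return cur == &sentinel_ ? nullptr : cur;
    }

    // Question 3: plain iteration stops on reaching the sentinel's address.
    std::size_t size() const {
        std::size_t n = 0;
        for (const Node* cur = sentinel_.next; cur != &sentinel_; cur = cur->next) ++n;
        return n;
    }

private:
    Node sentinel_;
};
```

Here the first real node is `sentinel_.next` and the last is `sentinel_.prev`, so separate head and tail variables are not strictly needed. Note that `find` cannot be `const`, because it writes to the sentinel.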

Copy Construction For hashMap in C++

In a recent assignment, we were asked to implement a hashmap in C++ without using the facilities provided by the STL.
I'm stuck on one of the functions: the copy constructor. After searching Google, I found a valid solution in the question:
Writing a valid copy constructor for a hash map in C++
But I can't totally understand it. Could anyone please help explain
1. why we need to use a pointer-to-pointer Node** p = &hashTable[i]; ?
2. what is the logic in the while loop?
3. especially, what does this code p=&c->next; mean?
Firstly, there are many different types of hash table implementations, so any specific one you find online may or may not yield insights into what you'll need to do for your own implementation. That said...
p initially points at the head pointer for the bucket, which is the Node* at [this->]hashTable[i], and is first used to set that pointer to NULL. As you're dealing with Node*s, a Node** is a natural way to keep track of their locations.
Each iteration of the while loop duplicates the next Node in bucket [i] of hm; the duplicate is created in new memory at c, and *p (which tracks the linked list positions being created for the *this object under construction) is updated to point to it.
p=&c->next; means p is set to the address of the next member of the newly created Node (at address c). That next pointer must be initialised to nullptr/NULL/0 by the Node(const Node&) constructor, or the linked lists created wouldn't terminate properly. If there are more elements to copy in the linked list of colliding elements, the next iteration overwrites *p, and therefore the next member of the previously added Node, with the next value of c.
In summary, you're looking at a loop that copies amountOfBuckets linked lists. If you're not familiar with linked list operations, you'd be better off writing a linked list class first and getting that working, then using it to help implement the hash table.
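The code from the linked question is not reproduced here, so the following is a reconstruction from the description above, not the original. It assumes unsigned integer keys, separate chaining, and a positive bucket count; only `hashTable`, `amountOfBuckets`, `hm`, `p`, and `c` come from the discussion, and the remaining names are invented.

```cpp
#include <cstddef>

struct Node {
    unsigned key;
    int value;
    Node* next;
    Node(unsigned k, int v) : key(k), value(v), next(nullptr) {}
    // Copies the payload only; next starts out null so chains terminate.
    Node(const Node& other) : key(other.key), value(other.value), next(nullptr) {}
};

class HashMap {
public:
    explicit HashMap(std::size_t buckets = 8)  // assumes buckets > 0
        : amountOfBuckets(buckets), hashTable(new Node*[buckets]()) {}

    HashMap(const HashMap& hm)
        : amountOfBuckets(hm.amountOfBuckets),
          hashTable(new Node*[hm.amountOfBuckets]) {
        for (std::size_t i = 0; i < amountOfBuckets; ++i) {
            Node** p = &hashTable[i];  // where the next copied node gets attached
            *p = nullptr;
            for (const Node* n = hm.hashTable[i]; n != nullptr; n = n->next) {
                Node* c = new Node(*n);  // duplicate one node
                *p = c;                  // attach it at the tracked location
                p = &c->next;            // the next attachment point is c's next
            }
        }
    }
    HashMap& operator=(const HashMap&) = delete;

    ~HashMap() {
        for (std::size_t i = 0; i < amountOfBuckets; ++i) {
            while (hashTable[i] != nullptr) {
                Node* dead = hashTable[i];
                hashTable[i] = dead->next;
                delete dead;
            }
        }
        delete[] hashTable;
    }

    // Prepends to the bucket; does not check for duplicate keys.
    void add(unsigned key, int value) {
        Node* n = new Node(key, value);
        n->next = hashTable[key % amountOfBuckets];
        hashTable[key % amountOfBuckets] = n;
    }

    const int* find(unsigned key) const {
        for (const Node* n = hashTable[key % amountOfBuckets]; n != nullptr; n = n->next)
            if (n->key == key) return &n->value;
        return nullptr;
    }

private:
    std::size_t amountOfBuckets;
    Node** hashTable;
};
```

Because `p` always holds the address of the pointer to fill in next, the first node of a bucket and every later node are attached by the same two statements, with no special case for an empty chain.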

Implementation detail of lock-free single-writer multi-reader list

I am attempting to understand the implementation of the single-writer multi-reader doubly linked list found in
http://web.cecs.pdx.edu/~walpole/class/cs510/papers/11.pdf
on page 10 of the pdf (or 500 of the journal article).
I simply cannot understand how the Insert and Delete functionality works.
My understanding of it is
A double pointer is passed in. The inner pointer is presumably the address of what I would normally call the left link.
For some reason the Next pointer (what I would normally call right link) is set to the address contained in the double pointer.
The call to (next != null) has me very confused, as if next was null then the double pointer to Previous doesn't provide a link back
The node pointer is stored into the double pointer. This must be the mechanism by which the previous node's Next pointer is set, as there isn't any other method.
I think my basic question comes down to what does the inner pointer in the double pointer point to?
Things would make sense to me if line 1 dereferenced Previous and used Previous.Next to assign to Next, AND if line 4 had set next.Prev to a pointer holding the address of the node being inserted, but even then it still seems incorrect.
Tagging the question C++ since the pseudo-code syntax is closest to C++ with some Pascal. If this question is better suited to the cs.stackexchange please relocate it.
Instead of pointing to the previous node, the Prev member points to the Next member of the previous node. It's a bit strange, but perhaps it saves some arithmetic, since we only look at the Key member during forward traversals, and it saves having to allocate a whole node for the list head.
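A single-threaded C++ sketch of that layout, written from the description in this thread and not copied from the paper. It ignores the ordering and memory-visibility requirements that the lock-free version depends on, and the function names are made up.

```cpp
struct Node {
    int key;
    Node* next;
    Node** prev;  // address of whichever pointer currently points at this node
};

// `previous` is the address of the pointer after which `node` goes:
// either the list head itself or some node's next member.
void insert_node(Node** previous, Node* node) {
    Node* next = *previous;
    node->next = next;
    node->prev = previous;
    if (next != nullptr) next->prev = &node->next;  // successor now hangs off node
    *previous = node;                               // publish the new node
}

void remove_node(Node* node) {
    *node->prev = node->next;  // bypass node in the forward direction
    if (node->next != nullptr) node->next->prev = node->prev;
}
```

With this arrangement the list head is a plain `Node*`, and inserting at the front is `insert_node(&head, n)` with no dummy head node. When `next` is null there is simply no successor whose back link needs fixing, which is why the check in the original pseudo-code is harmless.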

How do you implement a linked list using an array

Now, I know you must be saying to yourself, "Why the heck would anyone even do that?" But it's something that will give us real insight into some primitive stuff. Kindly unleash your talent.
It is a valid question - college level data structure question. And so the answer can be found in many data structures books. http://books.google.co.in/books/about/Data_Structures_Using_C.html?id=X0Cd1Pr2W0gC
The wording of your question makes it seem that you are aware of the difference between linked lists and arrays. So I'm going to skip that part.
The main point to remember in the implementation is that while linked lists have pointers to the next element, in an array this will automatically be the next index. So, one way of implementing is to store all the data points of the linked list in the array. If you have to insert or delete an element, then you would first have to create a space in the array to place them at, or remove the extra space created. In a linked list you could have simply changed the pointers for one/two nodes and you would be done. However, we can't do that in an array since we can't manipulate the next pointers in the array. So, a simple idea is to shift every element to the left or right by one step depending upon your choice of operation. In case of insertion, insert that element in the space created by shifting right. In case of deletion, shift everything to the right of the element to be deleted to the left by one index. Note that this way every insertion and deletion will be O(n).
One idea to avoid these repeated shifts on deletion is to replace the deleted element with a pre-decided character, say '*'. While traversing the array, a '*' is then interpreted as an empty space, which avoids the left shifts on deletion. When the array is full, we can traverse the entire array, remove all the '*' entries, and shift the remaining elements in one pass.
Take care to introduce checks about array bounds.
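The marker idea can be sketched as follows, assuming the list stores characters and that '*' never occurs as real data. The type and function names are invented for the example.

```cpp
#include <cstddef>
#include <vector>

constexpr char kEmpty = '*';  // marker for a deleted slot; must not be real data

struct ArrayList {
    std::vector<char> slots;

    void push_back(char c) { slots.push_back(c); }

    // O(1) deletion: mark the slot instead of shifting everything left.
    void erase_at(std::size_t i) {
        if (i < slots.size()) slots[i] = kEmpty;  // bounds check
    }

    // Traversal skips the markers.
    std::vector<char> items() const {
        std::vector<char> out;
        for (std::size_t i = 0; i < slots.size(); ++i)
            if (slots[i] != kEmpty) out.push_back(slots[i]);
        return out;
    }

    // One O(n) pass that squeezes out every marker at once.
    void compact() {
        std::size_t write = 0;
        for (std::size_t read = 0; read < slots.size(); ++read)
            if (slots[read] != kEmpty) slots[write++] = slots[read];
        slots.resize(write);
    }
};
```

Deletions become cheap, and the shifting cost is paid once per compaction. Insertion in the middle still needs a shift unless a marked slot happens to sit at the right position.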

Implementing Skip List in C++

[SOLVED]
So I decided to try and create a sorted doubly linked skip list...
I'm pretty sure I have a good grasp of how it works. When you insert x the program searches the base list for the appropriate place to put x (since it is sorted), (conceptually) flips a coin, and if the "coin" lands on a then that element is added to the list above it(or a new list is created with element in it), linked to the element below it, and the coin is flipped again, etc. If the "coin" lands on b at anytime then the insertion is over. You must also have a -infinite stored in every list as the starting point so that it isn't possible to insert a value that is less than the starting point (meaning that it could never be found.)
To search for x, you start at the "top-left" (highest list, lowest value) and "move right" to the next element. If the value is less than x, then you continue to the next element, and so on, until you have "gone too far" and the value is greater than x. In that case you go back to the last element and move down a level, continuing this chain until you either find x or run out of levels.
To delete x you simply search x and delete it every time it comes up in the lists.
For now, I'm simply going to make a skip list that stores numbers. I don't think there is anything in the STL that can assist me, so I will need to create a class List that holds an integer value and has member functions, search, delete, and insert.
The problem I'm having is dealing with links. I'm pretty sure I could create a class to handle the "horizontal" links with a pointer to the previous element and the element in front, but I'm not sure how to deal with the "vertical" links (point to corresponding element in other list?)
If any of my logic is flawed please tell me, but my main questions are:
How to deal with vertical links and whether my link idea is correct
Now that I read my class List idea I'm thinking that a List should hold a vector of integers rather than a single integer. In fact I'm pretty positive, but would just like some validation.
I'm assuming the coin flip would simply call an int function where rand()%2 returns a value of 0 or 1; if it's 0 then the value "levels up", and if it's 1 then the insert is over. Is this incorrect?
How to store a value similar to -infinite?
Edit: I've started writing some code and am considering how to handle the List constructor....I'm guessing that on its construction, the "-infinite" value should be stored in the vectorname[0] element and I can just call insert on it after its creation to put the x in the appropriate place.
http://msdn.microsoft.com/en-us/library/ms379573(VS.80).aspx#datastructures20_4_topic4
http://igoro.com/archive/skip-lists-are-fascinating/
The above skip lists are implemented in C#, but you can work out a C++ implementation using that code.
Just store 2 pointers. One called above, and one called below in your node class.
Not sure what you mean.
According to Wikipedia you can also use a geometric distribution. I'm not sure whether the type of distribution matters for totally random access, but it obviously matters if you know your access pattern.
I am unsure of what you mean by this. You can represent something like that with floating point numbers.
You're making "vertical" and "horizontal" too complicated. They are all just pointers. The little boxes you draw on paper with lines on them are just to help visualize something when thinking about them. You could call a pointer "elephant" and it would go to the next node if you wanted it to.
eg. a "next" and "prev" pointer are the exact same as a "above"/"below" pointer.
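A minimal sketch of such a node and of the search described in the question. The smallest int stands in for the "-infinite" head value, which is an assumption made to keep the example integer-only; a floating-point list could use infinity, as suggested above. `skip_search` is an illustrative name.

```cpp
#include <limits>

struct SkipNode {
    int value;
    SkipNode* next  = nullptr;  // "horizontal" links within one level
    SkipNode* prev  = nullptr;
    SkipNode* above = nullptr;  // "vertical" links to the same value on
    SkipNode* below = nullptr;  // the neighbouring levels
};

// Stand-in for -infinity at the head of every level.
constexpr int kNegInf = std::numeric_limits<int>::min();

// Starts at the head of the top level. Moves right while the next value does
// not overshoot x, then drops down one level. Returns the highest-level node
// holding x, or nullptr if x is absent.
const SkipNode* skip_search(const SkipNode* top_left, int x) {
    const SkipNode* cur = top_left;
    while (cur != nullptr) {
        while (cur->next != nullptr && cur->next->value <= x) cur = cur->next;
        if (cur->value == x) return cur;
        cur = cur->below;
    }
    return nullptr;
}
```

All four links are ordinary pointers of the same type; only the way they are wired differs. With this stand-in, `kNegInf` itself cannot be stored as a real value.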
Anyway, good luck with your homework. I got the same homework once in my data structures class.