How to properly use sentinel nodes? - c++

There will be multiple (closely) related questions instead of a single one. For convenience, I will number them.
Based on this Wikipedia article, this question and lectures, I think I already understand the idea behind sentinel nodes and their usage in linked lists. However, a few things are still not clear to me even after reading these materials.
I was given a basic implementation of a doubly linked list (it stores only int values) and the task is to change the implementation so it uses a sentinel node like this:
[Illustrative image of the list with a sentinel node; not embedded]
Question 1
I am assuming that the head variable of the list will point to the first real node (the one after sentinel node) and the tail variable will simply point to the last node. Am I correct or should the head point to the sentinel node? I am asking for a best-practice or the most standard approach here.
Question 2
I understand that when searching for a value in the list, I no longer have to check for nullptr since I am using a sentinel node. Since the sentinel node turns the list into a circle, I have to terminate the loop once it has gone through the whole list and reached the sentinel. Can I do this by putting the value I am looking for into the sentinel node, using it as a sentinel value of sorts, and then checking whether the match came from the sentinel node when the loop ends? Some sources claim that sentinel nodes should not store any values at all. Is my approach correct and reasonably efficient?
Question 3
When simply iterating and not searching for a particular value (e.g. counting nodes, outputting the whole list into the console), do I have to check for the sentinel node the same way as I would for a nullptr (to terminate the iterating loop) or is there a different or smarter way of doing this?

Answer 1
Yes, this is a valid position for the sentinel node to take. head and tail can point to the actual beginning and end of the data, but your add and delete functions will need to be aware of the special cases the sentinel node creates at the list boundaries.
Answer 2
Yes, this is a valid search strategy, and it is in fact called the Elephant in Cairo technique.
Answer 3
Yes. The purpose of the sentinel node is to be recognizable as the sentinel node. You could maintain a constant pointer (or whatever your language of choice supports) to the sentinel node and compare against it to check whether you have reached it, or just store a flag in the node.

Related

"Guessing" what side of a doubly linked list to start on

It's a bit hard to explain what I'm planning, but here goes. I have a doubly linked list of objects which are ordered alphabetically by a member attribute called name. I wish to remove a Node with a specific name, but I would like to remove it in such a way that it is more likely to start looking for it on the side of the list closer to it.
So I was thinking that I would have to find the 'midpoint' between the first Node's name and the last Node's name. Then I will check to see if that midpoint is less than the name of the Node. If it is less, I will start from the tail, otherwise I will start from the head.
The problem I am having is that I am unable to convert a string directly into an int. My potential solution is this:
1) Convert each individual character in the head's and tail's names to an int
2) Put each conversion into an int array, one array for the head, one for the tail
3) Convert each int into a string again and put them into a new array
4) Pad each converted string with leading 0s so that it has a length of 3
5) Concatenate the strings in each array
6) Convert the strings to ints again, find the difference between the two ints, and divide it by 2
7) Add the new value to the first Node's converted name
8) Check whether this 'midpoint' is less than the name of the Node I want to remove
9) If it is, start searching from the tail
10) Otherwise, search from the head
Is there any easier way to go about doing this?
Alf's comment is realistically what you want. To decide which end to start from, you get maximum resolution by simply finding the first character that differs between the first and last names and then picking a side based on the midpoint of that character.
Algorithm idea
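def choose_start(names, target):
    # names is sorted alphabetically; returns "head", "tail", or None if target cannot be in the list
    first, last = names[0], names[-1]
    shortest = min(len(first), len(last))
    index = 0
    # Every name in a sorted list shares the prefix common to the first and last names
    while index < shortest and first[index] == last[index]:
        if index >= len(target) or target[index] != first[index]:
            return None  # word not in list
        index += 1
    if index == shortest or index >= len(target):
        return "head"  # no distinguishing character; either end will do
    spread = ord(last[index]) - ord(first[index])
    if ord(target[index]) - ord(first[index]) > spread / 2:
        return "tail"  # start at last
    return "head"  # start at 0

names = ["apple", "banana", "orange"]
print(choose_start(names, "banana"))  # head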
As others have already alluded to, your main problem is that you're using the wrong data structure. Your question shouldn't be "How do I make a doubly linked list operate in a manner that is distinctly unlike a doubly linked list?", it should be "What is the best data structure for {insert your specific use case}?".
Reading between the lines, it appears that you're after something that allows for insertions, removals and relatively high speed scans. This leads me to suggest a Left Leaning Red Black Tree: see https://en.wikipedia.org/wiki/Left-leaning_red%E2%80%93black_tree
You could create an array of pointers to some sub-set of nodes in the list, like pointers to the first, middle, and last node of a list. You could use more pointers to reduce the search time, perhaps 4 to 16 pointers. Sort of a hierarchical overall structure. The array would need to be updated as nodes are deleted (at least the pointers to deleted nodes, pick the node before or after if this happens, or shrink the array). At some point, a tree like structure would be better.

Finding corruption in a linked list

I had an interview today for a developer position and was asked an interesting technical question that I did not know the answer to. I will ask it here to see if anyone can provide a solution to satisfy my curiosity. It is a multi-part question:
1) You are given a singly linked list with 100 elements (an integer and a pointer to the next node). Find a way to detect whether there is a break or corruption halfway through the linked list. You may do anything with the linked list. Note that you must do this while iterating over the list, as a verification step before you know that the list has any issues.
Assuming that the break in the linked list is at the 50th element, the integer or even the pointer to the next node (51st element) may be pointing to a garbage value which is not necessarily an invalid address.
2) If there is corruption in the linked list, how would you minimize data loss?
To test for a "corrupted" integer, you would need to know what the range of valid values is. Otherwise, there is no way to determine that the value in any given (signed) integer is invalid. So, assuming you have a validity test for the int, you would always check that value before iterating to the next element.
Testing for a corrupted pointer is trickier. For a start, check the value of the pointer to the next element before you attempt to dereference it, and ensure it is a valid heap address; that will avoid a segmentation fault. The next step is to validate that what the pointer points at is in fact a valid linked list node, which is harder. Perhaps dereference the pointer into a list element class/struct and test the validity of its int and "next" pointer; if they are also good, you can be pretty sure the previous node was good as well.
On 2), having discovered a corrupted node: if the next pointer is corrupted, you should immediately set the "next" pointer of the previous node to NULL, marking it as the end of the list, and log the error. If the corruption was only to the integer value and not to the "next" pointer, you should remove that element from the list and link the previous and following nodes together instead, as there is no need to throw the rest of the list away in that case!
For the first part, overload the new operator. Whenever a new node is allocated, allocate some additional space before and after the node and put some known values there. During traversal, each node can be checked to see whether it still sits between the known values.
If you know at design time that corruption may become a critical issue, you could add a "magic value" as a field in the node data structure, which lets you identify whether some piece of data is likely to be a node or not. It even makes it possible to scan through memory searching for nodes.
Or duplicate some link information, i.e. store the address of the node after the next node in each node, so that you can recover if one link is broken.
The only problem I see is that you have to avoid segmentation faults.
If you can do anything to the linked list, what you can do is to calculate the checksum of each element and store it on the element itself. This way you will be able to detect corruption even if it's a single bit error on the element.
To minimize data loss, perhaps you can consider also storing the next element's nextPtr in the previous element; that way, if your current element is corrupted, you can still find the location of the next element from the previous one.
This is an easy question, and there are several possible answers. Each trades off robustness against efficiency. Since increased robustness is a prerequisite of the question, the available solutions sacrifice either time (list traversal speed, as well as the speed of node insertion and deletion) or space (extra information stored with each node). Now, the problem states that this is a fixed list of length 100, in which case a linked list is a most inappropriate data structure. Why not make the puzzle a little more challenging and say that the size of the list is not known a priori?
Since the number of elements (100) is known, the 100th node must contain a null pointer. If it does, the list is valid with good probability (this cannot be guaranteed if, for example, the 99th node is corrupt and points to some memory location containing all zeros). Otherwise, there is some problem (this can be reported as a fact).
Update: it may also be possible, at every step, to inspect the structures that delete would use if given the pointer, but since using delete itself is not safe in any sense, this is going to be implementation-specific.

Protege: how to express 'not hasNext'?

I am currently developing an ontology using protege and would like to determine whether a node is the last one in a list. So basically a list points to a node, and every node has some content and can have another node:
List startsWith some Node
Node hasContent some Content
Node hasNext some Node
Now I'd like to define a subclass named EndNode that doesn't point to another Node. This is what I've tried so far, but after classifying, EndNode always equals Nothing:
Node and not(hasNext some Node)
Node and (hasNext exactly 0 Node)
First, there is a built-in List construct in RDF which you can use in the following way:
ex:myList rdf:type rdf:List .
ex:myList rdf:first ex:firstElement .
ex:myList rdf:rest _:sublist1 .
_:sublist1 rdf:first ex:SecondElement .
_:sublist1 rdf:rest rdf:nil .
Here, in order to know you reach the end of the list, you need a special list called rdf:nil. This plays the same role as a null pointer at the end of a linked list in programming languages.
However, even though rdf:List is well used in existing data on the Web, it doesn't constrain in any way the use of the predicates rdf:first and rdf:rest, so you can have many first elements for a given list without triggering an inconsistency.
So, if you really want to model linked lists in a strict way, you need pretty expressive features of OWL. I did it a while ago and it can be found at http://purl.org/az/List.
It's normal that you get an empty class, since you specified that every Node must have a next Node. You should not require Nodes to have content or a next element. Instead, say that the cardinality of hasNext is at most 1, that the domain and range of hasNext are Node, and that EndNode is a Node with no next node. But that's still not enough, as it does not require that an EndNode exists at all: you may have an infinite sequence or a loop.
If you want to avoid loops or infinite sequences, you have to define a transitive property hasFollower and require that there is at least one follower in the class EndNode.
All in all, implementing strict lists in OWL completely sucks in terms of performance and is most of the time totally useless, as rdf:List is sufficient for the vast majority of situations.

copy linked list with random link in each node, each node has a variable, which randomly points to another node in the list

An interview question:
Copy a linked list with a random link in each node: each node has a variable which randomly points to another node in the list.
My ideas:
Iterate over the list; for each node, copy the node together with the nodes reachable through its variable, add a sentinel at the end, and then do the same for every node.
In the new list, for each node i, split off each sublist terminated by a sentinel and make i's variable point to it.
It is not efficient in space. It is O(n^2) in time and space.
Better ideas?
I think you can pinch ideas from e.g. Java Serialisation, which recognises when pointers point to nodes already serialised, so that it can serialise (and then deserialise) arbitrary structures reasonably efficiently. The spec, which you can download via a link at http://docs.oracle.com/javase/1.4.2/docs/guide/serialization/index.html, says that this is done but doesn't say exactly how - I suspect hash tables.
I think copying is a lot like this - you don't even need to know that some of the pointers make up a linked list. You could use a depth first search to traverse the graph formed by the nodes and their pointers, putting the location of each node in a hash table as you go, with the value the copied node. If the node is already present you don't need to do anything except make the pointer in the copied node point to the copy of the node pointed to as given by the hash table. If the node is not already present, create the copy, put the node in the hash table with the address of the copy as its value, and recursively copy the information in the node, and its pointers, into the newly made copy.
This is a typical interview question. You can find many answers by using Google. This is a link I think is good for understanding. But please read the comments too, there are some errors in the main body: Copy a linked list with next and arbit pointer

Determine the first node in a Linked List

If the last node of a linked list is connected to the first node, it makes a ring.
Then how would you identify which of the nodes in the linked list is the first node and the last one?
You wouldn't. If it is a ring, then first and last are meaningless. Any node can be first or last.
If you define "first" as "created first", then you would probably want to add some sequencing information to the nodes to be able to know that.
Presumably there would be a pointer to the first node of a linked list (you need a way of entering the list). Additionally it is convenient in this case to maintain a pointer to the last node in the list.
If you are more specific about what you need to know I can be more helpful.
What you are describing is a circularly linked list. It is possible to know both the first and last item in the list by maintaining just the last node: logically, its successor must be the first node.
Wikipedia has a bit more about it: https://en.wikipedia.org/wiki/Linked_list#Circularly_linked_list
If you are trying to implement a circular list, have a look here http://en.wikipedia.org/wiki/Linked_list#Circularly_linked_list
When you create the Linkedlist, store the address of the first node in a variable (preferably a private variable) so that at any point in time you can compare this address with the current node's address.