Implementing Skip List in C++

[SOLVED]
So I decided to try and create a sorted doubly linked skip list...
I'm pretty sure I have a good grasp of how it works. When you insert x, the program searches the base list for the appropriate place to put x (since it is sorted) and (conceptually) flips a coin. If the "coin" lands on a, the element is also added to the list above it (or a new list is created containing the element), linked to the element below it, and the coin is flipped again, and so on. If the "coin" lands on b at any time, the insertion is over. You must also store a negative infinity in every list as the starting point, so that it isn't possible to insert a value smaller than the starting point (which could then never be found).
To search for x, you start at the "top-left" (highest list, lowest value) and "move right" to the next element. If its value is less than x, you continue to the next element, and so on, until you have "gone too far" and the value is greater than x. In that case you go back to the previous element and move down a level, continuing this chain until you either find x or conclude that x is not in the list.
To delete x, you simply search for x and delete it from every list it appears in.
For now, I'm simply going to make a skip list that stores numbers. I don't think there is anything in the STL that can help me, so I will need to create a class List that holds an integer value and has the member functions search, delete, and insert.
The problem I'm having is dealing with the links. I'm pretty sure I could create a class to handle the "horizontal" links, with pointers to the previous and next elements, but I'm not sure how to deal with the "vertical" links (point to the corresponding element in the other list?).
If any of my logic is flawed, please tell me, but my main questions are:
1. How do I deal with the vertical links, and is my link idea correct?
2. Now that I reread my class List idea, I'm thinking that a List should hold a vector of integers rather than a single integer. In fact, I'm pretty positive, but would just like some validation.
3. I'm assuming the coin flip would simply call an int function where rand()%2 returns 0 or 1: if it's 1, the value "levels up", and if it's 0, the insertion is over. Is this incorrect?
4. How do I store a value similar to negative infinity?
Edit: I've started writing some code and am considering how to handle the List constructor. I'm guessing that on construction, the negative-infinity value should be stored in the vectorname[0] element, and I can then call insert after creation to put x in the appropriate place.

http://msdn.microsoft.com/en-us/library/ms379573(VS.80).aspx#datastructures20_4_topic4
http://igoro.com/archive/skip-lists-are-fascinating/
The above skip lists are implemented in C#, but you can work out a C++ implementation from that code.
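As a rough illustration of the insert/search/delete procedure described in the question, here is a minimal C++ sketch. It is not a port of the linked C# code. Instead of separate above/below links, each node stores a std::vector of forward pointers, one per level it belongs to (a common alternative layout, close to the question's idea of a List holding a vector). Each level is singly linked rather than doubly linked. The head node plays the role of the "-infinite" start of every level, and a fair coin (std::bernoulli_distribution) decides how many levels a new node gets. The class name, kMaxLevel = 16, and the fixed seed are arbitrary choices for the sketch.

```cpp
#include <climits>
#include <cstddef>
#include <random>
#include <vector>

// Sketch of a skip list of ints. Each node keeps one forward pointer per level
// it belongs to; the head sentinel is the "-infinite" start of every level and
// its value is never compared.
class SkipList {
    struct Node {
        int value;
        std::vector<Node*> next;  // next[i] = successor on level i
        Node(int v, int levels) : value(v), next(levels, nullptr) {}
    };

    static constexpr int kMaxLevel = 16;  // arbitrary cap on tower height
    Node head_{INT_MIN, kMaxLevel};       // sentinel spanning all levels
    int level_ = 1;                       // levels currently in use
    std::mt19937 rng_{12345};             // fixed seed for reproducibility
    std::bernoulli_distribution coin_{0.5};

    // Keep flipping the coin; every "heads" promotes the node one level.
    int randomLevel() {
        int lvl = 1;
        while (lvl < kMaxLevel && coin_(rng_)) ++lvl;
        return lvl;
    }

    // For each level, record the last node whose value is < x:
    // move right while the next value is too small, then move down.
    void findPredecessors(int x, Node* update[]) {
        Node* cur = &head_;
        for (int i = level_ - 1; i >= 0; --i) {
            while (cur->next[i] && cur->next[i]->value < x) cur = cur->next[i];
            update[i] = cur;
        }
    }

public:
    SkipList() = default;
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;
    ~SkipList() {
        for (Node* cur = head_.next[0]; cur;) {  // level 0 links every node
            Node* nxt = cur->next[0];
            delete cur;
            cur = nxt;
        }
    }

    bool contains(int x) {
        Node* update[kMaxLevel];
        findPredecessors(x, update);
        Node* cand = update[0]->next[0];
        return cand && cand->value == x;
    }

    void insert(int x) {
        Node* update[kMaxLevel];
        findPredecessors(x, update);
        int lvl = randomLevel();
        for (int i = level_; i < lvl; ++i) update[i] = &head_;  // brand-new levels
        if (lvl > level_) level_ = lvl;
        Node* n = new Node(x, lvl);
        for (int i = 0; i < lvl; ++i) {  // splice into each of its levels
            n->next[i] = update[i]->next[i];
            update[i]->next[i] = n;
        }
    }

    bool erase(int x) {
        Node* update[kMaxLevel];
        findPredecessors(x, update);
        Node* target = update[0]->next[0];
        if (!target || target->value != x) return false;
        for (std::size_t i = 0; i < target->next.size(); ++i)  // unlink on every level
            update[i]->next[i] = target->next[i];
        while (level_ > 1 && !head_.next[level_ - 1]) --level_;  // drop empty levels
        delete target;
        return true;
    }
};
```

Duplicates are allowed and erase removes one occurrence. The question's rand()%2 would also work as the coin; <random> is used here only because it is the more usual choice in modern C++.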

1. Just store two pointers in your node class, one called above and one called below.
2. Not sure what you mean.
3. According to Wikipedia, you can also use a geometric distribution. I'm not sure whether the type of distribution matters for totally random access, but it obviously matters if you know your access pattern.
4. I'm not sure what you mean by this. You can represent something like that with floating-point numbers, which have a real negative infinity.
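A small sketch of the above/below pointer idea together with a "-infinity" key for the head column, assuming int keys. QuadNode, promote, kIntMinusInf, and kDoubleMinusInf are hypothetical names. Since int has no infinity, the smallest int stands in for it (so that value cannot also be stored as a real key), while double has a true negative infinity.

```cpp
#include <limits>

// One node of a "quad-linked" skip list: two horizontal links within a level
// and two vertical links to the copies of the same value one level up and down.
struct QuadNode {
    int value;
    QuadNode* prev = nullptr;   // left neighbour on the same level
    QuadNode* next = nullptr;   // right neighbour on the same level
    QuadNode* above = nullptr;  // same value one level up (nullptr at the top)
    QuadNode* below = nullptr;  // same value one level down (nullptr on the base list)
    explicit QuadNode(int v) : value(v) {}
};

// "-infinity" keys for the head column of every level.
constexpr int kIntMinusInf = std::numeric_limits<int>::min();                 // stand-in
constexpr double kDoubleMinusInf = -std::numeric_limits<double>::infinity();  // real infinity

// Hypothetical helper: stack a new copy of `top` on its tower and return it.
// The caller still has to splice it into that level's prev/next chain.
QuadNode* promote(QuadNode* top) {
    QuadNode* up = new QuadNode(top->value);
    up->below = top;
    top->above = up;
    return up;
}
```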

You're making "vertical" and "horizontal" too complicated. They are all just pointers. The little boxes you draw on paper with lines on them are just to help visualize something when thinking about them. You could call a pointer "elephant" and it would go to the next node if you wanted it to.
e.g. "next" and "prev" pointers are exactly the same as "above"/"below" pointers.
Anyway, good luck with your homework. I got the same homework once in my data structures class.

Related

"Guessing" what side of a doubly linked list to start on

It's a bit hard to explain what I'm planning, but here goes. I have a doubly linked list of objects ordered alphabetically by a member attribute called name. I wish to remove a Node with a specific name, but I would like to do it in such a way that the search is more likely to start from the end of the list closer to it.
So I was thinking that I would have to find the 'midpoint' between the first Node's name and the last Node's name. Then I will check to see if that midpoint is less than the name of the Node. If it is less, I will start from the tail, otherwise I will start from the head.
The problem I am having is that I am unable to convert a string directly into an int. My potential solution is this:
Convert each individual character in the head and tail to an int
Put each conversion into an int array, one array for the head, one for the tail
Convert each int into a string again and put them into a new array
Make each converted string 3 characters long by padding it with leading 0s if it is shorter
Add the strings in each array together
Convert the strings to int again and find the difference between the two ints and divide that by 2
Add the new value to the first Node's converted name
Find if this 'midpoint' is less than the name of the Node I want to remove
If it is, start searching from the tail
Else, search from the head
Is there any easier way to go about doing this?
Alf's comment is realistically what you want. To decide which end to start from, you get maximum resolution by simply finding the first character that differs between the first and last names and then choosing based on the midpoint.
Algorithm idea (pseudocode):
    list = ["apple", "banana", "orange"]
    word_to_search_for = "banana"
    index = 0
    while list[0][index] == list[last][index]:
        if word_to_search_for[index] != list[0][index]:
            return "word not in list"
        ++index
    spread = list[last][index] - list[0][index]
    if (word_to_search_for[index] - list[0][index]) > spread / 2:
        start at last
    else:
        start at 0
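The same idea in C++, as a sketch. startFromTail is a hypothetical name. Unlike the pseudocode, it does not report "word not in list"; it only picks a side. It treats a position past the end of a string as character 0, so shorter names sort first, and it doubles the offset instead of halving the spread to avoid integer-division rounding.

```cpp
#include <cstddef>
#include <string>

// Returns true if a search for `target` should start from the tail of a list
// whose first and last names are `first` and `last` (first <= last).
bool startFromTail(const std::string& first, const std::string& last,
                   const std::string& target) {
    std::size_t i = 0;
    // Skip the prefix shared by the first and last names.
    while (i < first.size() && i < last.size() && first[i] == last[i]) ++i;
    // Character at position i, or 0 past the end (shorter words sort first).
    auto at = [i](const std::string& s) -> int {
        return i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
    };
    int spread = at(last) - at(first);
    // Compare against the midpoint without integer-division rounding.
    return 2 * (at(target) - at(first)) > spread;
}
```

With the example list, startFromTail("apple", "orange", "banana") is false (start at the head), while "nectarine" would start from the tail.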
As others have already alluded, your main problem is that you're using the wrong data structure. Your question shouldn't be "How do I make a doubly linked list operate in a manner that is distinctly unlike a doubly linked list?", it should be "What is the best data structure for {insert your specific use case}?".
Reading between the lines, it appears that you're after something that allows for insertions, removals and relatively high speed scans. This leads me to suggest a Left Leaning Red Black Tree: see https://en.wikipedia.org/wiki/Left-leaning_red%E2%80%93black_tree
You could create an array of pointers to a subset of the nodes in the list, such as pointers to the first, middle, and last nodes. You could use more pointers, perhaps 4 to 16, to reduce the search time: sort of a hierarchical overall structure. The array would need to be updated as nodes are deleted (at least the pointers to deleted nodes: point them at the node before or after instead, or shrink the array). At some point, a tree-like structure would be better.

Remove duplicates algorithm

I'm trying to write an algorithm to remove duplicates from a vector<struct xxxx*>.
struct xxxx {
    int value; // This is just to make you understand
    xxxx* one;
    xxxx* two;
};
As you can see, my struct is like a tree, but the pointers are not ordered. The pointers can point to any (well, not any, but most) of the other structs. And the vector doesn't contain the structs but pointers to them, so I couldn't use the std algorithms to help me either.
I'm trying to delete duplicates with exactly the same value and the same two pointers, but at the same time, if I have two equal structs (let's say A and B) and C.one or C.two points to B, then I need to change it to point to A, and vice versa.
In other words: if A == B then remove B and change C.one to point A.
I think I can write the brute-force, so if there's no better algorithm I'll write it by myself.
Yesterday, I tried to explain the reasonable approach to a very similar problem to a coworker who had used an N squared solution to an N log N problem.
First create a helper struct, that is basically a wrapper around an xxxx* with a comparison operator checking the contents (not the pointer value) and probably with some other utility functions. This wrapper struct isn't strictly needed vs. just using xxxx*, but from experience, I think it makes the task cleaner.
Create a std::set of those helper structs, into which you will only insert unique elements, and likely another set into which you will insert recursively unresolved elements.
Loop through the original vector and, at each position, recurse through its children. If you hit a child already in the unique set, that is a final value for that child pointer. If you hit a child that matches a unique element without being the element it matches, fix the pointer that got you there. If null pointers are possible, they should bottom out the recursion; if loops are possible, you need to detect them (with that recursively unresolved set) and decide what to do with a loop. At some point you hit resolved unique elements and add them to the unique set.
The performance, and maybe even the soundness, of the idea depends on the depth and complexity of the loops and what you want to do with them. There are some messy cases where one loop would map onto another loop, and detecting that could be very tricky. If your phrase "like a tree" meant "no loops", then the recursion bottoms out cleanly and efficiently, without the extra complexity of explicitly managing the recursively unresolved elements.
Obviously I left out some of the grunt work detail around detecting unique / non-unique as you back out of the recursion and around detecting "already did it during an earlier recursion" as you hit an item in the main loop above the recursion. But all those details should be pretty obvious as you write the relevant parts of the code.
Edit: To understand how few node visits there are despite nesting a recursion inside a sequential loop, think from the point of view of the pointers. We follow each pointer at most once (some duplicates are pre-detected without following their pointers). For N nodes, there are N top-level pointers (if I understood your description correctly) and significantly fewer than 2N internal pointers (the more tree-like it is, the closer it will be to N-1 internal pointers rather than 2N). So each node is visited fewer than 3 times on average, a minority of those visits require both the pre-lookup and the post-recursion lookup, and each lookup costs log U, where U is the number of unique items found up to that point. So we can trivially see a bound of 6 N log N.
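A compact sketch of this approach for the loop-free case, assuming each node appears in the vector at most once. Rather than a wrapper around xxxx*, it keys a std::map on a node's contents (its value plus its already-deduplicated child pointers), and a second map remembers nodes already handled, so no pointer is followed twice. Key, KeyLess, Deduper, and removeDuplicates are hypothetical names. Because children are resolved before their parent, two parents whose children were duplicates also collapse into one.

```cpp
#include <functional>
#include <map>
#include <unordered_map>
#include <vector>

struct xxxx {
    int value;
    xxxx* one;
    xxxx* two;
};

// A node's contents once its children have been replaced by their representatives.
struct Key {
    int value;
    xxxx* one;
    xxxx* two;
};

// Compares contents; std::less gives a total order even for unrelated pointers.
struct KeyLess {
    bool operator()(const Key& a, const Key& b) const {
        if (a.value != b.value) return a.value < b.value;
        if (a.one != b.one) return std::less<xxxx*>()(a.one, b.one);
        return std::less<xxxx*>()(a.two, b.two);
    }
};

class Deduper {
    std::map<Key, xxxx*, KeyLess> unique_;       // contents -> representative
    std::unordered_map<xxxx*, xxxx*> resolved_;  // node -> its representative

public:
    // Returns the representative of n, fixing n's child pointers on the way.
    // Assumes no cycles, so the recursion always bottoms out.
    xxxx* resolve(xxxx* n) {
        if (!n) return nullptr;  // null pointers end the recursion
        auto done = resolved_.find(n);
        if (done != resolved_.end()) return done->second;  // already handled
        n->one = resolve(n->one);  // children first
        n->two = resolve(n->two);
        auto result = unique_.emplace(Key{n->value, n->one, n->two}, n);
        resolved_[n] = result.first->second;  // n itself if its contents were new
        return result.first->second;
    }
};

// Keeps one representative per group of equal nodes and redirects child
// pointers to it. Removed nodes are not freed here.
std::vector<xxxx*> removeDuplicates(const std::vector<xxxx*>& nodes) {
    Deduper d;
    std::vector<xxxx*> kept;
    for (xxxx* n : nodes)
        if (d.resolve(n) == n) kept.push_back(n);
    return kept;
}
```

Whoever owns the dropped nodes still has to delete them; the sketch only unlinks them.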

How do you implement a linked list using an array

Now, I know you must be telling yourself, "Why the heck would anyone even do that?" But it's something that will give us really insightful knowledge about some primitive stuff. Kindly unleash your talent.
It is a valid question, a college-level data structures question, so the answer can be found in many data structures books: http://books.google.co.in/books/about/Data_Structures_Using_C.html?id=X0Cd1Pr2W0gC
The wording of your question makes it seem that you are aware of the difference between linked lists and arrays. So I'm going to skip that part.
The main point to remember in the implementation is that while linked-list nodes have pointers to the next element, in an array the next element is automatically at the next index. So one way of implementing it is to store all the data points of the linked list in the array. If you have to insert or delete an element, you first have to create a space in the array for it, or remove the space it leaves behind. In a linked list you could simply change the pointers of one or two nodes and be done, but we can't do that in an array, since there are no next pointers to manipulate. So a simple idea is to shift every element right or left by one step, depending on the operation. For insertion, shift everything from the insertion point right by one index and put the new element in the space created. For deletion, shift everything to the right of the element being deleted left by one index. Note that this way every insertion and deletion is O(n).
One way to avoid these repeated shifts on deletion is to replace the element to be deleted with a pre-decided character, say '*'. While traversing the array, a '*' is then interpreted as an empty slot. This avoids left shifts on deletion. When the array is full, we can traverse the entire array once, removing all the '*' markers and shifting the remaining elements in a single pass.
Take care to introduce checks about array bounds.
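Both techniques from this answer fit in one small class, sketched below for chars so that '*' can serve as the deleted marker. CharArrayList and its member names are hypothetical, the capacity of 100 is arbitrary, and the sketch assumes '*' is never stored as real data. insert and erase shift elements (O(n) each); lazyErase only writes the marker, and compact squeezes all markers out in one pass. Every operation checks the array bounds.

```cpp
#include <cstddef>
#include <stdexcept>

// Array-backed list of chars with shift-based insert/erase and lazy deletion.
class CharArrayList {
    static constexpr std::size_t kCapacity = 100;
    static constexpr char kDeleted = '*';
    char data_[kCapacity];
    std::size_t size_ = 0;  // slots in use, including '*' markers

public:
    std::size_t size() const { return size_; }
    char at(std::size_t i) const {
        if (i >= size_) throw std::out_of_range("index");
        return data_[i];
    }

    void insert(std::size_t pos, char c) {
        if (size_ == kCapacity) throw std::length_error("array is full; call compact()");
        if (pos > size_) throw std::out_of_range("insert position");
        for (std::size_t i = size_; i > pos; --i) data_[i] = data_[i - 1];  // shift right
        data_[pos] = c;
        ++size_;
    }

    void erase(std::size_t pos) {
        if (pos >= size_) throw std::out_of_range("erase position");
        for (std::size_t i = pos; i + 1 < size_; ++i) data_[i] = data_[i + 1];  // shift left
        --size_;
    }

    void lazyErase(std::size_t pos) {
        if (pos >= size_) throw std::out_of_range("erase position");
        data_[pos] = kDeleted;  // O(1): no shifting
    }

    // One left-to-right pass that drops every marker and keeps the order.
    void compact() {
        std::size_t write = 0;
        for (std::size_t read = 0; read < size_; ++read)
            if (data_[read] != kDeleted) data_[write++] = data_[read];
        size_ = write;
    }
};
```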

How to find largest values in C++ Map

My teacher in my Data Structures class gave us an assignment to read in a book and count how many words there are. That's not all: we need to display the 100 most common words. My gut says to sort the map, but I only need 100 words from it. After googling around, I'm still wondering: is there a "textbook answer" for sorting maps by value rather than by key?
I doubt there's a "Textbook Answer", and the answer is no: you can't sort maps by value.
You could always create another map using the values. However, this is not the most efficient solution. What I think would be better is for you to chuck the values into a priority_queue, and then pop the first 100 off.
Note that you don't need to store the words in the second data structure. You can store pointers or references to the word, or even a map::iterator.
Now, there's another approach you could consider. That is to maintain a running order of the top 100 candidates as you build your first map. That way there would be no need to do the second pass and build an extra structure which, as you pointed out, is wasteful.
To do this efficiently you would probably use a heap-like approach and do a bubble-up whenever you update a value. Since the word counts only ever increase, this would suit the heap very nicely. However, you would have a maintenance issue on your hands. That is: how you reference the position of a value in the heap, and keeping track of values that fall off the bottom.
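The priority_queue suggestion can be sketched as follows, assuming the counts are already in a std::map<std::string, int> filled with ++counts[word] while reading. Counts and topK are hypothetical names. As suggested, the heap holds map iterators rather than copies of the words.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Word -> occurrence count.
using Counts = std::map<std::string, int>;

// Pushes an iterator for every entry into a max-heap ordered by count and pops
// the k largest.
std::vector<std::pair<std::string, int>> topK(const Counts& counts, std::size_t k) {
    auto byCount = [](Counts::const_iterator a, Counts::const_iterator b) {
        return a->second < b->second;  // "less" comparator => largest count on top
    };
    std::priority_queue<Counts::const_iterator, std::vector<Counts::const_iterator>,
                        decltype(byCount)>
        heap(byCount);
    for (auto it = counts.begin(); it != counts.end(); ++it) heap.push(it);

    std::vector<std::pair<std::string, int>> result;
    while (!heap.empty() && result.size() < k) {
        result.emplace_back(heap.top()->first, heap.top()->second);
        heap.pop();
    }
    return result;
}
```

topK(counts, 100) gives the assignment's list; words with equal counts come out in unspecified order. std::partial_sort over a vector of the same iterators would be an equally reasonable choice.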

Finding corruption in a linked list

I had an interview today for a developer position and was asked an interesting technical question that I did not know the answer to. I will ask it here to see if anyone can provide a solution, for my curiosity. It is a multi-part question:
1) You are given a singly linked list with 100 elements (each an integer and a pointer to the next node). Find a way to detect whether there is a break or corruption halfway through the linked list. You may do anything with the linked list. Note that you must do this while iterating through the list, as verification before you realise that the list has any issues.
Assuming that the break in the linked list is at the 50th element, the integer or even the pointer to the next node (51st element) may be pointing to a garbage value which is not necessarily an invalid address.
2) If there is corruption in the linked list, how would you minimize data loss?
To test for a "corrupted" integer, you would need to know what the range of valid values is. Otherwise, there is no way to determine that the value in any given (signed) integer is invalid. So, assuming you have a validity test for the int, you would always check that value before iterating to the next element.
Testing for a corrupted pointer is trickier. For a start, you need to check the value of the pointer to the next element before you attempt to dereference it, and ensure it is a valid heap address. That will avoid a segmentation fault. The next thing is to validate that what the pointer points at is in fact a valid linked-list node, which is harder. Perhaps dereference the pointer as a list element class/struct and test the validity of its int and "next" pointer; if they are also good, you can be pretty sure the previous node was good as well.
On 2), having discovered a corrupted node: if the next pointer is corrupted, you should immediately set the "next" pointer of the previous node to NULL, marking it as the end of the list, and log the error, etc. If the corruption is only in the integer value and not in the "next" pointer, you should remove that element from the list and link the previous and following nodes together instead, since there is no need to throw the rest of the list away in that case!
For the first part, overload the new operator. Whenever a new node is allocated, allocate some additional space before and after the node and put some known values there. During traversal, each node can be checked to see whether it is still surrounded by the known values.
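A sketch of the guard-value idea using a class-specific operator new and operator delete. The guard constant, guardsIntact, and the 8-byte guard size are assumptions for illustration, and the sketch assumes alignof(Node) is at most 8 so that offsetting by one guard keeps the node aligned. It only detects overwrites of the guard bytes (for example, an overrun from a neighbouring allocation); it cannot make a wild pointer safe to follow.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

constexpr std::uint64_t kGuard = 0xFEEDFACECAFEBEEFull;  // arbitrary known value

struct Node {
    int value;
    Node* next;

    // Allocate [guard][Node][guard] and hand out the middle part.
    static void* operator new(std::size_t size) {
        auto* raw = static_cast<unsigned char*>(std::malloc(size + 2 * sizeof kGuard));
        if (!raw) throw std::bad_alloc();
        std::memcpy(raw, &kGuard, sizeof kGuard);                         // guard before
        std::memcpy(raw + sizeof kGuard + size, &kGuard, sizeof kGuard);  // guard after
        return raw + sizeof kGuard;
    }
    static void operator delete(void* p) {
        if (p) std::free(static_cast<unsigned char*>(p) - sizeof kGuard);
    }
};

// True if both guards around a node allocated with the operator new above are intact.
bool guardsIntact(const Node* n) {
    const auto* bytes = reinterpret_cast<const unsigned char*>(n);
    std::uint64_t before, after;
    std::memcpy(&before, bytes - sizeof kGuard, sizeof kGuard);
    std::memcpy(&after, bytes + sizeof(Node), sizeof kGuard);
    return before == kGuard && after == kGuard;
}
```

During traversal, guardsIntact would be called on each node before its fields are trusted.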
If you at design time know that corruption may become a critical issue, you could add a "magic value" as a field into the node data structure which allows you to identify whether some data is likely to be a node or not. Or even to run through memory searching for nodes.
Or duplicate some link information, i.e. store the address of the node after the next node in each node, so that you can recover if one link is broken.
The only problem I see is that you have to avoid segmentation faults.
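A sketch of the "magic value" check, combined with the truncation strategy from part 2 of the earlier answer. kNodeMagic, looksValid, verifyAndTruncate, and the 0 to 1000 payload range are all hypothetical. It always cuts the list at the last good node rather than unlinking a single node whose next pointer is still fine, and it only catches corruption in memory that can still be read: dereferencing a truly wild pointer remains undefined behaviour, which is exactly the segmentation-fault problem mentioned above.

```cpp
#include <cstdint>

// Arbitrary value written into every node at construction.
constexpr std::uint32_t kNodeMagic = 0xC0FFEE42u;

struct Node {
    std::uint32_t magic = kNodeMagic;
    int value;
    Node* next;
};

// Hypothetical validity rule: the real range of values is application-specific.
bool looksValid(const Node* n) {
    return n->magic == kNodeMagic && n->value >= 0 && n->value <= 1000;
}

// Returns how many nodes passed; the suspect tail is detached (not freed).
int verifyAndTruncate(Node*& head) {
    Node** link = &head;  // the pointer that leads to the node under test
    int good = 0;
    while (*link) {
        if (!looksValid(*link)) {
            *link = nullptr;  // previous node becomes the new end of the list
            break;
        }
        link = &(*link)->next;
        ++good;
    }
    return good;
}
```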
If you can do anything to the linked list, you can calculate a checksum of each element and store it in the element itself. This way you will be able to detect corruption even if it's a single-bit error in the element.
To minimize data loss, perhaps you can consider also storing the next element's nextPtr in the previous element; that way, if your current element is corrupted, you can always find the location of the next element from the previous one.
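Both ideas in this answer can be sketched on one node type: a checksum for detection and a pointer two steps ahead for recovery. The checksum here is 64-bit FNV-1a over the bytes of value and next, chosen only as a simple stand-in (a CRC would catch more error patterns). Each FNV-1a step is invertible, so any change confined to one byte, including a single-bit flip, always changes the result. seal, intact, and bypassCorruptNext are hypothetical names, and the checksum deliberately does not cover the skip pointer.

```cpp
#include <cstdint>
#include <cstring>

struct Node {
    int value;
    Node* next;
    Node* skip;              // == next->next while the list is healthy
    std::uint64_t checksum;  // covers value and next (not skip)
};

// FNV-1a over the bytes of value and next; copying into a byte buffer avoids
// hashing struct padding.
std::uint64_t computeChecksum(const Node& n) {
    unsigned char bytes[sizeof(int) + sizeof(Node*)];
    std::memcpy(bytes, &n.value, sizeof(int));
    std::memcpy(bytes + sizeof(int), &n.next, sizeof(Node*));
    std::uint64_t h = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
    for (unsigned char b : bytes) {
        h ^= b;
        h *= 1099511628211ull;  // FNV-1a 64-bit prime
    }
    return h;
}

void seal(Node& n) { n.checksum = computeChecksum(n); }  // call after every write
bool intact(const Node& n) { return n.checksum == computeChecksum(n); }

// prev's successor failed its check, so bypass it using prev->skip.
// Only prev is repaired; the node before prev still has a stale skip pointer.
void bypassCorruptNext(Node* prev) {
    prev->next = prev->skip;
    prev->skip = prev->next ? prev->next->next : nullptr;
    seal(*prev);
}
```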
This is an easy question, and there are several possible answers. Each trades off robustness against efficiency. Since increased robustness is a prerequisite of the question, there are solutions that sacrifice either time (list traversal speed, as well as the speed of inserting and deleting nodes) or space (extra information stored with each node). The problem states that this is a fixed list of length 100, in which case a linked list is a most inappropriate data structure. Why not make the puzzle a little more challenging and say that the size of the list is not known a priori?
Since the number of elements (100) is known, the 100th node must contain a null pointer. If it does, the list is valid with good probability (this cannot be guaranteed if, for example, the 99th node is corrupt and points to some memory location containing all zeros). Otherwise, there is some problem (this can be reported as a fact).
Update: it could also be possible, at every step, to look at the bookkeeping structures that delete would use if given the pointer, but since using delete itself is not safe in any sense, this is going to be implementation-specific.
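The known-length check can be sketched as below; hasExpectedLength is a hypothetical name. Walking at most the expected number of nodes also keeps a corrupted pointer that loops back into the list from causing an endless traversal. As noted above, a true result only makes a healthy list likely, not certain.

```cpp
#include <cstddef>

struct Node {
    int value;
    Node* next;
};

// Walks at most `expected` nodes and checks that the last one ends the list.
bool hasExpectedLength(const Node* head, std::size_t expected) {
    if (expected == 0) return head == nullptr;
    std::size_t count = 0;
    for (const Node* n = head; n; n = n->next) {
        if (++count == expected) return n->next == nullptr;
    }
    return false;  // the list ended early
}
```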