AVL insert and balance loop - C++

I am implementing AVL trees in C++ in my own code, but this question is more about understanding AVL trees than about the code itself. I am sorry if it doesn't fit here, but I have crawled through the Internet and still have not found a solution to my problem.
My code works as expected with relatively small inputs (~25-30 values), so I suppose it should work for more. During insertion I keep the visited nodes in an array, and then, in a while loop, I raise the height of each node where needed. I know this procedure has to end when I find a node whose subtree heights are equal (that is, their difference is 0).
The problem is balancing. I can compute the balance factor of each node and balance the tree correctly, but I am not sure whether, after a rebalancing rotation, I should stop adjusting heights and end the insertion loop, or keep going up until the stop condition is met; I just can't figure it out. I know that during deletion of a node and rebalancing I should keep checking all the way up, but I am not sure about insertion.
Can anyone provide some insight into this, and perhaps some documentation?

If you insert only one item at a time: only one (single or double) rotation is needed to readjust an AVL tree after an insertion throws it out of balance. http://cis.stvincent.edu/html/tutorials/swd/avltrees/avltrees.html You can probably prove it by yourself once you know the conclusion.

Just for the reference of future readers: there is no need to edit the heights of the nodes above the node you balanced, if you have implemented the binary tree like in my example:
       10
  (1) /  \ (2)
     8    16
     (1) /  \ (0)
       11
(Numbers in parentheses are the heights of each subtree.)
Suppose that into the tree above we insert a node with data=15. The resulting subtree is as follows:
       10
  (1) /  \ (2)
     8    16
     (1) /  \ (0)
       11
  (0) /  \ (1)
           15
Notice that the previous subtree heights have not been edited yet. After a successful insertion we run back through the insertion path, in this case (11, 16, 10), and edit the heights where needed. That makes the left subtree height of 16 equal to 2 while its right subtree height is 0, an imbalanced AVL tree. After balancing with a double rotation the subtree is:
       15
  (1) /  \ (1)
    11    16
So the subtree's maximum child height is 1, as it was before the insertion; therefore the heights above the root of this subtree are unchanged, and the function updating the heights must return now.
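To make the stopping rule concrete, here is a minimal sketch of the retrace loop (my code, not the asker's; the path array, the helper names, and the leaf-height-0 convention are assumptions). One single or double rotation restores the subtree's pre-insertion height, so the loop can stop right after rebalancing, and it can also stop as soon as a node's height comes out unchanged:

#include <algorithm>
#include <cstdlib>

struct Node {
    int key;
    int height = 0;                        // a leaf has height 0 here
    Node *left = nullptr, *right = nullptr;
};

static int  h(Node* n)       { return n ? n->height : -1; }
static void update(Node* n)  { n->height = 1 + std::max(h(n->left), h(n->right)); }
static int  bf(Node* n)      { return h(n->left) - h(n->right); }

static Node* rotateRight(Node* y) {        // single rotation
    Node* x = y->left;
    y->left = x->right; x->right = y;
    update(y); update(x);
    return x;                              // new subtree root
}
static Node* rotateLeft(Node* x) {
    Node* y = x->right;
    x->right = y->left; y->left = x;
    update(x); update(y);
    return y;
}

static Node* rebalance(Node* n) {          // called only when |bf(n)| == 2
    if (bf(n) > 1) {
        if (bf(n->left) < 0) n->left = rotateLeft(n->left);    // LR case
        return rotateRight(n);
    }
    if (bf(n->right) > 0) n->right = rotateRight(n->right);    // RL case
    return rotateLeft(n);
}

// path[0..depth-1] holds the nodes visited on the way down, root first,
// ending at the parent of the freshly inserted node.
void retraceAfterInsert(Node*& root, Node* path[], int depth) {
    for (int i = depth - 1; i >= 0; --i) {
        Node* n = path[i];
        int before = n->height;
        update(n);
        if (std::abs(bf(n)) > 1) {         // found the one unbalanced node
            Node* sub = rebalance(n);      // subtree regains its old height,
            if (i == 0) root = sub;        // so nothing above needs fixing
            else if (path[i - 1]->left == n) path[i - 1]->left = sub;
            else path[i - 1]->right = sub;
            break;                         // stop here: insertion case only
        }
        if (n->height == before) break;    // height unchanged, all done
    }
}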

Related

Dijkstra by means of Heap and pred-field

I have the following task, but I have no idea how to do it.
Calculate (lexicographically) the shortest paths from the V1 node to the other nodes by means of Dijkstra. Write down the current heap and the corresponding pred-field as well, starting with the new heap and pred-field before each ExtractMin call.
I got this result via Dijkstra, but how should I add it to the min-heap (tree)?
I found exactly this task in an old exam I use for learning algorithms. I wasn't sure how to solve it, so I searched several books for an explanation of how a min-heap works along with Dijkstra and didn't find any. Finally I figured it out, so I hope this helps others understand how to solve this problem.
Step 1: Draw the heap lexicographically: V1 -> root, then the other nodes from left to right. V1 has the value 0, all other nodes infinity.
Step 2: ExtractMin(): swap the root (which has the lowest value) with the very last node and cut it off.
Step 3: Relaxation: update a node's value if it decreases. "What is the new value from the start to X when I walk via the node I just extracted, respecting its predecessors? If it is less than the old value, update."
Step 4: Update the predecessor field if there is a better path.
Step 5: Call Heapify(): re-sort the heap so that the node with the lowest value becomes the root.
Repeat steps 2 to 5 until there are no nodes left.
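For readers who prefer code to the by-hand table, here is a minimal sketch of the same steps (mine, not from the exam; it uses a lazy-deletion min-heap instead of an explicit decrease-key, and names like adj and pred are illustrative):

#include <vector>
#include <queue>
#include <limits>
#include <functional>
#include <utility>

using Edge = std::pair<int, int>;                     // (neighbor, weight)

void dijkstra(const std::vector<std::vector<Edge>>& adj, int source,
              std::vector<long long>& dist, std::vector<int>& pred) {
    const long long INF = std::numeric_limits<long long>::max();
    dist.assign(adj.size(), INF);                     // step 1: all infinity...
    pred.assign(adj.size(), -1);
    std::priority_queue<std::pair<long long, int>,
                        std::vector<std::pair<long long, int>>,
                        std::greater<>> heap;         // min-heap by distance
    dist[source] = 0;                                 // ...except the source
    heap.push({0, source});
    while (!heap.empty()) {                           // repeat steps 2-5
        auto [d, u] = heap.top(); heap.pop();         // step 2: ExtractMin
        if (d != dist[u]) continue;                   // stale entry, skip it
        for (auto [v, w] : adj[u]) {                  // step 3: relaxation
            if (d + w < dist[v]) {                    // better path via u?
                dist[v] = d + w;
                pred[v] = u;                          // step 4: update pred
                heap.push({dist[v], v});              // step 5: (lazy) heapify
            }
        }
    }
}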

An efficient data structure for access by ID and finding the weighted random item

Could you please help me with a data structure that allows O(log N) (or at least O(sqrt N)) time for the following operations:
Insert an item having ID (int64_t) and health (double)
Remove an item by ID
Find an item that is weighted random by health
The preferred language is C++ or C. By weighted random I mean the following:
Consider totalHealth = Sum(health[0], health[1], ..., health[N-1]). I need a fast (as described above) operation that is equivalent to:
Compute const double atHealth = rand_uint64_t()*totalHealth/numeric_limits<uint64_t>::max();
Iterate over i=0 to N-1 to find the first i such that Sum(health[0], health[1], ..., health[i]) >= atHealth
Constraints: health[i] > 0, rand_uint64_t() returns a uniformly distributed integer value between 0 and numeric_limits<uint64_t>::max().
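In code, the linear operation I want to beat looks roughly like this (a direct transcription of the pseudo-code above; std::mt19937_64 stands in for rand_uint64_t, and the names are mine):

#include <vector>
#include <random>
#include <limits>
#include <cstdint>
#include <cstddef>

// O(N) baseline: pick the first index whose prefix sum reaches atHealth.
std::size_t pickWeightedLinear(const std::vector<double>& health,
                               double totalHealth, std::mt19937_64& rng) {
    const double atHealth =
        rng() * totalHealth / double(std::numeric_limits<std::uint64_t>::max());
    double running = 0.0;
    for (std::size_t i = 0; i < health.size(); ++i) {
        running += health[i];
        if (running >= atHealth) return i;
    }
    return health.size() - 1;              // guard against rounding drift
}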
What I have tried so far is a C++ unordered_map, which allows quick (Θ(1)) insertion and removal by ID, but operation #3 is still linear in N, as described in the pseudo-code above.
Your help is very much appreciated!
I can't think of a way to do it with the existing STL containers but I can think of a way to do it if you're willing to code up your own binary tree. The trick is that each node maintains the total health of all the nodes to its left (it doesn't need to worry about nodes to its right as you'll see below). Then, if you walk the tree in ID order you can also compute the "cumulative health", also in ID order, in log(n) time. So the tree is sorted by both ID and cumulative health and you can do lookups in log(n) time either by ID or by "cumulative health". For example, consider a very simple tree like the following:
             ID: 8
             h:  10
             chl: 15
       +-------|-------+
       |               |
    ID: 4           ID: 10
    h:  15          h:  7
    chl: 0          chl: 0
In the above, h is the health of the node and chl is the cumulative health of all nodes to its left. So the total health of all nodes above is 15 + 10 + 7 = 32 (I assume you maintain that count separately, though you could instead also track the cumulative health of the nodes to the right and then you wouldn't need to). Let's look at 3 cases:
You compute an atHealth < 15. At the root you see that your value is less than the chl, so you go left and end up at the correct leaf.
You compute an atHealth >= 15 and < 25. It is not less than the root's chl of 15, so you don't go left; the root's own health is 10, and 15 + 10 means the cumulative health at the root spans 15 to 25, so the root itself is the node you want.
You compute an atHealth >= 25. Every time you visit a node and go right, you must add the chl and h of the node you were at, to keep computing the cumulative health as you walk the tree. So you start at 15 + 10 = 25 when you go right from the root, and you add that to the h or chl of any node you encounter after that. Thus you quickly find that the node to the right is the correct one.
When you insert a new node, you increment the chl of every node where you descend to its left as you walk down the tree, and when you remove a node, you walk back up the tree subtracting from those counts. Inserts and deletions are thus still O(log(n)), and lookups are also O(log(n)), whether by ID or by atHealth.
Things obviously get more complicated if you want to maintain a balanced tree, but it's still doable.
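Here is a sketch of that descent (my code, not the answerer's; the tree is keyed by ID, chl caches the left subtree's total health, and balancing is left out for brevity):

#include <cstdint>

struct WNode {
    std::int64_t id;
    double  health;
    double  chl = 0.0;                      // total health of left subtree
    WNode *left = nullptr, *right = nullptr;
};

// Find the node whose cumulative-health slot contains atHealth.
WNode* findByHealth(WNode* n, double atHealth) {
    while (n) {
        if (atHealth < n->chl) {
            n = n->left;                    // target lies left of this node
        } else if (atHealth < n->chl + n->health) {
            return n;                       // falls inside this node's slot
        } else {
            atHealth -= n->chl + n->health; // skip this node and its left side
            n = n->right;
        }
    }
    return nullptr;                         // atHealth >= totalHealth
}

// On insert, every node where we descend left gains health in its left
// subtree, so bump chl on the way down (removal subtracts the same way).
void insertWeighted(WNode*& root, WNode* fresh) {
    WNode** slot = &root;
    while (*slot) {
        WNode* n = *slot;
        if (fresh->id < n->id) { n->chl += fresh->health; slot = &n->left; }
        else                   { slot = &n->right; }
    }
    *slot = fresh;
}

A real version would layer AVL or red-black rebalancing on top of this, fixing chl inside the rotations.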

Partition into equivalence sets

I have a simple, undirected tree T. I need to find two paths, A and B, such that A and B have no common vertex. The purpose is to maximize Len(A)*Len(B).
I figured this problem is similar to the Partition Problem, except that in the Partition Problem you have a set, whereas here you have an equivalence set. The solution would be to find two disjoint paths with Len(A) ~ Len(B) ~ (n-1)/2. Is this correct? How should I implement such an algorithm?
First of all, I think you are looking at this problem the wrong way. As I understand it, you have a graph-related problem. What you do is:
Build a maximum spanning tree and find its length L.
Now, you say that the two paths can't have any vertex in common, so we have to remove an edge to achieve this. I assume that every edge weight in your graph is 1, so the sum of the lengths of the two paths A and B is L-1 after you remove an edge. The problem is now to remove an edge such that the product of Len(A) and Len(B) is maximized, and you do that by removing the edge closest to the middle of L. Why? The problem is the same as maximizing the area of a rectangle with a fixed perimeter. A short YouTube video on the subject can be found here.
Note that if your edge weights are not equal to 1, you have a harder problem, because there may be more than one maximum spanning tree, and you may be able to split them in different ways. If this is the case, write me back and I will think about a solution, but I do not have one at hand.
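To make the remove-an-edge idea concrete, here is a brute-force sketch (my code, with path length counted in edges): for every tree edge, compute the longest path on each side and take the best product. It is O(n^2), but it shows exactly what is being maximized:

#include <vector>
#include <queue>
#include <utility>
#include <algorithm>

using Adj = std::vector<std::vector<int>>;       // adjacency list of the tree

// BFS from s that refuses to cross edge (a,b); returns (farthest node, dist).
static std::pair<int, int> bfsFar(const Adj& g, int s, int a, int b) {
    std::vector<int> d(g.size(), -1);
    std::queue<int> q;
    q.push(s); d[s] = 0;
    int far = s;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        if (d[u] > d[far]) far = u;
        for (int v : g[u]) {
            if ((u == a && v == b) || (u == b && v == a)) continue;
            if (d[v] == -1) { d[v] = d[u] + 1; q.push(v); }
        }
    }
    return {far, d[far]};
}

// Longest path (diameter) of the component containing s, with edge (a,b)
// removed: the classic two-BFS trick.
static int diameter(const Adj& g, int s, int a, int b) {
    int u = bfsFar(g, s, a, b).first;
    return bfsFar(g, u, a, b).second;
}

long long bestPathProduct(const Adj& g) {
    long long best = 0;
    for (int u = 0; u < (int)g.size(); ++u)
        for (int v : g[u])
            if (u < v)                           // each tree edge exactly once
                best = std::max(best, (long long)diameter(g, u, u, v)
                                    * (long long)diameter(g, v, u, v));
    return best;
}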
Best of luck.
I think there is a dynamic programming solution that is just about tractable if the path length is simply the number of links in the path (so links don't have weights).
Work from the leaves up. At each node you need to keep track of the best pair of solutions confined to the subtree with root at that node, and, for each k, the best solution with a path of length k terminating in that node and a second path of maximum length somewhere below that node and not touching the path.
Given this info for all descendants of a node, you can produce similar info for that node, and so work your way up to the root.
You can see that this amount of information is required if you consider a tree that is in fact just a line of nodes. The best solution for a line of nodes is to split it in two, so if you have only worked out the best solution when the line is of length 2n + 1 you won't have the building blocks you need for a line of length 2n + 3.

Faster alternatives to Dijkstra's algorithm for GPS system

I'm a real speed freak when it comes to algorithms, and it shows in the plugins I made for a game.
Their speed is, well, not satisfying. Especially while driving around with a car: when you do not follow your path, the path has to be recalculated, and that takes some time. Because I want a fast, live GPS system that updates constantly, the in-game GPS stacks up many "wrong way" signals (and stacking up the signals means more calculations afterward, for each wrong-way move).
I changed the old algorithm (a simple Dijkstra implementation) to the Boost Graph Library's Dijkstra to calculate a path from node A to node B (the total node list is around ~15k nodes with ~40k connections; for curious people, here is the map: http://gz.pxf24.pl/downloads/prv2.jpg (12 MB), the red lines are the edges between the nodes), but it didn't really increase the speed (at least not noticeably, maybe 50 ms).
The information that is stored in the Node array is:
The ID of the Node,
The position of the node,
All the connections to the node (and which way it is connected to the other nodes, TO, FROM, or BOTH)
Distance to the connected nodes.
I'm curious if anybody knows some faster alternatives in C/C++?
Any suggestions (+ code examples?) are appreciated!
If anyone is interested in the project, here it is (source+binaries):
https://gpb.googlecode.com/files/RouteConnector_177.zip
In this video you can see what the GPS system is like:
http://www.youtu.be/xsIhArstyU8
as you can see, the red route updates slowly (well, slow for us gamers).
(By the way: the gaps between the red lines were fixed a long time ago :p)
Since this is a GPS, it must have a fixed destination. Instead of computing the path from your current node to the destination each time you change the current node, you can instead find the shortest paths from your destination to all the nodes: just run Dijkstra once starting from the destination. This will take about as long as an update takes right now.
Then, in each node, keep prev = the node previous to this on the shortest path to this node (from your destination). You update this as you compute the shortest paths. Or you can use a prev[] array outside of the nodes - basically whatever method you are using to reconstruct the path now should still work.
When moving your car, your path is given by currentNode.prev -> currentNode.prev.prev -> ....
This will solve the update lag and keep your path optimal, but you'll still have a slight lag when entering your destination.
You should consider this approach even if you plan on using A* or other heuristics that do not always give the optimal answer, at least if you still get lag with those approaches.
For example, if you have this graph:
1 - 2 cost 3
1 - 3 cost 4
2 - 4 cost 1
3 - 4 cost 2
3 - 5 cost 5
The prev array would look like this (computed when you compute the distances d[]):
node:   1  2  3  4  5
prev =  1  1  1  2  3
Meaning, the shortest path FROM 1 TO:
2 = prev[2], 2                 = 1, 2
3 = prev[3], 3                 = 1, 3
4 = prev[prev[4]], prev[4], 4  = 1, 2, 4   (fill in right to left)
5 = prev[prev[5]], prev[5], 5  = 1, 3, 5
etc.
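As a sketch of how cheap the reroute becomes (my code; it assumes prev[] was filled by one Dijkstra run rooted at the destination, with the root pointing at itself as in the table above):

#include <vector>

// The route from wherever the car is now is just pointer-chasing.
std::vector<int> routeFrom(int current, const std::vector<int>& prev) {
    std::vector<int> route;
    for (int v = current; ; v = prev[v]) {
        route.push_back(v);
        if (prev[v] == v) break;            // reached the destination/root
    }
    return route;
}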
To make the start instant, you can cheat in the following way.
Have a fairly small set of "major thoroughfare nodes". For each node define its "neighborhood" (all nodes within a certain distance). Store the shortest routes from every major thoroughfare node to every other one. Store the shortest routes to/from each node to its major thoroughfares.
If the two nodes are in the same neighborhood you can calculate the best answer on the fly. Otherwise consider only routes of the form "here, to a major thoroughfare node near me, to a major thoroughfare node near it, to it". Since you've already precalculated those, and have a limited number of combinations, you should be able to calculate a route on the fly very quickly.
Then the challenge becomes picking a set of major thoroughfare nodes. It should be a fairly small set of nodes that most good routes should go through - so pick a node every so often along major streets. A list of a couple of hundred should be more than good enough for your map.
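A hedged sketch of the lookup step (all names are mine; dHub holds the precomputed hub-to-hub distances, and toHub[v] lists v's nearby hubs with their precomputed distances):

#include <vector>
#include <limits>
#include <algorithm>

struct HubRef { int hub; long long dist; };      // a nearby hub and its distance

long long approxDistance(int s, int t,
                         const std::vector<std::vector<HubRef>>& toHub,
                         const std::vector<std::vector<long long>>& dHub) {
    long long best = std::numeric_limits<long long>::max();
    for (const HubRef& a : toHub[s])             // hub near me
        for (const HubRef& b : toHub[t])         // hub near the target
            best = std::min(best, a.dist + dHub[a.hub][b.hub] + b.dist);
    return best;                                 // INF if either hub list is empty
}

With a couple of hundred hubs and a handful of hubs per node, that double loop is trivially fast, and the same-neighborhood case falls back to an ordinary on-the-fly search.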

O(log n) index update and search

I need to keep track of indexes in a large text file. I have been keeping a std::map of indexes and accompanying data as a quick hack. If the user is on character 230,400 in the text, I can display any metadata for that position.
Now that my maps are getting larger, I'm hitting some speed issues (as expected).
For example, if the text is modified at the beginning, I need to increment the indexes after that position in the map by one, an O(N) operation.
What's a good way to change this to O(log N) complexity? I've been looking at AVL Arrays, which come close.
I'm hoping for O(log n) time for updates and searches. For example, if the user is on character 500,000 in the text array, I want to very quickly find if there is any meta data for that character.
(Forgot to add: the user can add metadata whenever they like.)
Easy. Make a binary tree of offsets.
The value of any offset is computed by traversing the tree from the leaf to the root, adding offsets any time a node is a right child.
Then if you add text early in the file, you only need to update the offsets for nodes which are parents of the offsets that change. Say you added text before the very first offset: you add the number of characters added to the root node, and now half of your offsets have been corrected. Now traverse to the left child and add the offset again; now 3/4 of the offsets have been updated. Continue traversing left children, adding the offset, until all the offsets are updated.
#OP:
Say you have a text buffer with 8 characters, and 4 offsets into the odd bytes:
the tree:           5
                  /   \
                 3     2
                / \   / \
               1   0 0   0

sum of right
children (indices):  1   3   5   7
Now say you inserted 2 bytes at offset 4. The buffer was:
01234567
Now it is:
0123xx4567
So you modify just the nodes that dominate the parts of the array that changed. In this case, just the root node needs to be modified:
the tree:           7
                  /   \
                 3     2
                / \   / \
               1   0 0   0

sum of right
children (indices):  1   3   7   9
The summation rule is: walking from leaf to root, I add to myself the value of my parent if I am that parent's right child.
To find whether there is an index at my current location, I start at the root and ask: is this offset greater than my location? If yes, I traverse left and add nothing. If no, I traverse right and add the node's value to my index. If at the end of the traversal my accumulated value equals my location, then yes, there is an annotation. You can do a similar traversal with a minimum and maximum index to find the node that dominates all the indices in a range, finding all the indices into the text you are displaying.
Oh, and this is just a toy example. In reality you need to periodically rebalance the tree; otherwise, if you keep adding new indices in just one part of the file, you will get a tree that is way out of balance, and worst-case performance would no longer be O(log2 n) but O(n). To keep the tree balanced you would need to implement a self-balancing binary tree such as a red-black tree. That would guarantee O(log2 n) performance, where n is the number of metadata entries.
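Here is my reading of that scheme as toy code (all names mine; it assumes a complete tree in which every internal node has both children, and it does no rebalancing):

struct OffNode {
    long long val;                          // the offset delta stored here
    OffNode *left = nullptr, *right = nullptr;
};

// Is there an annotation exactly at absolute position pos?
bool hasIndexAt(OffNode* n, long long pos) {
    long long acc = 0;
    while (n->left) {                       // walk down to a leaf
        if (pos < acc + n->val) n = n->left;         // go left, add nothing
        else { acc += n->val; n = n->right; }        // add value, go right
    }
    return acc + n->val == pos;             // leaf's absolute offset
}

// Insert len characters at position pos: only one root-to-leaf path of
// nodes (those dominating the changed region) needs its value bumped.
void insertText(OffNode* n, long long pos, long long len) {
    long long acc = 0;
    while (n->left) {
        if (pos <= acc + n->val) { n->val += len; n = n->left; }
        else { acc += n->val; n = n->right; }
    }
    if (acc + n->val >= pos) n->val += len; // shift the leaf too if needed
}

Running insertText(root, 4, 2) on the example tree turns the root's 5 into 7 and leaves everything else alone, exactly as in the second picture above.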
Don't store indices! There's no way to do that and simultaneously have performance better than O(n): add a character at the beginning of the array and you'll have to increment n - 1 indices, no way around it.
But if you store substring lengths instead, you only have to change one per level of the tree structure, bringing you up to O(log n). My (untested) solution would be to use a Rope with metadata attached to the nodes; you might need to play around with that a bit, but I think it's a solid foundation.
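A minimal sketch of the store-lengths idea, using a Fenwick (binary indexed) tree over gap lengths instead of a full rope, purely for brevity (all names are mine): len[i] is the amount of text between annotation i-1 and annotation i, an annotation's absolute position is a prefix sum, and an edit touches only O(log n) entries:

#include <vector>

struct FenwickLengths {
    std::vector<long long> t;                   // 1-based Fenwick array
    explicit FenwickLengths(int n) : t(n + 1, 0) {}

    void add(int i, long long delta) {          // len[i] += delta
        for (; i < (int)t.size(); i += i & -i) t[i] += delta;
    }
    long long position(int i) const {           // absolute offset of annotation i
        long long s = 0;
        for (; i > 0; i -= i & -i) s += t[i];
        return s;
    }
    int lastAtOrBefore(long long pos) const {   // largest i with position(i) <= pos
        int pw = 1;
        while (pw * 2 < (int)t.size()) pw *= 2;
        int idx = 0;
        for (; pw > 0; pw >>= 1) {              // binary search over prefix sums
            int next = idx + pw;
            if (next < (int)t.size() && t[next] <= pos) {
                idx = next;
                pos -= t[next];
            }
        }
        return idx;                             // 0 means "before all of them"
    }
};

Inserting k characters at text position p then only needs fen.add(fen.lastAtOrBefore(p) + 1, k), which shifts every later annotation at once.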
Hope that helps!