How to search in a Range Tree? - c++

I read several slides, like this one's last page, where they describe the search algorithm. However, I have a basic question. The data lie in a 2D space.
I first build a Binary Search Tree based on the x value of the points. Every inner node holds a BST based on the y value of the points that lie in the subtree of that inner node.
Then I think I should search for the points that lie in the range query [x1, x2] and then check whether those points satisfy the requested [y1, y2] range query. However, the algorithm suggests that you should search in the y-based BST of an inner node if the range of the inner node is inside [x1, x2], but I don't get that.
If we do that, then in an example I have, we will search (without a reason) the y-based BST of the root. Check the example:
------ 0 ---------------------
| |
---- -3 ---- ---- 4 ------
| | | |
---- -4 - -2 --- 3 --- 5
| | / \ | | / \
-5 (-3,4) (-2,2)(0,7) 2 (4,-4) (5,3)(6,-1)
/ \ / \
(-5,6) (-4,0) (2,1) (3,6)
And the range query I wish to perform is (-oo, 1) x (0, 5)*.
If I look at the root, it has value 0, thus it's enclosed in (-oo, 1), so if I follow the algorithm I am going to search the whole y-based tree of the root?
That should be a tree that contains all the points, so there is no point in continuing to search the x-based tree. Moreover, that will result in more visited nodes than necessary.
I am implementing that in c++, if that matters.
*Performing a range query for x's in the range [-inf, 1] and y's in the range [0, 5].

The algorithm you are proposing is not quite right - you should compare the range you are querying with the range of the node you are looking at, not the value of the node.
E.g., initially you should compare (-inf, 1) with (-5, 6), which is the data range of the tree (you can also use (-inf, inf) as the data range of the tree or any interval that encloses (-5, 6), for that matter), instead of the value 0. Recursively you should compare the query range with the range of the subtree rooted at the node you are querying at.
Also, the range update can be done while searching - when splitting at a node, the upper/lower bound of the left/right recursive call interval is the node value.
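The comparison described above can be sketched as follows. This is a minimal Python sketch rather than the C++ the question asks about. The names `RangeNode` and `query` are hypothetical, and each node keeps its points in a y-sorted list as a stand-in for the y-based BST; both are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right

class RangeNode:
    """Hypothetical node covering the x-range [lo, hi] of the points in its subtree."""
    def __init__(self, pts):  # pts: non-empty list of (x, y), sorted by x
        self.lo, self.hi = pts[0][0], pts[-1][0]
        # Stand-in for the y-based BST: this subtree's points sorted by y.
        self.by_y = sorted(pts, key=lambda p: p[1])
        self.ys = [p[1] for p in self.by_y]
        self.left = self.right = None
        if len(pts) > 1:
            mid = len(pts) // 2
            self.left = RangeNode(pts[:mid])
            self.right = RangeNode(pts[mid:])

def query(node, x1, x2, y1, y2, out):
    """Collect points with x in [x1, x2] and y in [y1, y2] into out."""
    # Node range disjoint from the x query: nothing to report here.
    if node is None or node.hi < x1 or node.lo > x2:
        return out
    # Node range fully inside the x query: only now use the y structure.
    if x1 <= node.lo and node.hi <= x2:
        i = bisect_left(node.ys, y1)
        j = bisect_right(node.ys, y2)
        out.extend(node.by_y[i:j])
        return out
    # Partial overlap: recurse into both children.
    query(node.left, x1, x2, y1, y2, out)
    query(node.right, x1, x2, y1, y2, out)
    return out

# The points from the question's example tree.
points = sorted([(-5, 6), (-4, 0), (-3, 4), (-2, 2), (0, 7),
                 (2, 1), (3, 6), (4, -4), (5, 3), (6, -1)])
root = RangeNode(points)
result = query(root, float("-inf"), 1, 0, 5, [])
```

In this sketch the root covers x in [-5, 6], which is not inside (-inf, 1], so its y structure is never searched; only the left child, covering [-5, 0], is.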

Related

Get nodes and group them based on specific inputs from user

I have a binary tree.
Each node is a struct with the 2 values: width and length.
The input from the user is to group them based on one or both criteria (length, width).
For this grouping, only the leaf nodes are considered.
Consider the tree in the attached image.
If the user said group based on length, the result would be [8,5,7] as it has a length 20 and [9,6] as its length is 10.
If the user said group based on both length and width, then the output would be
[8,5] w=20,l=30 , [9,6] l=10,w=20 , [7] l=20,w=10 .
How I dealt with this was to traverse the tree twice and collect 2 'lists of lists' and process that to get a new 'lists of lists' that groups the unique l&w values together.
The user can optionally also give level as an input. If the level is given as an input, then the same steps happen at that level instead of at leaf nodes.
Is there a better way to do it than what I have mentioned?
PS. I am working on this code in C++.
Data:
Construct a list of objects with fields
length
width
level
You can create this "main" list by doing a level-order traversal.
and construct the following lists from the main list:
a length-level-sorted list : list that is sorted first by length and then level
a width-level-sorted list : list that is sorted first by width and then level
a length-width-level-sorted list : list that is sorted first by length and then width and then level
Queries ( log n ):
For length queries : do a binary search on the length-level-sorted list for the left and right boundaries of the run of objects with the requested length, and return that run as the group - finding the boundaries should be O(log(n)); returning a group of k objects adds O(k)
For width queries : do the same on width-level-sorted list - should be O(log(n))
For length and width queries : do binary search on length-width-level-sorted list first by length and then by width - should be O(log(n))
For level option : if level is included in query in any of the above queries - just do binary search by level - should be O(log(n)) again.
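The sorted-list idea above can be sketched in Python (the question is about C++, so treat this purely as an illustration). The record layout `(length, width, level, node_id)`, the sample values, and the helper name `group` are all hypothetical. A single list sorted by length, then width, then level is used here, which covers the length-only and length-and-width queries; a width-only query would need its own width-sorted list.

```python
from bisect import bisect_left, bisect_right

# Hypothetical leaf records: (length, width, level, node_id).
leaves = [
    (20, 30, 3, 8), (20, 30, 3, 5), (20, 10, 2, 7),
    (10, 20, 3, 9), (10, 20, 3, 6),
]

# Sorted by length, then width, then level (tuple order).
by_length_width_level = sorted(leaves)

def group(sorted_recs, *prefix):
    """Return node ids of all records whose leading fields equal prefix.

    Two binary searches find the boundaries of the matching run,
    so locating the group costs O(log n) comparisons.
    """
    lo = bisect_left(sorted_recs, prefix)
    hi = bisect_right(sorted_recs, prefix + (float("inf"),))
    return [rec[-1] for rec in sorted_recs[lo:hi]]

length_20 = group(by_length_width_level, 20)            # all leaves with length 20
length_20_width_30 = group(by_length_width_level, 20, 30)
```

The sketch relies on Python's tuple ordering: a shorter tuple such as `(20,)` sorts before every record starting with 20, and appending infinity gives an upper bound for that same run.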

How do I find the index of the maximum value in the list using Applescript?

I have an int list such as {18, 18, 18, 18, 22, 21}, I want to use Applescript to get the maximum value of this list, and get the maximum index, please teach me
There are two stages to this:
Identifying the maximum value in the list;
Once the maximum value is known, determining the index of the last occurrence of this value in the list.
I'll use a sample list I generated myself in the examples I demonstrate below. However, you can simply substitute your list for mine, and the processes described will work just as well, and produce the results specific to your input.
1. Retrieving the maximum value in a numerical list
A quick-and-dirty way to get the maximum value in the list is to use a bash numeric sort command, and pick the last item:
set L to {4, 24, 78, 32, 1.5, 32, 78, 4, 19, 78}
set text item delimiters to linefeed
do shell script "sort -n <<<" & quoted form of (L as text) & "| tail -n 1"
--> 78
But, in the spirit of problem solving, the computer scientist's approach would be to iterate through the items in the list and perform these operations:
Store the first item's value.
If the next item is of greater value, then replace the currently stored value with the item we just assessed as being greater in value.
If the next item is not of greater value, retain the currently stored value.
Once you reach the end of the list, the stored value must be equal to the greatest value item in the list. At this point, we don't know its position in the list, but we know its value.
Here's the AppleScript that performs this process:
set L to {4, 24, 78, 32, 1.5, 32, 78, 4, 19, 78}
set max to L's first item
repeat with x in L
if x > max then set max to x's contents
end repeat
return max
--> 78
2. Determining the index of a given item in a list
Putting aside the maximum value for now, the second half of the problem involves being able to determine the position of any given item in an ordered list.
The most obvious solution to this is, as before, iterating through each item in the list and performing this operation:
If the current item is equal to the target item, then append its index to the end of a list reserved for storing matching indices.
Once you reach the end of the list, your matched indices list will contain all the positions of the items whose value equal your target item's value; or the matched indices list will be an empty list, indicating that the main list does not contain the value we sought out.
The index of the first item in an AppleScript list is 1. Use the length property of a list to obtain the number of items in the whole list.
Here's a basic AppleScript:
set L to {4, 24, 78, 32, 1.5, 32, 78, 4, 19, 78}
set matches to {}
set target to 78
repeat with i from 1 to L's length
if item i of L = the target then set end of matches to i
end repeat
return the matches
--> {3, 7, 10}
3. The combined process
Combining these two halves of the problem is as simple as running each half in sequence, being mindful to use the result from the first half of the process—the maximum value—as the target value to be sought out in the list:
set L to {4, 24, 78, 32, 1.5, 32, 78, 4, 19, 78}
set max to L's first item
# Get maximum value
repeat with x in L
if x > max then set max to x's contents
end repeat
set matches to {}
set target to max
# Get index of maximum value
repeat with i from 1 to L's length
if item i of L = the target then set end of matches to i
end repeat
return the matches
--> {3, 7, 10}
Finally, as you only want the maximum index, this is simply the last value in the matches list, i.e. 10, which you obtain by replacing return the matches with this line:
return the last item in matches
--> 10
4. Efficiency improvements
Having outlined the basic methods in each process, these aren't necessarily the fastest methods. With lists containing only 10 items, inefficiency is not a noticeable concern. If you had a list of 10,000 items, you would want to be able to reduce the time to get your result.
I think I'm correct in stating that there's no discernible way to speed up the first process in terms of algorithmic improvements: retrieving the maximum value necessitates comparing every item's magnitude and retaining the largest.
Determining the index, however, can be sped up given that we only need to determine the last occurrence of an item in the list.
Therefore, we can run the process as before, but making two changes:
Start from the end of the list instead of the beginning.
Stop the process once we find the first match.
Here's the script:
set L to {4, 24, 78, 32, 1.5, 32, 78, 4, 19, 78}
set max to L's first item
# Get maximum value
repeat with x in L
if x > max then set max to x's contents
end repeat
set target to max
# Get index of maximum value
repeat with i from L's length to 1 by -1
if item i of L = the target then exit repeat
end repeat
return i
--> 10
Note here the second repeat loop now runs backwards from the highest index down to 1; and, we no longer require the matches list, so instead we simply exit the loop when a match is found, and see what value of i we were at.
One further improvement to the algorithm here would be to test for the null case to determine whether or not we really need to run through the list at all: the null case is the case where the list doesn't contain the value we seek. AppleScript provides a builtin way to check this:
if the target is not in L then return 0
and this line would sit immediately after set target to max and immediately before repeat with i....
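For readers who find it easier to follow the logic outside AppleScript, the same two-stage process (forward scan for the maximum, backward scan with early exit for its last position) can be sketched in Python. The function name `last_index_of_max` is hypothetical, and the result is kept 1-based to match AppleScript's list indexing.

```python
def last_index_of_max(values):
    """Return the 1-based index of the last occurrence of the maximum.

    Returns 0 for an empty list, mirroring the null case above.
    """
    if not values:
        return 0
    # Stage 1: forward scan, keeping the largest value seen so far.
    largest = values[0]
    for v in values:
        if v > largest:
            largest = v
    # Stage 2: backward scan, stopping at the first match from the end.
    for i in range(len(values), 0, -1):
        if values[i - 1] == largest:
            return i
    return 0

idx = last_index_of_max([4, 24, 78, 32, 1.5, 32, 78, 4, 19, 78])
```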
5. Advanced improvements
Another way to improve efficiency, after addressing efficiency of the algorithms themselves, is to address the efficiency of the way the script is written. This is more advanced than you need to concern yourself with now, but here's how I'd probably implement the algorithms if I were writing the script for my own use:
I would define a handler called maximum() that takes a list as its argument and returns the greatest value, and I would implement the scripting of this handler like so:
on maximum(L as list)
local L
if L is {} then return {}
if L's length = 1 then return L's first item
script Array
property x0 : L's first item
property xN : rest of L
property fn : maximum(xN)
property predicate : x0 > fn
end script
tell the Array
if its predicate is true then return its x0
its fn
end tell
end maximum
This uses something called a script object to process the items of the list, which is much, much quicker in AppleScript than a conventional iterative repeat loop.
Next, I would define a second handler called lastIndexOf() that takes a supplied value and a list as its two arguments, and returns the highest index at which the supplied value occurs in the given list. My handler would look like this:
on lastIndexOf(x, L as list)
local x, L
if x is not in L then return 0
if L = {} then return
script Array
property x0 : L's last item
property xN : reverse of rest of reverse of L
property predicate : x0 ≠ x
end script
# For the last match only:
if Array's predicate is false then return (Array's xN's length) + 1
# For a match list (comment out line above):
tell the Array
if its predicate is false then ¬
return the lastIndexOf(x, its xN) ¬
& (its xN's length) + 1
return lastIndexOf(x, its xN)
end tell
end lastIndexOf
Then, all I need to do to obtain the result is:
set L to {4, 24, 78, 32, 1.5, 32, 78, 14, 19, 78}
get the lastIndexOf(maximum(L), L)
--> 10
But, don't try and understand what I've done here just yet, and concentrate on understanding the repeat loop algorithms.
I've included these more advanced versions for completeness and for readers who may have wondered, had I left this out, why I didn't provide the most optimal solution I could have.
Whilst the algorithm used in these advanced versions remains the same (it doesn't look like it, but it is), the way the code is written makes these incredibly efficient for lists with many items.
Note, however, I haven't included any error handling, so if you were to pass those handlers a list that contained non-numerical items, at least one of them would complain.
The maximum value can be determined quite easily with the help of AppleScriptObjC and Key-Value Coding:
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"
set numberList to {18, 18, 18, 18, 22, 21, 22}
set nsNumberList to current application's NSArray's arrayWithArray:numberList
set maxValue to (nsNumberList's valueForKeyPath:"@max.intValue") as integer -- 22
If there is only one occurrence of that value, or if you want only the index of the first occurrence, write
set maxIndex to ((nsNumberList's indexOfObject:maxValue) as integer) + 1 -- AppleScript's lists are 1-based
If there are multiple occurrences of that value and you need all indexes use a loop (unfortunately the efficient native Cocoa API indexesOfObjectsPassingTest: is not available in AppleScriptObjC)
set maxIndexes to {}
repeat with i from 0 to (count numberList) - 1 -- Cocoa's lists are 0-based
if (maxValue's isEqualToNumber:(nsNumberList's objectAtIndex:i)) then
set end of maxIndexes to i + 1
end if
end repeat
maxIndexes -- {5, 7}

How to find the left and right children of root in Leonardo heap?

According to this article,
The root of the tree is at position L(k) - 1.
The root of the Lt_{k-1} subtree is at position L(k - 1) - 1.
The root of the Lt_{k-2} subtree is at position L(k) - 2.
Can someone please help me understand this?? I am trying to implement smoothsort.
Let's imagine you have a Leonardo heap of order k stored in an array, and you recursively do the layout so that you always lay out the bigger subtree, then the smaller subtree, then the root node. That means that there's going to be a total of L(k) nodes in the array, in positions numbered 0, 1, 2, 3..., L(k) - 1. It's going to look something like this:
+---------------------------+----------------------+------+
| Tree of order k - 1 | Tree of order k - 2 | root |
+---------------------------+----------------------+------+
Notice that the root comes last, so it's at position L(k) - 1 because we're using zero-indexing.
So where are the two subtrees? Well, the subtree of order k - 2 is immediately before the root node. It's laid out in a way where its root is to the far right. So to find its root, we go to the root of the whole tree (position L(k) - 1) and back up one step to get to position L(k) - 2.
What about the subtree of order k - 1? Well, notice that it's comfortably at the front of our representation. Its root node is going to be at the end of that block, which is at position L(k - 1) - 1 (analogously to how our larger tree of order k has its root at position L(k) - 1.)
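The three positions can be checked numerically with a short Python sketch. It assumes the usual Leonardo recurrence L(0) = L(1) = 1, L(k) = L(k - 1) + L(k - 2) + 1, and a tree laid out from array index 0 as in the diagram above; the function names `leonardo` and `root_positions` are hypothetical.

```python
def leonardo(k):
    """Return the k-th Leonardo number: L(0) = L(1) = 1, L(k) = L(k-1) + L(k-2) + 1."""
    a, b = 1, 1
    for _ in range(k):
        a, b = b, a + b + 1
    return a

def root_positions(k):
    """For a Leonardo tree of order k >= 2 stored at indices 0 .. L(k) - 1,
    return (root, root of the order k-1 subtree, root of the order k-2 subtree)."""
    root = leonardo(k) - 1               # the root is laid out last
    big_child = leonardo(k - 1) - 1      # last slot of the leading order k-1 block
    small_child = leonardo(k) - 2        # slot immediately before the root
    return root, big_child, small_child

positions = root_positions(4)  # L(4) = 9, so the array indices are 0..8
```

For order 4 the order-3 subtree occupies indices 0 to 4, the order-2 subtree occupies 5 to 7, and the root sits at 8, which matches the three formulas.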
Hope you enjoyed my article! :-)

Prolog List Neighbour of a Element

I am having problems with list of prolog. I want to make this:
[1,2,3,4,5]
[5,6,9,12,10]
You take a number for example 3, and you do a plus operation with the neighbours so the operation is 2+3+4 = 9. For the first and the last element you pretend there is an imaginary 1 there.
I have this now:
sum_list([A,X,B|T], [Xs|Ts]):-
add(A,X,B,Xs),
sum_list([X,B|T], Ts).
I haven't considered the first and the last element. My problem is that I don't know how to get the previous and the next element, and then how to move on.
Note: I am not allowed to use meta-predicates.
Thanks.
I'm not sure how you calculated the first 5. The last 10 would be 4 + 5 + implicit 1. But following that calculation, the first element of your result should be 4 instead of 5?
Anyways, that doesn't really matter in terms of writing this code. You are actually close to your desired result. There are of course multiple ways of tackling this problem, but I think the simplest one would be to write a small 'initial' case in which you already calculate the first sum and afterwards recursively calculate all of the other sums. We can then write a case in which only 2 elements are left to calculate the last 'special' sum:
% Initial case for easily distinguishing the first sum
initial([X,Y|T],[Sum|R]) :-
Sum is X+Y+1,
others([X,Y|T],R).
% Match on 2 last elements left
others([X,Y],[Sum|[]]) :-
Sum is X+Y+1.
% Recursively keep adding neighbours
others([X,Y,Z|T],[Sum|R]) :-
Sum is X+Y+Z,
others([Y,Z|T],R).
Execution:
?- initial([1,2],Result).
Result = [4,4]
?- initial([1,2,3,4,5],Result).
Result = [4, 6, 9, 12, 10]
Note that we now don't have any cases (yet) for an empty list or a list with just one element in it. This still needs to be covered if necessary.
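As a cross-check of the expected results, the same rule (each element plus its two neighbours, with an imaginary 1 beyond both ends) can be sketched in Python. The function name `neighbour_sums` is hypothetical, and the handling of empty and single-element lists is an assumption of this sketch, since the Prolog version above does not define those cases.

```python
def neighbour_sums(xs, pad=1):
    """Sum each element with its left and right neighbours.

    Positions before the first and after the last element count as pad.
    Assumption: an empty list yields an empty list, and a single
    element x yields [pad + x + pad].
    """
    if not xs:
        return []
    padded = [pad] + list(xs) + [pad]
    return [padded[i - 1] + padded[i] + padded[i + 1]
            for i in range(1, len(padded) - 1)]

sums = neighbour_sums([1, 2, 3, 4, 5])
```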

Can you get the selected leaf from a DecisionTreeRegressor in scikit-learn

just reading this great paper and trying to implement this:
... We treat each individual
tree as a categorical feature that takes as value the
index of the leaf an instance ends up falling in. We use 1-
of-K coding of this type of features. For example, consider
the boosted tree model in Figure 1 with 2 subtrees, where
the first subtree has 3 leafs and the second 2 leafs. If an
instance ends up in leaf 2 in the first subtree and leaf 1 in
second subtree, the overall input to the linear classifier will
be the binary vector [0, 1, 0, 1, 0], where the first 3 entries
correspond to the leaves of the first subtree and last 2 to
those of the second subtree ...
Anyone know how I can predict a bunch of rows and for each of those rows get the selected leaf for each tree in the ensemble? For this use case I don't really care what the node represents, just its index really. Had a look at the source and I could not quickly see anything obvious. I can see that I need to iterate the trees and do something like this:
for sample in X_test:
for tree in gbc.estimators_:
leaf = tree.leaf_index(sample) # This is the function I need but don't think exists.
...
Any pointers appreciated.
The following function goes beyond identifying the selected leaf from the Decision Tree and implements the application in the referenced paper. Its use is the same as the referenced paper, where I use the GBC for feature engineering.
import numpy as np

def makeTreeBins(gbc, X):
    '''
    Takes in a GradientBoostingClassifier object (gbc) and a data frame (X).
    Returns a numpy array of dim (rows(X), num_estimators), where each row represents the set of terminal nodes
    that the record X[i] falls into across all estimators in the GBC.
    Note, each tree produces up to 2^max_depth terminal nodes. I append a prefix to the terminal node id in each incremental
    estimator so that I can use these as feature ids in other classifiers.
    '''
    for i, dt_i in enumerate(gbc.estimators_):
        prefix = (i + 2)*100 #Must be an integer
        # Leaf id reached by every row of X in the i-th tree, offset by the prefix
        nds = prefix + dt_i[0].tree_.apply(np.array(X).astype(np.float32))
        if i == 0:
            nd_mat = nds.reshape(len(nds), 1)
        else:
            # Append this tree's leaf ids as a new column
            nd_mat = np.hstack((nd_mat, nds.reshape(len(nds), 1)))
    return nd_mat
DecisionTreeRegressor has a tree_ property which gives you access to the underlying decision tree. It has a method apply, which seemingly finds the corresponding leaf id:
dt.tree_.apply(X)
Note that apply expects its input to have type float32.
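Once a leaf index is available for each tree, the 1-of-K coding quoted in the question can be sketched in plain Python. This is only an illustration of the encoding step: the function name `one_hot_leaves` is hypothetical, and leaves are assumed to be numbered from 1 within each tree, as in the paper's example. The raw node ids returned by apply would first have to be mapped to such per-tree leaf numbers.

```python
def one_hot_leaves(leaf_ids, leaves_per_tree):
    """Concatenate one 1-of-K block per tree.

    leaf_ids[t] is the 1-based leaf the instance falls into in tree t,
    and leaves_per_tree[t] is the number of leaves of tree t.
    """
    vector = []
    for leaf, n_leaves in zip(leaf_ids, leaves_per_tree):
        if not 1 <= leaf <= n_leaves:
            raise ValueError("leaf index out of range for this tree")
        block = [0] * n_leaves
        block[leaf - 1] = 1  # mark the leaf that was reached
        vector.extend(block)
    return vector

# The paper's example: leaf 2 of a 3-leaf tree, leaf 1 of a 2-leaf tree.
encoded = one_hot_leaves([2, 1], [3, 2])
```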