Search for a specific node in an RB-tree - C++

Suppose I have an RB-tree consisting of numbers which correspond to people's ages, and suppose each node also has a gender (female or male).
My question is: how can I get a specific element from that tree using OS-SELECT and a rank value? By a specific element I mean, for example, the 2nd youngest man in the tree.
Example: OS-SELECT(root, 2) should return the second youngest man in the tree.
The aim is not just to find the 2nd or 3rd youngest node; the aim is to find the 2nd youngest man or the 2nd youngest woman.

Simply traverse the tree in-order and count the elements that satisfy the predicate. In this case the predicate would be "is male". Once the element is found, the traversal can end early, which is not entirely trivial to implement, so here is simple pseudocode for the algorithm:
# The return value tracks how many more matching nodes must be
# found until the k'th one is reached.
int OS-SELECT(node, rank)
    if not node                         # Leaf reached?
        return rank                     # Keep searching.
    rank = OS-SELECT(node.left, rank)
    if not rank                         # Node was found in left subtree.
        return 0                        # End early.
    if predicate(node)                  # Test the predicate.
        if not --rank                   # The node matches: one fewer match to go.
            visit(node)                 # Rank dropped to 0: found it. Visit the node and
            return 0                    # end early.
    return OS-SELECT(node.right, rank)
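A minimal Python sketch of the same in-order selection, for illustration only (the question is about C++, but the logic carries over directly). The Node fields age/gender and the helpers insert and os_select_matching are hypothetical names; balancing and node colours are omitted because the search depends only on the in-order (age) ordering. Instead of calling a visit callback, the function returns the node it finds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical layout: red-black colour bits are left out because the
    # selection below only relies on the in-order ordering by age.
    age: int
    gender: str  # "M" or "F"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], age: int, gender: str) -> Node:
    """Plain (unbalanced) BST insert, only used to build a test tree."""
    if root is None:
        return Node(age, gender)
    if age < root.age:
        root.left = insert(root.left, age, gender)
    else:
        root.right = insert(root.right, age, gender)
    return root


def _select(node, rank, predicate):
    """In-order walk; returns (found_node, remaining_rank)."""
    if node is None:
        return None, rank                      # empty subtree: keep searching
    found, rank = _select(node.left, rank, predicate)
    if found is not None:
        return found, 0                        # found in left subtree: stop
    if predicate(node):
        rank -= 1                              # one fewer match to go
        if rank == 0:
            return node, 0                     # this node is the k'th match
    return _select(node.right, rank, predicate)


def os_select_matching(root, k, predicate) -> Optional[Node]:
    """k'th smallest node (1-based) among nodes satisfying predicate, or None."""
    if k < 1:
        return None
    found, _ = _select(root, k, predicate)
    return found


root = None
for age, gender in [(30, "M"), (25, "F"), (40, "M"), (22, "M"), (28, "F"), (35, "M")]:
    root = insert(root, age, gender)

second_man = os_select_matching(root, 2, lambda n: n.gender == "M")
second_woman = os_select_matching(root, 2, lambda n: n.gender == "F")
print(second_man.age, second_woman.age)  # 30 28
```

Passing the predicate as a parameter means the same function answers both "2nd youngest man" and "2nd youngest woman". Like the pseudocode, this walks the tree in order, so it is O(n) in the worst case.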

Related

Prolog - split a list into lists of lists

The list to be split is a list of guests that attend a dinner. The dinner has three meals (appetizer, main, and dessert). The result should be a list of lists for each of the meals, where each sub-list holds the people who eat that meal together.
E.g.:
Appetizer: [[Tick, Trick, Track],[Tic, Tac, Toe], [Jerry, Larry, Harry]]
Main: [[Tick, Tic, Jerry], [Trick, Tac, Larry],[Track, Toe, Harry]]
Dessert: [[Tick, Tac, Larry], [Trick, Toe, Harry], [Track, Tic, Jerry]]
The function I have to code:
make_dinner(?Starters, ?Main, ?Dessert, +List_of_Persons, +Group_size):-
Group_size is the size of the groups (three in the example above).
I don't know how to approach this problem in Prolog. I could easily code this in Java, which I'm really good at, but the recursion and logic here are very confusing to me. There's no "return" value, no "if" statement, just a bunch of commas :")
My idea was to:
find out how many groups I need (depends on group size and list length)
fill the appetizer list in the original order of the list: count it off and split the list after every Group_size elements to get the list of lists for the appetizer (the most intuitive way, I think)
fill the first sub-list of the "main" list with the first element of the list, then the second element of the other sub-lists from the "appetizer" list. Fill the second sub-list with the second element of the first sub-list of "appetizer", then the third element of the next sub-lists in "appetizer", and so on and so forth.
If the numbers don't divide evenly (e.g. list length 10 but group size 3), I will simply have one smaller group. No person eats with the same person twice.
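To check whether a rotation idea like this can work, here is a small sketch of it in Python (not Prolog; it only checks the logic). It assumes the square case from the example, exactly Group_size groups of Group_size people; the uneven case (10 people, group size 3) needs extra handling that is not shown. make_dinner and meets_only_once are hypothetical helper names. Main is the transpose of the starter groups, and dessert group i takes member (i + j) mod n of starter group j, a cyclic shift under which no pair of people shares a group twice. The dessert in the example above does not have this property: Tac and Larry sit together at both the main course and dessert.

```python
def make_dinner(persons, group_size):
    """Starters in list order, main = transpose, dessert = cyclic shift.

    Sketch for the square case only: exactly group_size groups of group_size.
    """
    n = group_size
    if len(persons) != n * n:
        raise ValueError("sketch assumes len(persons) == group_size ** 2")
    # Starters: split the list after every group_size elements.
    starters = [persons[j * n:(j + 1) * n] for j in range(n)]
    # Main: group i takes member i of every starter group.
    main = [[starters[j][i] for j in range(n)] for i in range(n)]
    # Dessert: group i takes member (i + j) mod n of starter group j.
    dessert = [[starters[j][(i + j) % n] for j in range(n)] for i in range(n)]
    return starters, main, dessert


def meets_only_once(*meals):
    """True if no two people share a group in more than one meal."""
    seen = set()
    for meal in meals:
        for group in meal:
            for a in group:
                for b in group:
                    if a < b:
                        if (a, b) in seen:
                            return False
                        seen.add((a, b))
    return True


people = ["Tick", "Trick", "Track", "Tic", "Tac", "Toe", "Jerry", "Larry", "Harry"]
starters, main, dessert = make_dinner(people, 3)
print(dessert)  # [['Tick', 'Tac', 'Harry'], ['Trick', 'Toe', 'Jerry'], ['Track', 'Tic', 'Larry']]
print(meets_only_once(starters, main, dessert))  # True
```

Starters and main come out exactly as in the example; only the dessert differs.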
Does my idea work in Prolog? Can I implement that logic?
I don't really know how to approach this task because I'm very new to Prolog. I will have the main predicate below, then I need a helper predicate called get_number_of_groups to know how many elements go into each list, and one predicate to get the size of the list.
How do I make the starters, main, and dessert into a list of lists? I know about recursion but somehow I don't know where to go from here.
% make_rudi: main predicate
% return: starters, main, and dessert; each a list of lists
make_rudi(Starters, Main, Dessert, List_of_Persons, Group_size) :-
    get_number_of_groups(List_of_Persons, Group_size, Number).
    % ... goals that build Starters, Main and Dessert still to be written

% get_number_of_groups(+List_of_Persons, +Group_size, ?Number)
% return: the number of groups I have for each meal
get_number_of_groups(List_of_Persons, Group_size, Number) :-
    list_length(List_of_Persons, Length),
    % round up, so a final smaller group takes the remainder
    Number is (Length + Group_size - 1) // Group_size.

% list_length(+Xs, ?L)
% the length of the list
list_length(Xs, L) :-
    list_length(Xs, 0, L).
list_length([], L, L).
list_length([_|Xs], T, L) :-
    T1 is T + 1,
    list_length(Xs, T1, L).

Get nodes and group them based on specific inputs from user

I have a binary tree.
Each node is a struct with two values: width and length.
The user's input says whether to group them based on one or both criteria (length, width).
For this grouping, only the leaf nodes are considered.
Consider the tree in the attached image.
If the user asks to group by length, the result would be [8,5,7], as they have length 20, and [9,6], as their length is 10.
If the user asks to group by both length and width, then the output would be
[8,5] (w=20, l=30), [9,6] (l=10, w=20), [7] (l=20, w=10).
My approach was to traverse the tree twice, collect two lists of lists, and then process them into a new list of lists that groups the nodes with the same length and width values together.
The user can optionally also give level as an input. If the level is given as an input, then the same steps happen at that level instead of at leaf nodes.
Is there a better way to do it than what I have mentioned?
PS. I am working on this code in C++.
Data:
Construct a list of objects with fields
length
width
level
You can create this "main" list by doing a level-order traversal.
Then construct the following lists from the main list:
a length-level-sorted list : list that is sorted first by length and then level
a width-level-sorted list : list that is sorted first by width and then level
a length-width-level-sorted list : list that is sorted first by length and then width and then level
Queries (O(log n) to locate each group):
For length queries: do a binary search on the length-level-sorted list, then scan left and right from the found index to collect all objects with the same length, and return that group. Locating the group is O(log n); collecting it adds O(k) for a group of k objects.
For width queries: do the same on the width-level-sorted list, at the same cost.
For length and width queries: do a binary search on the length-width-level-sorted list, first by length and then by width; again O(log n) plus the size of the group.
For the level option: if a level is included in any of the above queries, also binary search by level within the matching range, which is still O(log n).
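A minimal Python sketch of the sorted-list approach above, using the standard bisect module instead of a hand-written binary search plus left/right scan: searching for the first key with the requested prefix and for the first key past it gives the whole group's index range directly. Record, GroupIndex and the sample values are hypothetical (the question's image is not available, so the numbers are made up); the list is assumed to hold only the nodes that should be grouped (e.g. the leaves), each tagged with its level.

```python
import math
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    node_id: int   # hypothetical label of the tree node
    length: int
    width: int
    level: int


def _prefix_range(keys, prefix):
    """Half-open range [lo, hi) of sorted tuple keys that start with prefix."""
    lo = bisect_left(keys, prefix)
    # (prefix..., inf) sorts after every key that shares the prefix.
    hi = bisect_left(keys, prefix + (math.inf,))
    return lo, hi


class GroupIndex:
    # Sort orders from the answer: length-level, width-level, length-width-level.
    SORT_KEYS = {
        "length": lambda r: (r.length, r.level),
        "width": lambda r: (r.width, r.level),
        "length_width": lambda r: (r.length, r.width, r.level),
    }

    def __init__(self, records):
        # One sorted copy per query type, plus its key list for bisect.
        self._lists = {}
        for name, key in self.SORT_KEYS.items():
            ordered = sorted(records, key=key)
            self._lists[name] = (ordered, [key(r) for r in ordered])

    def query(self, name, *values, level=None):
        """Node ids whose fields equal values (and level, if given)."""
        ordered, keys = self._lists[name]
        prefix = tuple(values) + (() if level is None else (level,))
        lo, hi = _prefix_range(keys, prefix)          # O(log n)
        return [r.node_id for r in ordered[lo:hi]]    # O(k) for k matches


records = [Record(8, 20, 30, 3), Record(5, 20, 30, 3), Record(7, 20, 10, 2),
           Record(9, 10, 20, 3), Record(6, 10, 20, 3)]
index = GroupIndex(records)
print(index.query("length", 10))            # [9, 6]
print(index.query("length_width", 20, 30))  # [8, 5]
```

Putting level last in every sort key is what lets the optional level filter become one more element of the search prefix.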

Attempting to splice a recurring item out of a list

I have extracted files from an online database that consist of roughly 100 titles. Associated with each of these titles is a DOI number, and the DOI number is different for each title. To do this programmatically, I converted the contents of the website to a list and then created a for loop to iterate through each item of the list. What I want the program to do is iterate through the entire list, find where it says "DOI:", and take the number that follows it. However, with the for loop I created, all it seems to do is print out the first DOI number, then terminate. How do I make the loop keep going once I have found the first one?
Here is the code:
resulttext = resulttext.split()
print(resulttext)
for item in resulttext:
    if item == "DOI:":
        DOI = resulttext[resulttext.index("DOI:") + 1]  # This parses out the DOI, then takes the item which follows it
        print(DOI)
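The likely culprit is resulttext.index("DOI:"): list.index always returns the position of the first matching item, so every iteration reads the number after the first "DOI:". A minimal sketch that tracks the current position with enumerate instead; extract_dois is a hypothetical helper name and the sample text is made up.

```python
def extract_dois(text):
    """Return every token that directly follows a "DOI:" token."""
    tokens = text.split()
    dois = []
    # enumerate yields the position of the current token, unlike
    # tokens.index("DOI:"), which always finds the first occurrence.
    for i, item in enumerate(tokens):
        if item == "DOI:" and i + 1 < len(tokens):
            dois.append(tokens[i + 1])
    return dois


sample = "First title DOI: 10.1000/abc Second title DOI: 10.1000/xyz"
print(extract_dois(sample))  # ['10.1000/abc', '10.1000/xyz']
```

The i + 1 < len(tokens) check guards against a "DOI:" that happens to be the last token.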

Pre-order exploration of tictactoe search space not generating all states

I am trying to implement Q-learning for tic-tac-toe. One of the steps in doing so involves enumerating all the possible states of the tic-tac-toe board to form a state-value table. I have written a procedure to recursively generate all possible states starting from the empty board. To do this, I am implicitly performing a pre-order traversal of the search-space tree. However, at the end of it all, I am getting only 707 unique states, whereas the general consensus is that the number of legal states is around 5000.
Note: I am referring to the number of legal states. I am aware that the number of states is closer to 19,000 if either player is allowed to continue playing after a game is completed (what I mean by an illegal state).
CODE:
def generate_state_value_table(self, state, turn):
    winner = int(is_game_over(state)) #check if, for the current turn and state, game has finished and if so who won
    #print "\nWinner is ", winner
    #print "\nBoard at turn: ", turn
    #print_board(state)
    self.add_state(state, winner/2 + 0.5) #add the current state with the appropriate value to the state table
    open_cells = open_spots(state) #find the index (from 0 to total no. of cells) of all the empty cells in the board
    #check if there are any empty cells in the board
    if len(open_cells) > 0:
        for cell in open_cells:
            #pdb.set_trace()
            row, col = cell // len(state), cell % len(state)
            new_state = deepcopy(state) #make a copy of the current state
            #check which player's turn it is
            if turn % 2 == 0:
                new_state[row][col] = 1
            else:
                new_state[row][col] = -1
            #using a try block because recursive depth may be exceeded
            try:
                #check if the new state has not been generated somewhere else in the search tree
                if not self.check_duplicates(new_state):
                    self.generate_state_value_table(new_state, turn+1)
                else:
                    return
            except:
                #print "Recursive depth exceeded"
                exit()
    else:
        return
You can look at the full code here if you want.
EDIT:
I tidied the code up a bit both in the link and here with more comments to make things clearer. Hope that helps.
So I finally solved the issue, and I am putting up this answer for anyone who faces a similar issue. The bug was in the way I was handling duplicate states. If a newly generated state had already been generated somewhere else in the search tree, it should not be added to the state table again, but the traversal should still go on. My mistake was cutting the pre-order traversal short on finding a duplicate: the return in the else branch exits the whole loop over open_cells, so the remaining moves from the current state were never explored.
Simply put: removing the else clause from the code below gave me the number of states as 6046:
#check if the new state has not been generated somewhere else in the search tree
if not self.check_duplicates(new_state):
    self.generate_state_value_table(new_state, turn+1)
else:
    return
Furthermore, I stopped exploring the search tree branch when I encountered a state where there was a clear winner. Concretely, I added the following code after self.add_state(state, winner/2 + 0.5):
#check if the winner returned is one of the players and go back to the previous state if so
if winner != 0:
    return
This gave me the number of states as 5762 which is what I was looking for.
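For comparison, a self-contained sketch of the same pre-order enumeration with both fixes applied: a duplicate child is skipped while the loop carries on with its siblings, and won positions are not expanded. It uses its own hypothetical representation (a flat 9-tuple with 1, -1 and 0, and the helper names winner and build_state_table) rather than the original is_game_over/check_duplicates, and keeps the same value formula. Because the tuples are hashable, the state table itself doubles as the duplicate check.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]
EMPTY = (0,) * 9


def winner(board):
    """1 or -1 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def build_state_table(board=EMPTY, table=None):
    """Pre-order DFS over all positions reachable in legal play."""
    if table is None:
        table = {}
    w = winner(board)
    table[board] = w / 2 + 0.5      # 1.0 if player 1 won, 0.0 if player -1 won, else 0.5
    if w != 0:
        return table                # finished game: do not expand further
    # Player 1 moves first, so it is player 1's turn when the counts are equal.
    player = 1 if board.count(1) == board.count(-1) else -1
    for cell in range(9):
        if board[cell] != 0:
            continue
        child = board[:cell] + (player,) + board[cell + 1:]
        if child in table:
            continue                # duplicate: skip this child only, keep looping
        build_state_table(child, table)
    return table


table = build_state_table()
print(len(table))  # 5478
```

With this representation the count is 5,478 positions (empty board included), the commonly cited number of positions reachable in legal play; a count from other code may differ depending on how it detects wins and duplicates.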

Can you get the selected leaf from a DecisionTreeRegressor in scikit-learn

I'm just reading this great paper and trying to implement the following:
... We treat each individual tree as a categorical feature that takes as value the index of the leaf an instance ends up falling in. We use 1-of-K coding of this type of features. For example, consider the boosted tree model in Figure 1 with 2 subtrees, where the first subtree has 3 leafs and the second 2 leafs. If an instance ends up in leaf 2 in the first subtree and leaf 1 in second subtree, the overall input to the linear classifier will be the binary vector [0, 1, 0, 1, 0], where the first 3 entries correspond to the leaves of the first subtree and last 2 to those of the second subtree ...
Does anyone know how I can predict a bunch of rows and, for each of those rows, get the selected leaf of each tree in the ensemble? For this use case I don't really care what the node represents, just its index. I had a look at the source but could not quickly see anything obvious. I can see that I need to iterate over the trees and do something like this:
for sample in X_test:
    for tree in gbc.estimators_:
        leaf = tree.leaf_index(sample)  # This is the function I need but don't think exists.
        ...
Any pointers appreciated.
The following function goes beyond identifying the selected leaf of a decision tree and implements the application from the referenced paper: as in the paper, I use the GBC for feature engineering.
import numpy as np

def makeTreeBins(gbc, X):
    '''
    Takes in a GradientBoostingClassifier object (gbc) and a data frame (X).
    Returns a numpy array of dim (rows(X), num_estimators), where each row represents the set of terminal nodes
    that the record X[i] falls into across all estimators in the GBC.
    Note, each tree produces at most 2^max_depth terminal nodes. I append a prefix to the terminal node id in each incremental
    estimator so that I can use these as feature ids in other classifiers.
    '''
    for i, dt_i in enumerate(gbc.estimators_):
        # Must be an integer. Node ids go up to 2^(max_depth+1) - 2, so a spacing of 100
        # only keeps the ids of different trees apart for max_depth <= 5.
        prefix = (i + 2)*100
        nds = prefix + dt_i[0].tree_.apply(np.array(X).astype(np.float32))
        if i == 0:
            nd_mat = nds.reshape(len(nds), 1)
        else:
            nd_mat = np.hstack((nd_mat, nds.reshape(len(nds), 1)))
    return nd_mat
DecisionTreeRegressor has a tree_ attribute which gives you access to the underlying decision tree. It has an apply method, which returns the id of the leaf node that each sample ends up in:
dt.tree_.apply(X)
Note that apply expects its input to have type float32.
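In scikit-learn versions that provide it (0.17 and later, as far as I know), the fitted estimators also expose a public apply method, so the per-tree leaf indices can be obtained without going through tree_ or casting to float32 yourself, and OneHotEncoder then produces the paper's 1-of-K encoding. A minimal sketch on synthetic data; the dataset and hyperparameters are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
gbc = GradientBoostingClassifier(n_estimators=10, max_depth=2, random_state=0).fit(X, y)

# Leaf (node) id per sample and tree: shape (n_samples, n_estimators, n_classes).
# For a binary problem the last axis has size 1.
leaves = gbc.apply(X)[:, :, 0]

# A single tree gives the same ids through its own apply method.
first_tree_leaves = gbc.estimators_[0, 0].apply(X)

# 1-of-K encode each tree's leaf id: one block of columns per tree, exactly
# one active column per block, like the binary vector in the paper.
encoder = OneHotEncoder(handle_unknown="ignore")
X_leaf = encoder.fit_transform(leaves)  # scipy sparse matrix
print(leaves.shape, X_leaf.shape)
```

X_leaf can then be fed to a linear model, as in the paper; handle_unknown="ignore" keeps transform from failing if new data lands in a leaf that never occurred during fitting.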