Get nodes and group them based on specific inputs from user - c++

I have a binary tree.
Each node is a struct with the 2 values: width and length.
The input from the user is to group them based on one or both criteria (length, width).
For this grouping, only the leaf nodes are considered.
Consider the tree in the attached image.
If the user said group based on length, the result would be [8,5,7], since those nodes have length 20, and [9,6], since their length is 10.
If the user said group based on both length and width, then the output would be
[8,5] w=20,l=30 , [9,6] l=10,w=20 , [7] l=20,w=10 .
How I dealt with this was to traverse the tree twice and collect 2 'lists of lists' and process that to get a new 'lists of lists' that groups the unique l&w values together.
The user can optionally also give level as an input. If the level is given as an input, then the same steps happen at that level instead of at leaf nodes.
Is there a better way to do it than what I have mentioned?
PS. I am working on this code in C++.

Data:
Construct a list of objects with fields
length
width
level
You can create this "main" list by doing a level-order traversal,
and then construct the following lists from the main list:
a length-level-sorted list : list that is sorted first by length and then level
a width-level-sorted list : list that is sorted first by width and then level
a length-width-level-sorted list : list that is sorted first by length and then width and then level
Queries (O(log n) to locate a group):
For length queries : do a binary search on the length-level-sorted list for the queried length, then extend left and right from the found index to collect every object with that length and return the group. Locating the group is O(log(n)); collecting its k members adds O(k).
For width queries : do the same on the width-level-sorted list.
For length and width queries : do a binary search on the length-width-level-sorted list, first by length and then by width.
For level option : if level is included in any of the above queries, do a further binary search by level inside the matching range. This is O(log(n)) again, because every list is sorted by level last.
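The sorted-list idea above can be sketched as follows. This is a minimal illustration in Python rather than C++, and it makes several assumptions: the Record type, its node_id field, the helper names build_index and query, and the sample values are all hypothetical and are not taken from the tree in the question.

```python
from bisect import bisect_left, bisect_right
from collections import namedtuple

# Hypothetical record type; node_id exists only to label the results.
Record = namedtuple("Record", ["node_id", "length", "width", "level"])

def build_index(records, fields):
    """Sort records by the given fields and keep the sort keys alongside them."""
    ordered = sorted(records, key=lambda r: tuple(getattr(r, f) for f in fields))
    keys = [tuple(getattr(r, f) for f in fields) for r in ordered]
    return keys, ordered

def query(index, prefix):
    """Return every record whose sort key starts with the tuple `prefix`."""
    keys, ordered = index
    # A tuple sorts before any longer tuple it is a prefix of,
    # so bisect_left lands on the first matching key.
    lo = bisect_left(keys, prefix)
    # Padding with infinity gives an upper bound above every matching key.
    hi = bisect_right(keys, prefix + (float("inf"),))
    return ordered[lo:hi]

# Illustrative leaf records (made-up values).
records = [
    Record(8, 20, 30, 3),
    Record(5, 20, 30, 3),
    Record(7, 20, 10, 2),
    Record(9, 10, 20, 3),
    Record(6, 10, 20, 3),
]

by_length = build_index(records, ["length", "level"])
by_length_width = build_index(records, ["length", "width", "level"])

length_20 = [r.node_id for r in query(by_length, (20,))]
length_20_level_3 = [r.node_id for r in query(by_length, (20, 3))]
length_20_width_30 = [r.node_id for r in query(by_length_width, (20, 30))]
```

The two binary searches cost O(log n); slicing out a group of k records adds O(k). In C++ the same shape could probably be built with std::sort and std::equal_range over a vector of structs.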

Related

Prolog - split a list into lists of lists

The list to be split is a list of guests that attend a dinner. The dinner has three meals (appetizer, main, and dessert). The result of the type list of lists should be a list for each of the meals with the sub-lists showing the people who eat that meal together.
eg.:
Appetizer: [[Tick, Trick, Track],[Tic, Tac, Toe], [Jerry, Larry, Harry]]
Main: [[Tick, Tic, Jerry], [Trick, Tac, Larry],[Track, Toe, Harry]]
Dessert: [[Tick, Tac, Larry], [Trick, Toe, Harry], [Track, Tic, Jerry]]
The function I have to code:
make_dinner(?Starters, ?Main, ?Dessert, +List_of_Persons, +Group_size):-
Group size is the size of the groups (in the example above three)
I don't know how to approach this problem using the Prolog language. I can easily code this in Java, which I'm really good at. But the recursion and logic here are very confusing to me. There's no "return" argument, no "if" statement, just a bunch of commas :")
My idea was to:
find out how many groups I need (depends on group size and list length)
fill the appetizer list with the original order of the list. Count it off and split the list after every group size position to get the list of lists for an appetizer (the most intuitive way I think)
fill the first sub-list of the "main" list with the first element of the list, then the second element of the other sub-lists from the "appetizer" list. Fill the second sub-list with the second element of the first sub-list of "appetizer", then the third element of the next sub-lists in "appetizer", and so on and so forth.
If the number doesn't match (eg. list length 10 but group size 3), I will simply have one group that's smaller. No person eats twice with the same person.
Does my idea work in Prolog? Can I implement that logic?
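To make the first steps of the idea concrete, here is a rough sketch of the logic in Python (not Prolog). It only covers counting the groups, splitting the list for the appetizer, and the "take the i-th member of every group" step that reproduces the Main line of the example. The function names are hypothetical, and the sketch does not try to guarantee that no two people ever meet twice.

```python
def number_of_groups(people, group_size):
    # Ceiling division: 10 people in groups of 3 need 4 groups.
    return -(-len(people) // group_size)

def split_into_groups(people, group_size):
    """Cut the list into consecutive groups; the last group may be smaller."""
    return [people[i:i + group_size] for i in range(0, len(people), group_size)]

def regroup_by_position(groups):
    """New group i collects the i-th member of every existing group."""
    longest = max(len(g) for g in groups)
    return [[g[i] for g in groups if i < len(g)] for i in range(longest)]

guests = ["Tick", "Trick", "Track", "Tic", "Tac", "Toe", "Jerry", "Larry", "Harry"]
starters = split_into_groups(guests, 3)
main = regroup_by_position(starters)
```

In Prolog the same recursion would typically be written as one clause for the empty list and one clause that peels off the first Group_size elements and recurses on the rest.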
I don't really know how to approach this task because I'm very new to Prolog. I will have the main function below, then I need a helper_function called get_number_of_groups to know how many elements go into each list. And I have one function to get the size of the list.
How do I make the starters, main, and dessert into a list of lists? I know about recursion but somehow I don't know where to go from here.
%make_rudi: main function
%return: starters, main, and dessert; each list of lists
make_rudi(Starters, Main, Dessert, List_of_Persons, Group_size):-
get_number_of_groups(List_of_Persons, Group_size, Number),
%get_number_of_groups(+List_of_Persons, +Group_size, ?Number)
%return: the number of groups I have for each meal
get_number_of_groups(List_of_Persons, Group_size, Number):-
Number is list_length(Xs, List_of_Persons)/Group_size.
%list_length(?Xs,+L)
%the length of the list
list_length(Xs,L) :-
    list_length(Xs,0,L).
list_length([], L, L).
list_length([_|Xs], T, L) :-
    T1 is T+1,
    list_length(Xs,T1,L).

Finding the max value of a list of tuples, (applying max to the second value of the tuple)

So I have a list of tuples which I created from zipping two lists like this:
zipped =list(zip(neighbors, cv_scores))
max(zipped) produces
(49, 0.63941769316909292) where 49 is the max value.
However, I'm interested in finding the max value among the latter value of the tuple (the .63941).
How can I do that?
The problem is that Python compares tuples lexicographically so it orders on the first item and only if these are equivalent, it compares the second and so on.
You can however use the key= in the max(..) function, to compare on the second element:
max(zipped,key=lambda x:x[1])
Note 1: You do not have to construct a list(..) if you are only interested in the maximum value. You can use
max(zip(neighbors,cv_scores),key=lambda x:x[1]).
Note 2: Finding the max(..) runs in O(n) (linear time) whereas sorting a list runs in O(n log n).
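A small runnable illustration of the difference, using made-up values for neighbors and cv_scores (the real lists are not shown in the question):

```python
from operator import itemgetter

# Hypothetical sample data standing in for the question's lists.
neighbors = [1, 3, 5, 49]
cv_scores = [0.58, 0.71, 0.66, 0.64]

zipped = list(zip(neighbors, cv_scores))

# Default comparison is lexicographic, so the first element decides.
by_first = max(zipped)

# key= tells max to compare only the second element of each tuple.
by_second = max(zipped, key=itemgetter(1))

best_k, best_score = by_second
```

itemgetter(1) is equivalent to lambda x: x[1] here; either can be passed as key.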
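max(zipped)[1]
#returns the second element of the tuple that max(zipped) picks, which is chosen by its first element
This only gives the highest score when the tuple with the largest first element also has the largest second element.
If you want to sort your data and find the maximum, you can use itemgetter: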
from operator import itemgetter
zipped.sort(key=itemgetter(1), reverse = True)
print(zipped[0][1]) #for maximum

adding frequency elt to a list of list for each elt in python

I have a list of elements (list) which is formatted like this:
3823,La Canebiere,LOCATION
3949,La Canebiere,LOCATION
3959,Phocaeans,LOCATION
3990,Paris,LOCATION
323,Paris,LOCATION
3222,Paris,LOCATION
Some location names (elt[1]) may appear two or more times in the list, but with a different elt[0] (id number). What I'm trying to achieve is to count the frequency of elt[1] in the list, add this frequency to each elt, and discard duplicates of the name (elt[1]). For my example the new table would be:
3823,La Canebiere,LOCATION, 2
3959,Phocaeans,LOCATION,1
3990,Paris,LOCATION,3
I tried a count method and a dictionary for counting frequency, but I don't know how to create the new list that would keep the original entries (without duplicates) plus the frequency. I'm using Python 3. Thank you in advance if you can help!
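One possible approach, sketched under the assumption that each element is a 3-tuple of strings (id, name, tag) and that the first occurrence of each name is the one to keep. The function name add_frequency is hypothetical.

```python
from collections import Counter

rows = [
    ("3823", "La Canebiere", "LOCATION"),
    ("3949", "La Canebiere", "LOCATION"),
    ("3959", "Phocaeans", "LOCATION"),
    ("3990", "Paris", "LOCATION"),
    ("323", "Paris", "LOCATION"),
    ("3222", "Paris", "LOCATION"),
]

def add_frequency(rows):
    """Keep the first row for each name and append how often the name occurs."""
    counts = Counter(name for _, name, _ in rows)
    seen = set()
    result = []
    for ident, name, tag in rows:
        if name in seen:
            continue  # a row with this name was already emitted
        seen.add(name)
        result.append((ident, name, tag, counts[name]))
    return result

with_counts = add_frequency(rows)
```

Counter does the counting in one pass, and the seen set preserves the original order of first occurrences.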

Search for a specific node in an RB-tree

Suppose I have an RB-tree consisting of numbers which correspond to people's ages, and suppose each node also has a gender (female or male).
My question is, how do I get a specific number from that tree using OS-SELECT and a rank value? A specific number means, for example, the 2nd youngest man in the tree.
Example: OS-SELECT(root, 2), which returns the second youngest man in the tree.
The aim is not just to find the 2nd or 3rd youngest node; the aim is to find the 2nd youngest man or the 2nd youngest woman.
Simply traverse the tree in-order and count the elements that satisfy the predicate. In this case the predicate would be "is male". Finding an element in a binary search tree allows us to end traversal early, which is not necessarily trivial to implement, so here is a simple pseudocode for the algorithm:
# return value is used to track how many matching nodes must be
# found until the k'th one is reached
int OS-SELECT(node, rank)
    if not node                  # leaf reached?
        return rank              # Keep searching.
    rank = OS-SELECT(node.left, rank)
    if not rank                  # Node was found in left subtree.
        return 0                 # End early.
    if predicate(node)           # Test the predicate.
        if not --rank            # The node matches: There are fewer matches to go through.
            visit(node)          # Rank dropped to 0: Found it. Visit the node and
            return 0             # end early.
    return OS-SELECT(node.right, rank)
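A runnable Python rendering of the pseudocode above, as a sketch. The Node class, the function name select_matching, and the sample tree are assumptions made for illustration; instead of calling visit(node), this version returns the node that was found.

```python
class Node:
    """Hypothetical tree node: keyed by age, carrying a gender label."""
    def __init__(self, age, gender, left=None, right=None):
        self.age = age
        self.gender = gender
        self.left = left
        self.right = right

def select_matching(node, rank, predicate):
    """In-order search for the rank-th node satisfying predicate.

    Returns (remaining_rank, found_node); found_node is None if the
    subtree holds fewer than `rank` matching nodes.
    """
    if node is None:
        return rank, None
    rank, found = select_matching(node.left, rank, predicate)
    if found is not None:
        return 0, found          # found in the left subtree: stop early
    if predicate(node):
        rank -= 1                # one fewer match to go
        if rank == 0:
            return 0, node
    return select_matching(node.right, rank, predicate)

# In-order ages: 10 (M), 20 (F), 25 (M), 30 (M), 40 (M)
root = Node(30, "M",
            Node(20, "F", Node(10, "M"), Node(25, "M")),
            Node(40, "M"))

_, second_youngest_man = select_matching(root, 2, lambda n: n.gender == "M")
_, second_youngest_woman = select_matching(root, 2, lambda n: n.gender == "F")
```

This traversal is O(n) in the worst case. Reaching O(log n), as OS-SELECT does in an order-statistic tree, would require each node to store the number of matching nodes in its subtree (for example one size per gender).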

Can you get the selected leaf from a DecisionTreeRegressor in scikit-learn

I was just reading this great paper and am trying to implement this:
... We treat each individual tree as a categorical feature that takes as value the index of the leaf an instance ends up falling in. We use 1-of-K coding of this type of features. For example, consider the boosted tree model in Figure 1 with 2 subtrees, where the first subtree has 3 leafs and the second 2 leafs. If an instance ends up in leaf 2 in the first subtree and leaf 1 in second subtree, the overall input to the linear classifier will be the binary vector [0, 1, 0, 1, 0], where the first 3 entries correspond to the leaves of the first subtree and last 2 to those of the second subtree ...
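The encoding described in the quote can be sketched in plain Python as follows. The function name encode_leaves is hypothetical, and leaf indices are taken to be 1-based to match the wording of the quote.

```python
def encode_leaves(leaf_indices, leaves_per_tree):
    """1-of-K encode the leaf one sample falls into in each tree.

    leaf_indices[i] is the 1-based leaf index in tree i;
    leaves_per_tree[i] is the number of leaves of tree i.
    """
    vector = []
    for leaf, n_leaves in zip(leaf_indices, leaves_per_tree):
        one_hot = [0] * n_leaves
        one_hot[leaf - 1] = 1    # mark the leaf that was reached
        vector.extend(one_hot)   # concatenate the per-tree blocks
    return vector

# Leaf 2 of a 3-leaf tree and leaf 1 of a 2-leaf tree, as in the quote.
example = encode_leaves([2, 1], [3, 2])
```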
Anyone know how I can predict a bunch of rows and for each of those rows get the selected leaf for each tree in the ensemble? For this use case I don't really care what the node represents, just its index really. Had a look at the source and I could not quickly see anything obvious. I can see that I need to iterate the trees and do something like this:
for sample in X_test:
    for tree in gbc.estimators_:
        leaf = tree.leaf_index(sample) # This is the function I need but don't think exists.
        ...
Any pointers appreciated.
The following function goes beyond identifying the selected leaf from the Decision Tree and implements the application in the referenced paper. Its use is the same as the referenced paper, where I use the GBC for feature engineering.
import numpy as np

def makeTreeBins(gbc, X):
    '''
    Takes in a GradientBoostingClassifier object (gbc) and a data frame (X).
    Returns a numpy array of dim (rows(X), num_estimators), where each row represents the set of terminal nodes
    that the record X[i] falls into across all estimators in the GBC.
    Note, each tree produces up to 2^max_depth terminal nodes. I append a prefix to the terminal node id in each incremental
    estimator so that I can use these as feature ids in other classifiers.
    '''
    for i, dt_i in enumerate(gbc.estimators_):
        prefix = (i + 2)*100 #Must be an integer
        # dt_i[0] is the regression tree of the i-th boosting stage; apply returns one leaf id per row
        nds = prefix + dt_i[0].tree_.apply(np.array(X).astype(np.float32))
        if i == 0:
            nd_mat = nds.reshape(len(nds), 1)
        else:
            # append this tree's leaf ids as a new column
            nd_mat = np.hstack((nd_mat, nds.reshape(len(nds), 1)))
    return nd_mat
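DecisionTreeRegressor has a tree_ attribute which gives you access to the underlying decision tree. It has a method apply, which seemingly finds the corresponding leaf id for each sample:
dt.tree_.apply(X)
Note that tree_.apply expects its input to have type float32. Newer scikit-learn versions also expose apply directly on the estimator (dt.apply(X)).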