Creating a reduce function that ignores children of a node - mapreduce

I have a set of documents in couch DB that act as a tree. Each document includes an ancestors field that includes a list of IDs that represent the location in the tree of the current document. Some of the documents have a value associated with them (a number).
When a document does not have it's own value, it's implicit value is the maximum of all of it's children. But when a document does have it's own value, the value of any children is ignored. For example
Root (no value, no ancestors) effective value 7
Child1 (value 5, ancestors= Root) effective value 5
Grandchild1 (value 9, ancestors=Root,Child1) effective value 9
Child2 (no value, ancestors= Root) effective value 7
Grandchild2 (value 3, ancestors=Root,Child2) effective value 3
Grandchild3 (value 7, ancestors=Root,Child2) effective value 7
Child3 (value 6, ancestors= Root) effective value 6
I'm trying to build a view that allows me to get the effective value for a given node (I will have the full path so could key by path or id). So what I'm trying doing is:
Emit anything that has an explicit value in my map function
Build a reduce function that returns the explicit value for a node if it has one or finds the maximum of it's children if it doesn't.
I'm struggling to make this work because reduce appears to be run on partitions of the data. So I keep getting scenarios where the child and parent don't come together until a re-reduce so I lose the relationships and get the wrong value. e.g. with the above tree:
Grandchild 1 (value 9) and Grandchild 3 (value 7) end up on the same partition so are reduced first, resulting in value 9
Child 1 (value 5) is reduced with Child 3 (value 6) resulting in 6 being the winning value and dropping any knowledge of Child 1
The re-reduce happens to reduce Child 3 (value 6) and Grandchild 1 (value 9) and the resulting value is 9, because Child 1 is not involved in this reduce we don't know that the 9 should actually be replaced by a 5 so we get the wrong result.
Is it possible to implement this logic in a couch view reduce function?
My current implementations are:
Map Function
function (doc) {
if (doc.ancestors && doc.data) {
// For anything with a value emit it with the full path as the key
var path = doc.ancestors.concat([doc._id]);
emit(path, {data: doc.data, path: path})
}
}
Reduce Function
function (keys, values, rereduce) {
// build a tree of the values currently being reduced
// by reassembling the paths and putting the value
// at the targeted location
var tree = {children: {}};
function addValueToTree(value) {
var parent = tree;
value.path.forEach(function(docId) {
var newParent = parent.children[docId] || {children: {}};
parent.children[docId] = newParent;
parent = newParent;
});
parent.value = value;
}
values.forEach(addValueToTree);
// Object.entries isn't supported in Couch JS
// engine so simple polyfill
function entries(obj) {
var entries = [];
for (k in obj) {
entries.push([k, obj[k]]);
}
return entries;
}
// Determine the value of the tree selecting
// the one "winning" value to return from the
// reduce function
function reduceTree(tree, path) {
// If the node of the tree we're reducing has an
// explicit value then return that since it should
// take precedence over any children
if (tree.value) {
return tree.value;
}
// Otherwise reduce the children of the current node
// to get the "winning" node from
var toCombine = entries(tree.children).map(function (entry) {
return reduceTree(entry[1], path.concat([entry[0]]));
});
// Find the winning child by picking the one with
// the maximum value and return that
var ret = toCombine[0];
toCombine.forEach(function(child) {
if (child.data > ret.data)
ret = child;
});
return ret;
}
var ret = reduceTree(tree, []);
log({values: values, ret: ret})
return ret;
}
This works as long as descendants are reduced in the same invocation with any parent nodes that should take precedence, but when the parent ends up being dropped from the reduce (as in the example above) means the decedent is not replaced and can end up affecting the final calculation giving the wrong result.
I've tried gathering up the "contributing nodes" when reducing so that we retain the knowledge needed to know that a node is overwritten but that results in the "reduce function did not actually reduce the data" error.
Is this even possible with a reduce function or does it fundamentally break the constraints of a reduce function?

Related

Finding the intersection of two binary search trees

I am building and returning a new binary search tree constructed of the intersection of two other binary search trees. I have available to me a find function which I am using to check if any values of the first tree are found in the second tree and if they are I insert them into a new tree. My issue is that the new tree "res" does not update upon recursion. So if I test it with 8, 4, 10 for tree1 and tree2 is 8, 6, 4, 13 my new tree only contains 8 and not 8 and 4. Grateful for any feedback!
BinSearchTree *BinSearchTree::intersectWith(TreeNode *root1, TreeNode *root2) {
BinSearchTree *res = new BinSearchTree();
//traverse one tree and use find to traverse the second tree
if(local_find(root2, root1->value()) && !local_find(res->root, root1->value()))
res->insert(root1->value());
if(root1->leftSubtree() != nullptr)
intersectWith(root1->leftSubtree(), root2);
if(root1->rightSubtree() != nullptr)
intersectWith(root1->rightSubtree(), root2);
return res;
}
You are overwriting res every time you call intersectWith() by calling 'new BinSearchTree()'.
To fix this, create res outside of interSectWith() and then pass it in as a third parameter. (You could then also consider making intersectWith() return void, as a return value is no longer needed.)

Machine Learning Algorithm using recursion

I am currently working on a very beginners version of the ID3 machine learning algorithm. I am stuck on how to recursively call my build_tree function to actually make the rest of the decision tree and output it in a nice format. I have calculated gains, entropies, gain ratios, etc. but I have no clue how to integrate recursion into my function.
I am given a data set, which after doing all the calculations mentioned above, have split it into two datasets. Now I need to be able to recursively call it until both the left and right data sets become pure [which can easily be checked by a function i wrote called dataset.is_pure()], all while keeping track of the threshold at each node. I know that all my calculations and split methods are working as I have done individuual testing on them. It is just the recursive part that I am having trouble with.
Here is my build_tree function that I am having a recursion nightmare with. I am currently working in a linux environment with the g++ compiler. The code I have right now compiles, but when run gives me a segmentation error. Any and all help would be greatly appreciated!
struct node
{
vector<vector<string>> data;
double atrb;
node* parent;
node* left = NULL;
node* right = NULL;
node(node* parent) : parent(parent) {}
};
node* root = new node(NULL);
void build_tree(node* current, dataset data_set)
{
vector<vector<string>> l_d;
vector<vector<string>> r_d;
double global_entropy = calc_entropy(data_set.get_col(data_set.n_col()-1));
int best_col = this->get_best_col(data_set, global_entropy);
hash_map selected_atrb(data_set.n_row(), data_set.truncate(best_col));
double threshold = get_threshold(selected_atrb, global_entropy);
cout << threshold << "\n";
split_data(threshold, best_col, data_set, l_d, r_d);
dataset right_data(r_d);
dataset left_data(l_d);
right_data.delete_col(best_col);
left_data.delete_col(best_col);
if(left_data.is_pure())
return;
else
{
node* new_left = new node(current);
new_left->atrb = threshold;
current->left = new_left;
new_left->data = l_d;
return build_tree(new_left, left_data);
}
if(right_data.is_pure())
return;
else
{
node* new_right = new node(current);
new_right->atrb = threshold;
current->right = new_right;
new_right->data = r_d;
return build_tree(new_right, right_data);
}
}
id3(dataset data)
{
build_tree(root, data);
}
};
This is only a part of my class. If you wish to see any other code, just let me know!
Regards,
I will explain to you with pseudocodigo how the reclusive function works, I will also leave you the code that you make in javascript for the implementation of said algorithm.
Before going into detail, I will mention certain concepts and classes you use.
Attribute: Characteristic of the data set, it is usually the name of a column of the data set.
Class: Decision characteristic, it is generally of binary value and usually it is always the last column of the data set.
Value: Possible value of the attribute in the data set, for example (Sunny, Cloudy, Rainy)
Tree: classes that have a number of nodes associated with each other.
Node: Entity in charge of storing the attribute (question), also has a list with the arcs.
Arc: Contains the value of an attribute and has an attribute that will contain the following child node.
Leaf : Contains a class. This node is the result of a decision, for example (Yes or No).
Best feature: Attribute with the highest information gain.
Function to create the tree from a set of data:
Obtain the values ​​of a class.
Evaluate if there is only one type of class in the data set, for example (Yes).   
If true, then we create a Leaf object and return this object
Obtain the information gain of each current attribute.
Choose the attribute with the highest information gain.
Create a node with the best feature.
Obtain the values ​​of the best feature.
Iterate the list of those values.
Filter the list, so that there are only records with the value that we are iterating (save it in a variable temporary)
Create an Arc with this value.
     - Assign the following attribute to the Arc: (Here comes the recursion) call again the same only function that you send (the filtered list of records, the class, the list of attributes without the best feature, the list of general attributes without the attributes of the best feature)
Add the arc to the node.
Return the node.
This would be the segment of code that is responsible for creating the tree
let crearArbol = (ejemplosLista, clase, atributos, valores) => {
let valoresClase = obtenerValoresAtributo(ejemplosLista, clase);
if (valoresClase.length == 1) {
autoIncremental++;
return new Hoja(valoresClase[0], autoIncremental);
}
if (atributos.length == 0) {
let claseDominante = claseMayoritaria(ejemplosLista);
return new Atributo();
}
let gananciaAtributos = obtenerGananciaAtributos(ejemplosLista, valores, atributos);
let atributoMaximo = atributos[maximaGanancia(gananciaAtributos)];
autoIncremental++;
let nodo = new Atributo(atributoMaximo, [], autoIncremental);
let valoresLista = obtenerValoresAtributo(ejemplosLista, atributoMaximo);
valoresLista.forEach((valor) => {
let ejemplosFiltrados = arrayDistincAtributos(ejemplosLista, atributoMaximo, valor);
let arco = new Arco(valor);
arco.sigNodo = crearArbol(ejemplosFiltrados, clase, [...eliminarAtributo(atributoMaximo, atributos)], [...eliminarValores(atributoMaximo, valores)]);
nodo.hijos.push(arco);
});
return nodo;
};
Unfortunately, the code is only in Spanish.
This is the repository that contains my project with this implementation Source code of id3

Low Memory Shortest Path Algorithm

I have a global unique path table which can be thought of as a directed un-weighted graph. Each node represents either a piece of physical hardware which is being controlled, or a unique location in the system. The table contains the following for each node:
A unique path ID (int)
Type of component (char - 'A' or 'L')
String which contains a comma separated list of path ID's which that node is connected to (char[])
I need to create a function which given a starting and ending node, finds the shortest path between the two nodes. Normally this is a pretty simple problem, but here is the issue I am having. I have a very limited amount of memory/resources, so I cannot use any dynamic memory allocation (ie a queue/linked list). It would also be nice if it wasn't recursive (but it wouldn't be too big of an issue if it was as the table/graph itself if really small. Currently it has 26 nodes, 8 of which will never be hit. At worst case there would be about 40 nodes total).
I started putting something together, but it doesn't always find the shortest path. The pseudo code is below:
bool shortestPath(int start, int end)
if start == end
if pathTable[start].nodeType == 'A'
Turn on part
end if
return true
else
mark the current node
bool val
for each node in connectedNodes
if node is not marked
val = shortestPath(node.PathID, end)
end if
end for
if val == true
if pathTable[start].nodeType == 'A'
turn on part
end if
return true
end if
end if
return false
end function
Anyone have any ideas how to either fix this code, or know something else that I could use to make it work?
----------------- EDIT -----------------
Taking Aasmund's advice, I looked into implementing a Breadth First Search. Below I have some c# code which I quickly threw together using some pseudo code I found online.
pseudo code found online:
Input: A graph G and a root v of G
procedure BFS(G,v):
create a queue Q
enqueue v onto Q
mark v
while Q is not empty:
t ← Q.dequeue()
if t is what we are looking for:
return t
for all edges e in G.adjacentEdges(t) do
u ← G.adjacentVertex(t,e)
if u is not marked:
mark u
enqueue u onto Q
return none
C# code which I wrote using this code:
public static bool newCheckPath(int source, int dest)
{
Queue<PathRecord> Q = new Queue<PathRecord>();
Q.Enqueue(pathTable[source]);
pathTable[source].markVisited();
while (Q.Count != 0)
{
PathRecord t = Q.Dequeue();
if (t.pathID == pathTable[dest].pathID)
{
return true;
}
else
{
string connectedPaths = pathTable[t.pathID].connectedPathID;
for (int x = 0; x < connectedPaths.Length && connectedPaths != "00"; x = x + 3)
{
int nextNode = Convert.ToInt32(connectedPaths.Substring(x, 2));
PathRecord u = pathTable[nextNode];
if (!u.wasVisited())
{
u.markVisited();
Q.Enqueue(u);
}
}
}
}
return false;
}
This code runs just fine, however, it only tells me if a path exists. That doesn't really work for me. Ideally what I would like to do is in the block "if (t.pathID == pathTable[dest].pathID)" I would like to have either a list or a way to see what nodes I had to pass through to get from the source and destination, such that I can process those nodes there, rather than returning a list to process elsewhere. Any ideas on how i could make that change?
The most effective solution, if you're willing to use static memory allocation (or automatic, as I seem to recall that the C++ term is), is to declare a fixed-size int array (of size 41, if you're absolutely certain that the number of nodes will never exceed 40). By using two indices to indicate the start and end of the queue, you can use this array as a ring buffer, which can act as the queue in a breadth-first search.
Alternatively: Since the number of nodes is so small, Bellman-Ford may be fast enough. The algorithm is simple to implement, does not use recursion, and the required extra memory is only a distance (int, or even byte in your case) and a predecessor id (int) per node. The running time is O(VE), alternatively O(V^3), where V is the number of nodes and E is the number of edges.

For Looping Link List using Templates

Having used the various search engines (and the wonderful stackoverflow database), I have found some similar situations, but they are either far more complex, or not nearly as complex as what I'm trying to accomplish.
C++ List Looping
Link Error Using Templates
C++:Linked List Ordering
Pointer Address Does Not Change In A Link List
I'm trying to work with Link List and Node templates to store and print non-standard class objects (in this case, a collection of categorized contacts). Particularly, I want to print multiple objects that have the same category, out of a bunch of objects with different categories. When printing by category, I compare an sub-object tmpCategory (= "business") with the category part of a categorized contact.
But how to extract this data for comparison in int main()?
Here's what I'm thinking. I create a GetItem member function in LinkList.tem This would initialize the pointer cursor and then run a For loop until the function input matches the iteration number. At which point, GetItem returns object Type using (cursor -> data).
template <class Type>
Type LinkList<Type>::GetItem(int itemNumber) const
{
Node<Type>* cursor = NULL;
for(cursor = first;
cursor != NULL;
cursor = (cursor -> next))
{
for(int i = 0; i < used; i++)
{
if(itemNumber == i)
{
return(cursor -> data);
}
}
}
}
Here's where int main() comes in. I set my comparison object tmpCategory to a certain value (in this case, "Business"). Then, I run a For loop that iterates for cycles equal to the number of Nodes I have (as determined by a function GetUsed()). Inside that loop, I call GetItem, using the current iteration number. Theoretically, this would let the int main loop return the corresponding Node from LinkList.tem. From there, I call the category from the object inside that Node's data (which currently works), which would be compared with tmpCategory. If there's a match, the loop will print out the entire Node's data object.
tmpCategory = "Business";
for(int i = 0; i < myCategorizedContact.GetUsed(); i++)
{
if(myCategorizedContact.GetItem(i).getCategory() == tmpCategory)
cout << myCategorizedContact.GetItem(i);
}
The problem is that the currently setup (while it does run), it returns nothing at all. Upon further testing ( cout << myCategorizedContact.GetItem(i).getCategory() ), I found that it's just printing out the category of the first Node over and over again. I want the overall scheme to evaluate for every Node and print out matching data, not just spit out the same Node.
Any ideas/suggestions are greatly appreciated.
Please look at this very carefully:
template <class Type>
Type LinkList<Type>::GetItem(int itemNumber) const
{
Node<Type>* cursor = NULL;
// loop over all items in the linked list
for(cursor = first;
cursor != NULL;
cursor = (cursor -> next))
{
// for each item in the linked list, run a for-loop
// counter from 0 to (used-1).
for(int i = 0; i < used; i++)
{
// if the passed in itemNumber matches 'i' anytime
// before we reach the end of the for-loop, return
// whatever the current cursor is.
if(itemNumber == i)
{
return(cursor -> data);
}
}
}
}
You're not walking the cursor down the list itemNumber times. The very first item cursor references will kick off the inner-for-loop. The moment that loop index reaches itemNumber you return. You never advance your cursor if the linked list has at least itemNumber items in the list.. In fact, the two of them (cursor and itemNumber) are entirely unrelated in your implementation of this function. And to really add irony, since used and cursor are entirely unrelated, if used is ever less than itemNumber, it will ALWAYS be so, since used doesn't change when cursor advances through the outer loop. Thus cursor eventually becomes NULL and the results of this function are undefined (no return value). In summary, as written you will always either return the first item (if itemNumber < used), or undefined behavior since you have no return value.
I believe you need something like the following instead:
template< class Type >
Type LinkList<Type>::GetItem(int itemNumber) const
{
const Node<Type>* cursor = first;
while (cursor && itemNumber-- > 0)
cursor = cursor->next;
if (cursor)
return cursor->data;
// note: this is here because you're definition is to return
// an item COPY. This case would be better off returning a
// `const Type*` instead, which would at least allow you to
// communicate to the caller that there was no item at the
// proposed index (because the list is undersized) and return
// NULL, which the caller could check.
return Type();
}

Data structure for selecting groups of machines

I have this old batch system. The scheduler stores all computational nodes in one big array. Now that's OK for the most part, because most queries can be solved by filtering for nodes that satisfy the query.
The problem I have now is that apart from some basic properties (number of cpus, memory, OS), there are also these weird grouping properties (city, infiniband, network scratch).
Now the issue with these is that when a user requests nodes with infiniband I can't just give him any nodes, but I have to give him nodes connected to one infiniband switch, so the nodes can actually communicate using infiniband.
This is still OK, when user only requests one such property (I can just partition the array for each of the properties and then try to select the nodes in each partition separately).
The problem comes with combining multiple such properties, because then I would have to generate all combination of the subsets (partitions of the main array).
The good thing is that most of the properties are in a sub-set or equivalence relation (It sort of makes sense for machines on one infiniband switch to be in one city). But this unfortunately isn't strictly true.
Is there some good data structure for storing this kind of semi-hierarchical mostly-tree-like thing?
EDIT: example
node1 : city=city1, infiniband=switch03, networkfs=server01
node2 : city=city1, infiniband=switch03, networkfs=server01
node3 : city=city1, infiniband=switch03
node4 : city=city1, infiniband=switch03
node5 : city=city2, infiniband=switch03, networkfs=server02
node6 : city=city2, infiniband=switch03, networkfs=server02
node7 : city=city2, infiniband=switch04, networkfs=server02
node8 : city=city2, infiniband=switch04, networkfs=server02
Users request:
2x node with infiniband and networkfs
The desired output would be: (node1, node2) or (node5,node6) or (node7,node8).
In a good situation this example wouldn't happen, but we actually have these weird cross-site connections in some cases. If the nodes in city2 would be all on infiniband switch04, it would be easy. Unfortunately now I have to generate groups of nodes, that have the same infiniband switch and same network filesystem.
In reality the problem is much more complicated, since users don't request entire nodes, and the properties are many.
Edit: added the desired output for the query.
Assuming you have p grouping properties and n machines, a bucket-based solution is the easiest to set up and provides O(2p·log(n)) access and updates.
You create a bucket-heap for every group of properties (so you would have a bucket-heap for "infiniband", a bucket-heap for "networkfs" and a bucket-heap for "infiniband × networkfs") — this means 2p bucket-heaps.
Each bucket-heap contains a bucket for every combination of values (so the "infiniband" bucket would contain a bucket for key "switch04" and one for key "switch03") — this means a total of at most n·2p buckets split across all bucket-heaps.
Each bucket is a list of servers (possibly partitioned into available and unavailable). The bucket-heap is a standard heap (see std::make_heap) where the value of each bucket is the number of available servers in that bucket.
Each server stores references to all buckets that contain it.
When you look for servers that match a certain group of properties, you just look in the corresponding bucket for that property group, and climb down the heap looking for a bucket that's large enough to accomodate the number of servers requested. This takes O(log(p)·log(n)).
When servers are marked as available or unavailable, you have to update all buckets containing those servers, and then update the bucket-heaps containing those buckets. This is an O(2p·log(n)) operation.
If you find yourself having too many properties (and the 2p grows out of control), the algorithm allows for some bucket-heaps to be built on-demand from other bucket-heaps : if the user requests "infiniband × networkfs" but you only have a bucket-heap available for "infiniband" or "networkfs", you can turn each bucket in the "infiniband" bucket-heap into a bucket-heap on its own (use a lazy algorithm so you don't have to process all buckets if the first one works) and use a lazy heap-merging algorithm to find an appropriate bucket. You can then use a LRU cache to decide which property groups are stored and which are built on-demand.
My guess is that there won't be an "easy, efficient" algorithm and data structure to solve this problem, because what you're doing is akin to solving a set of simultaneous equations. Suppose there are 10 categories (like city, infiniband and network) in total, and the user specifies required values for 3 of them. The user asks for 5 nodes, let's say. Your task is then to infer values for the remaining 7 categories, such that at least 5 records exist that have all 10 category fields equal to these values (the 3 specified and the 7 inferred). There may be multiple solutions.
Still, provided there aren't too many different categories, and not too many distinct possibilities within each category, you can do a simple brute force recursive search to find possible solutions, where at each level of recursion you consider a particular category, and "try" each possibility for it. Suppose the user asks for k records, and may choose to stipulate any number of requirements via required_city, required_infiniband, etc.:
either(x, y) := if defined(x) then [x] else y
For each city c in either(required_city, [city1, city2]):
For each infiniband i in either(required_infiniband, [switch03, switch04]):
For each networkfs nfs in either(required_nfs, [undefined, server01, server02]):
Do at least k records of type [c, i, nfs] exist? If so, return them.
The either() function is just a way of limiting the search to the subspace containing points that the user gave constraints for.
Based on this, you will need a way to quickly look up the number of points (rows) for any given [c, i, nfs] combination -- nested hashtables will work just fine for this.
Step 1: Create an index for each property. E.g. for each property+value pair, create a sorted list of nodes with that property. Put each such list into an associative array of some kind- That is something like and stl map, one for each property, indexed by values. Such that when you are done you have a near constant time function that can return to you a list of nodes that match a single property+value pair. The list is simply sorted by node number.
Step 2: Given a query, for each property+value pair required, retrieve the list of nodes.
Step 3: Starting with the shortest list, call it list 0, compare it to each of the other lists in turn removing elements from list 0 that are not in the other lists.
You should now have just the nodes that have all the properties requested.
Your other option would be to use a database, it is already set up to support queries like this. It can be done all in memory with something like BerkeleyDB with the SQL extensions.
If sorting the list by every criteria mentioned in the query is viable (or having the list pre-sorted by each relative criteria), this works very well.
By "relative criteria", I mean criteria not of the form "x must be 5", which are trivial to filter against, but criteria of the form "x must be the same for each item in the result set". If there are also criteria of the "x must be 5" form, then filter against those first, then do the following.
It relies on using a stable sort on multiple columns to find the matching groups quickly (without trying out combinations).
The complexity is number of nodes * number of criteria in the query (for the algorithm itself) + number of nodes * log(number of nodes) * number of criteria (for the sort, if not pre-sorting). So Nodes*Log(Nodes)*Criteria.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace bleh
{
class Program
{
static void Main(string[] args)
{
List<Node> list = new List<Node>();
// create a random input list
Random r = new Random();
for (int i = 1; i <= 10000; i++)
{
Node node = new Node();
for (char c = 'a'; c <= 'z'; c++) node.Properties[c.ToString()] = (r.Next() % 10 + 1).ToString();
list.Add(node);
}
// if you have any absolute criteria, filter the list first according to it, which is very easy
// i am sure you know how to do that
// only look at relative criteria after removing nodes which are eliminated by absolute criteria
// example
List<string> criteria = new List<string> {"c", "h", "r", "x" };
criteria = criteria.OrderBy(x => x).ToList();
// order the list by each relative criteria, using a ***STABLE*** sort
foreach (string s in criteria)
list = list.OrderBy(x => x.Properties[s]).ToList();
// size of sought group
int n = 4;
// this is the algorithm
int sectionstart = 0;
int sectionend = 0;
for (int i = 1; i < list.Count; i++)
{
bool same = true;
foreach (string s in criteria) if (list[i].Properties[s] != list[sectionstart].Properties[s]) same = false;
if (same == true) sectionend = i;
else sectionstart = i;
if (sectionend - sectionstart == n - 1) break;
}
// print the results
Console.WriteLine("\r\nResult:");
for (int i = sectionstart; i <= sectionend; i++)
{
Console.Write("[" + i.ToString() + "]" + "\t");
foreach (string s in criteria) Console.Write(list[i].Properties[s] + "\t");
Console.WriteLine();
}
Console.ReadLine();
}
}
}
I would do something like this (obviously instead of strings you should map them to int, and use int's as codes)
struct structNode
{
std::set<std::string> sMachines;
std::map<std::string, int> mCodeToIndex;
std::vector<structNode> vChilds;
};
void Fill(std::string strIdMachine, int iIndex, structNode* pNode, std::vector<std::string> &vCodes)
{
if(iIndex < vCodes.size())
{
// Add "Empty" if Needed
if(pNode->vChilds.size() == 0)
{
pNode->mCodeToIndex.insert(pNode->mCodeToIndex.begin(), make_pair("empty", 0));
pNode->vChilds.push_back(structNode());
}
// Add for "Empty"
pNode->vChilds[0].sMachines.insert(strIdMachine);
Fill(strIdMachine, (iIndex + 1), &pNode->vChilds[0], vCodes );
if(vCodes[iIndex] == "empty")
return;
// Add for "Any"
std::map<std::string, int>::iterator mIte = pNode->mCodeToIndex.find("any");
if(mIte == pNode->mCodeToIndex.end())
{
mIte = pNode->mCodeToIndex.insert(pNode->mCodeToIndex.begin(), make_pair("any", pNode->vChilds.size()));
pNode->vChilds.push_back(structNode());
}
pNode->vChilds[mIte->second].sMachines.insert(strIdMachine);
Fill(strIdMachine, (iIndex + 1), &pNode->vChilds[mIte->second], vCodes );
// Add for "Segment"
mIte = pNode->mCodeToIndex.find(vCodes[iIndex]);
if(mIte == pNode->mCodeToIndex.end())
{
mIte = pNode->mCodeToIndex.insert(pNode->mCodeToIndex.begin(), make_pair(vCodes[iIndex], pNode->vChilds.size()));
pNode->vChilds.push_back(structNode());
}
pNode->vChilds[mIte->second].sMachines.insert(strIdMachine);
Fill(strIdMachine, (iIndex + 1), &pNode->vChilds[mIte->second], vCodes );
}
}
//////////////////////////////////////////////////////////////////////
// Get
//
// NULL on empty group
//////////////////////////////////////////////////////////////////////
set<std::string>* Get(structNode* pNode, int iIndex, vector<std::string> vCodes, int iMinValue)
{
if(iIndex < vCodes.size())
{
std::map<std::string, int>::iterator mIte = pNode->mCodeToIndex.find(vCodes[iIndex]);
if(mIte != pNode->mCodeToIndex.end())
{
if(pNode->vChilds[mIte->second].sMachines.size() < iMinValue)
return NULL;
else
return Get(&pNode->vChilds[mIte->second], (iIndex + 1), vCodes, iMinValue);
}
else
return NULL;
}
return &pNode->sMachines;
}
To fill the tree with your sample
structNode stRoot;
const char* dummy[] = { "city1", "switch03", "server01" };
const char* dummy2[] = { "city1", "switch03", "empty" };
const char* dummy3[] = { "city2", "switch03", "server02" };
const char* dummy4[] = { "city2", "switch04", "server02" };
// Fill the tree with the sample
Fill("node1", 0, &stRoot, vector<std::string>(dummy, dummy + 3));
Fill("node2", 0, &stRoot, vector<std::string>(dummy, dummy + 3));
Fill("node3", 0, &stRoot, vector<std::string>(dummy2, dummy2 + 3));
Fill("node4", 0, &stRoot, vector<std::string>(dummy2, dummy2 + 3));
Fill("node5", 0, &stRoot, vector<std::string>(dummy3, dummy3 + 3));
Fill("node6", 0, &stRoot, vector<std::string>(dummy3, dummy3 + 3));
Fill("node7", 0, &stRoot, vector<std::string>(dummy4, dummy4 + 3));
Fill("node8", 0, &stRoot, vector<std::string>(dummy4, dummy4 + 3));
Now you can easily obtain all the combinations that you want for example you query would be something like this:
vector<std::string> vCodes;
vCodes.push_back("empty"); // Discard first property (cities)
vCodes.push_back("any"); // Any value for infiniband
vCodes.push_back("any"); // Any value for networkfs (except empty)
set<std::string>* pMachines = Get(&stRoot, 0, vCodes, 2);
And for example only City02 on switch03 with networfs not empty
vector<std::string> vCodes;
vCodes.push_back("city2"); // Only city2
vCodes.push_back("switch03"); // Only switch03
vCodes.push_back("any"); // Any value for networkfs (except empy)
set<std::string>* pMachines = Get(&stRoot, 0, vCodes, 2);