Memory continuity and search tree - c++

I'm trying to build a search tree on top of my results: some kind of k-ary tree with n leaves at the end. I'm looking for a C++ solution and experimenting with std::vector, but I can't get it done because I need contiguous memory. It could be done with nested vectors, but I can't do that.
Let me explain the details by example:
An unsorted result could be
Result R = { 4, 7, 8, 3, 1, 9, 0, 2, 2, 9, 6 }
On top of that I need a tree with nodes which in my specific problem are centroids. But to keep it simple I will use artificial values here.
I define the search tree dimensions as
Height H = 2
Branch B = 3
The tree at first
4 7 8 3 1 9 0 2 2 9 6
Second step
layer_0 1.6 5 8.2
| | |
+-+-+-+-+-+ +-+ +-+-+-+
| | | | | | | | | | | |
layer_1 3 1 0 2 2 3 4 6 7 8 9 9
Last step
layer_0 1.6 5 8.2
| | |
+---+---+ +-+---+ +---+----+
layer_1 0.8 1.6 2.4 4.2 5 5.8 6.4 8.2 8.4
| | | | | | |
+-+ +-+-+ | | | | +-+
layer_2 1 0 2 2 3 4 6 7 8 9 9
This last tree is not a k-ary tree, as the size of each group of end leaves can be anywhere in 0 <= size <= |R|.
At this moment I'm experimenting with two vectors.
std::vector<size_t> layer_2;
std::vector<float> leafs;
std::size_t width, height;
With the help of width and height it would be possible to navigate through leafs. But I'm wondering how to elegantly connect leafs and layer_2.
What would a good solution look like?

Note: This solution is continuous in the sense that it uses a contiguous data structure (vector or array) instead of a Node pointer style tree, but it can leave unused space within the data structure, depending on the application.
A case where this approach wastes a lot of space: the maximum number of branches per node is large, but most nodes actually have far fewer children. This has no effect on how long it takes to find the leaves, though. In fact it is a trade-off that makes that part quite fast.
Consider a 3-branch tree with 4 levels in continuous memory:
R,a,b,c,aa,ab,ac,ba,bb,bc,ca,cb,cc,aaa,aab,aac,aba,abb,abc,baa.... where the index of a node's children ranges from (parent_index*3)+1 to (parent_index*3)+3
The important caveat I alluded to is that every node must always have its three child spaces in the vector, array, whatever. If a node has, say, only 2 children, just fill the extra space with a null_child value to hold the space. (This is where the wasted space comes from.)
The advantage is that now, finding all of the leaves is easy.
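first_leaf_index = 0;
level_size = 1; // number of nodes on the current level
for (i = 0; i < (4 - 1); i++) { // in this example 4 is the depth
    first_leaf_index += level_size; // adds 3 to the power of i (note: ^ is XOR in C++, not a power)
    level_size *= 3; // three is max branches per node
}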
At that point just iterate to the end of the data structure
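To make the index arithmetic above concrete, here is a minimal sketch of it in C++. The function names (child_index, parent_index, first_leaf_index, node_count) are made up for illustration, and it assumes a full tree where every node reserves all of its child slots, as described above.

```cpp
#include <cstddef>

// Hypothetical helpers for a tree stored level by level in one vector.
constexpr std::size_t kBranches = 3;  // max children per node

// Index of the c-th child (c = 0 .. kBranches-1) of the node at `parent`.
std::size_t child_index(std::size_t parent, std::size_t c) {
    return parent * kBranches + 1 + c;
}

// Index of the parent of the node at `i`; `i` must not be the root (0).
std::size_t parent_index(std::size_t i) {
    return (i - 1) / kBranches;
}

// Index of the first slot on the deepest level of a tree with `depth` levels.
std::size_t first_leaf_index(std::size_t depth) {
    std::size_t index = 0;
    std::size_t level_size = 1;  // number of slots on the current level
    for (std::size_t level = 0; level + 1 < depth; ++level) {
        index += level_size;
        level_size *= kBranches;
    }
    return index;
}

// Total number of slots needed for a tree with `depth` levels.
std::size_t node_count(std::size_t depth) {
    std::size_t total = 0;
    std::size_t level_size = 1;
    for (std::size_t level = 0; level < depth; ++level) {
        total += level_size;
        level_size *= kBranches;
    }
    return total;
}
```

For the 4-level example, the leaves occupy indices 13 to 39 of a 40-slot vector, so a single std::vector sized with node_count(4) can hold both the inner nodes and the leaves.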

Related

Finding the neighbors of a node/vertex in a 2D mesh

I have a 2D mesh defined by nodes and elements.
Structure of a node: Node ID, X position, Y position
Structure of an element: Element ID, Node 1, Node 2, Node 3, Node 4
Example of a 2x2 elements mesh:
Nodes:
ID X Y
1 0 0
2 0 1
3 0 2
4 1 0
5 1 1
6 1 2
7 2 0
8 2 1
9 2 2
Elements:
ID N1 N2 N3 N4
1 1 2 4 5
2 2 3 5 6
3 4 5 7 8
4 5 6 8 9
N7-----N8-----N9
| | |
| E3 | E4 |
| | |
N4-----N5-----N6
| | |
| E1 | E2 |
| | |
N1-----N2-----N3
I'm storing both nodes and elements in linked lists.
My question: How can I find the neighbors (nodes) for an arbitrary selected node?
The neighbors of N5, for example, would be N2, N4, N6 and N8.
*Note: This 2x2 element mesh is a simplified example for explanation purposes; the meshes I'm dealing with may contain several thousand nodes and elements.
I also have been looking at some concepts of graph theory, but I'm not sure which may be the right way to go.
It would be good to have each element's vertices ordered so that they form a closed polygon. The vertices [1, 2, 4, 5] do not uniquely define the first element. From your description it is clear that you mean a polygon with four vertices in the order (1, 2, 5, 4), but without the picture it could also be the degenerate quad (1, 2, 4, 5).
Like:
Elements:
ID N1 N2 N3 N4
1 1 2 5 4
2 2 3 6 5
3 4 5 8 7
4 5 6 9 8
If you are not sure about the vertex order, then you have to check each element for self-intersection and reorder the vertices to resolve the intersections.
With that kind of data it is easy to find all neighbours of a given node. Pass through all elements; if an element contains the given node, then there are two neighbours in that element: the vertices before and after it in the (cyclic) list.
For node 5, in the first element the neighbours are 2 and 4, in the second element the neighbours are 6 and 2, ...
If there will be a lot of queries of this kind, then it is better to extract the connectivity information into a separate structure. That can be a map from each node to the set of its neighbours. To build it, pass through all elements, and for each element vertex add its two neighbours to that node's set.
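A minimal sketch of that connectivity map, assuming each element stores its four vertices in polygon order as in the reordered table above. The names NodeId, Element, build_neighbours and example_mesh are illustrative, not from the question.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using NodeId = int;
using Element = std::array<NodeId, 4>;  // vertices in closed-polygon order

// For every vertex of every element, record the vertex before and after it.
std::map<NodeId, std::set<NodeId>> build_neighbours(const std::vector<Element>& elements) {
    std::map<NodeId, std::set<NodeId>> neighbours;
    for (const Element& e : elements) {
        const std::size_t n = e.size();
        for (std::size_t i = 0; i < n; ++i) {
            neighbours[e[i]].insert(e[(i + n - 1) % n]);  // previous vertex, wrapping around
            neighbours[e[i]].insert(e[(i + 1) % n]);      // next vertex, wrapping around
        }
    }
    return neighbours;
}

// The 2x2 example mesh with vertices reordered into polygon order.
std::vector<Element> example_mesh() {
    return {Element{1, 2, 5, 4}, Element{2, 3, 6, 5},
            Element{4, 5, 8, 7}, Element{5, 6, 9, 8}};
}
```

Building the map is one pass over the elements; after that each neighbour query is a single map lookup. Using a std::set per node removes the duplicates that appear when two elements share an edge.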

What is the tree-structure of a heap?

I'm reading Nicolai M. Josuttis's "The C++ standard library, a tutorial and reference", ed2.
He explains the heap data structure and related STL functions in page 607:
The program has the following output:
on entry: 3 4 5 6 7 5 6 7 8 9 1 2 3 4
after make_heap(): 9 8 6 7 7 5 5 3 6 4 1 2 3 4
after pop_heap(): 8 7 6 7 4 5 5 3 6 4 1 2 3
after push_heap(): 17 7 8 7 4 5 6 3 6 4 1 2 3 5
after sort_heap(): 1 2 3 3 4 4 5 5 6 6 7 7 8 17
I'm wondering how this could be figured out. For example, why is the leaf "4" under path 9-6-5-4 the left child of node "5", not the right one? And what is the tree structure after pop_heap? In IDE debugging mode I could only see the content of the vector; is there a way to figure out the tree structure?
why the leaf "4" under path 9-6-5-4 is the left side child of node "5", not the right side one?
Because if it was on the right side, that would mean there is a gap in the underlying vector. The tree structure is for illustrative purposes only. It is not a representation of how the heap is actually stored. The tree structure is mapped onto the underlying vector via a simple mathematical formula.
The root node of the tree is the first element of the vector (index 0). The index of the left child of a node is obtained from its parent's index by the simple formula: i * 2 + 1. And the index of the right child is obtained by i * 2 + 2.
And after pop_heap what's the tree structure then?
The root node is swapped with the greater of its two children [1], and this is repeated until it is at the bottom of the tree. Then it is swapped with the last element. This element is then pushed up the tree, if necessary, by swapping with its parent if it is greater.
The root node is swapped with the last element of the heap. Then, this element is pushed down the heap by swapping with the greater of its two children [1]. This is repeated until it is in the correct position (i.e. it is not less than either of its children).
So after pop_heap, your tree looks like this:
----- 8 -----
| |
---7--- ---6---
| | | |
-7- -4- -5- x5
| | | | | | x
3 6 4 1 2 3 9
The 9 is not actually part of the heap anymore, but it is still part of the vector until you erase it, via a call to pop_back or similar.
[1] If the children are equal, as in the case of the adjacent 7's in the tree in your example, it could go either way. I believe that std::pop_heap sends it to the right, though I'm not sure whether this is implementation-defined.
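As a small illustration of the point about the 9 staying in the vector, here is a sketch that pairs std::pop_heap with pop_back. The wrapper name pop_max is made up; the exact element order after the call can differ between standard library implementations, so the sketch only relies on the heap property.

```cpp
#include <algorithm>
#include <vector>

// Removes and returns the largest element of a non-empty max-heap.
// std::pop_heap moves the root to the back; [begin, end - 1) remains a heap.
int pop_max(std::vector<int>& heap) {
    std::pop_heap(heap.begin(), heap.end());
    int largest = heap.back();  // the old root, no longer part of the heap
    heap.pop_back();            // now actually erase it from the vector
    return largest;
}
```

Calling it on the vector from the book's output after make_heap() returns 9 and leaves a 13-element vector for which std::is_heap is still true.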
The first element in the vector is the root at index 0. Its left child is at index 1 and its right child at index 2. In general: left_child(i) = 2 * i + 1 and right_child(i) = 2 * i + 2 and parent(i) = floor((i - 1) / 2)
Another way to think about it is the heap fills each level from left to right in the tree. Following the elements in the vector the first level is 9 (1 value), second level 8 6 (2 values) and third level 7 7 5 5 (4 values), and so on. Both these ways will help you draw the heap in a tree structure when given a vector.
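Both views can be written down in a few lines. This is only a sketch with made-up function names (left_child, right_child, parent, heap_levels); it splits a heap vector into tree levels so you can print or inspect them instead of staring at the flat vector in the debugger.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

std::size_t left_child(std::size_t i)  { return 2 * i + 1; }
std::size_t right_child(std::size_t i) { return 2 * i + 2; }
std::size_t parent(std::size_t i)      { return (i - 1) / 2; }  // i must be > 0

// Splits the vector into levels of width 1, 2, 4, 8, ...; the last may be partial.
std::vector<std::vector<int>> heap_levels(const std::vector<int>& heap) {
    std::vector<std::vector<int>> levels;
    std::size_t start = 0;
    std::size_t width = 1;
    while (start < heap.size()) {
        const std::size_t end = std::min(start + width, heap.size());
        levels.emplace_back(heap.begin() + static_cast<std::ptrdiff_t>(start),
                            heap.begin() + static_cast<std::ptrdiff_t>(end));
        start = end;
        width *= 2;
    }
    return levels;
}
```

For the vector after make_heap() this gives the levels {9}, {8 6}, {7 7 5 5} and {3 6 4 1 2 3 4}. The second 5 sits at index 6, and left_child(6) is 13, which is where the trailing 4 lives; that is why it is a left child.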

Queue with mod operation

I'm studying the fundamentals of data structures (queues). So far I understand how a queue works, but I don't understand it when the queue is combined with the mod operator. There are several questions that confuse me. How do I answer this question (refer to picture)?
The best method for handling circular queues is to draw them out. Since circles don't post very well with ASCII art, I'll use a linear array.
+---+---+---+---+---+
| | | | | |
+---+---+---+---+---+
0 1 2 3 4
^
Rear
The REAR is at index 4.
Let's perform the operation step by step.
First: Add 1 to REAR. This makes REAR point beyond the array:
+---+---+---+---+---+
| | | | | |
+---+---+---+---+---+
0 1 2 3 4 5
^
Rear
Applying the modulo operation, %, this will give us the remainder of 5 / 5 which is zero:
+---+---+---+---+---+
| | | | | |
+---+---+---+---+---+
0 1 2 3 4
^
Rear
Thus the modulo operation wraps around the array, like a circle.
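In code, the wrap-around is a one-liner. The following is a minimal sketch of a fixed-size circular queue of 5 slots, matching the drawing; the names kCapacity, next_index and RingQueue are made up for this example and are not from the question.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kCapacity = 5;  // slots 0..4, as in the drawing

// Advance an index by one, wrapping from 4 back to 0.
std::size_t next_index(std::size_t index) {
    return (index + 1) % kCapacity;
}

struct RingQueue {
    std::array<int, kCapacity> slots{};
    std::size_t front = 0;  // index of the oldest element
    std::size_t count = 0;  // number of stored elements

    // Returns false when the queue is full.
    bool push(int value) {
        if (count == kCapacity) return false;
        slots[(front + count) % kCapacity] = value;  // the rear position wraps around
        ++count;
        return true;
    }

    // Returns false when the queue is empty.
    bool pop(int& value) {
        if (count == 0) return false;
        value = slots[front];
        front = next_index(front);
        --count;
        return true;
    }
};
```

After filling all five slots and popping one element, the next push lands in slot 0 again, which is exactly the step shown in the drawing.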
The next question is for you to solve. Remember to draw the array or queue. You can use circles (think of a sliced pie or pizza).
Edit 1: Modulo details
The modulo operation will give a value in the range 0..N-1, where N is the divisor.
Given N == 4, here are some results for modulo:
Index result
0 0
1 1
2 2
3 3
4 0 --> The remainder of 4 / 4 == 0.
5 1
6 2
7 3
8 0 --> The remainder of 8 / 4 == 0.
Modulus returns the remainder of the two operands. For example, 4%2=0 since 4/2=2 with no remainder, while 4%3=1 since 4/3=1 with remainder 1. Since you can never have a remainder higher than the right operand, you have an effective "range" of answers for any modulus of 0 to (n-1). With that in mind, just plug in the numbers for the variables ((4+1)%5=? and (1+1)%4=?). Usually to find the remainder you would use long division, but one useful thing to remember is that any number divided by itself has a remainder of 0, and any number divided by a larger number will have a remainder equal to itself.

C++ Inverted Weighted Shuffle/Random

I have a list of weighted objects i.e.:
A->1 B->1 C->3 D->2 E->3
is there an efficient algorithm in C++ to pick random elements according to their weight?
For example, the probability that element A or B, with a lower weight, is picked is higher (30% each) than the probability that the algorithm selects element C or E (10% each) or D (20%).
As @Dukeling said, we need more info, such as how you interpret and use the selection chance.
At least in the field of evolutionary algorithm, fitness scaling (or selection chance scaling) is a sizable topic.
Suppose you start with badness score
B[i] = how badly you don't want to select the i-th item
And the objective is to calculate a fitness/selection score S[i], which I assume you will use in roulette wheel fashion.
As you say, one obvious way is to use multiplicative inverse:
S[i] = 1 / B[i]
However, there might be a little problem with that.
The same amount of change in B[i] has much more impact when B[i] is low than when B[i] is already high.
Ask yourself this:
Say
B[1] = 1 -> S[1] = 1
B[2] = 2 -> S[2] = 0.5
So item 1 is twice as likely to be selected as item 2
But with the same amount of change
B[3] = 1000 -> S[3] = 0.001
B[4] = 1001 -> S[4] = 0.000999001
Item 3 is only 1.001 times as likely to be selected compared to item 4
I'll just throw one possible alternative scheme here for now.
S[i] = max(B) - B[i] + 1
The + 1 part helps so no item has zero chance to be selected.
This ends the part of calculating selection score.
Next, let's clear up how to use the selection score in roulette wheel fashion.
Assume we decided to use the additive inverse scheme.
B[1] = 1 -> S[1] = 1001
B[2] = 2 -> S[2] = 1000
B[3] = 1000 -> S[3] = 2
B[4] = 1001 -> S[4] = 1
Then imagine that each point of score corresponds to a lottery ticket.
Let's assign the tickets running IDs.
| Item | Score = #ticket | ticket ID | win chance |
| 1 | 1001 | 0 to 1000 | 1001/2004 ~ 0.499500998 |
| 2 | 1000 | 1001 to 2000 | 1000/2004 ~ 0.499001996 |
| 3 | 2 | 2001 to 2002 | 2/2004 ~ 0.000998004 |
| 4 | 1 | 2003 to 2003 | 1/2004 ~ 0.000499002 |
There are 2004 tickets in total.
To do a selection, pick the winning ticket ID at random i.e. the random range is [0,2004).
Binary search can be used to quickly look up which item owns the winning ticket as you have already seen in this question. What needs to be looked up with binary search are the boundary values of ticket ID which are 1001,2001,2003 rather than the score themselves.
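Here is a rough sketch of the whole pipeline, assuming integer badness values and the additive scheme S[i] = max(B) - B[i] + 1 from above. All function names are invented for this example; the lookup stores one boundary per item (the running total of scores) and uses std::upper_bound for the binary search.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Additive inverse scheme: S[i] = max(B) - B[i] + 1. `badness` must not be empty.
std::vector<long> scores_from_badness(const std::vector<long>& badness) {
    const long worst = *std::max_element(badness.begin(), badness.end());
    std::vector<long> scores;
    for (long b : badness) scores.push_back(worst - b + 1);
    return scores;
}

// boundaries[i] is the first ticket ID that no longer belongs to items 0..i.
std::vector<long> ticket_boundaries(const std::vector<long>& scores) {
    std::vector<long> boundaries(scores.size());
    std::partial_sum(scores.begin(), scores.end(), boundaries.begin());
    return boundaries;
}

// Binary search: the owner is the first item whose boundary is above the ticket.
std::size_t owner_of_ticket(const std::vector<long>& boundaries, long ticket) {
    return static_cast<std::size_t>(
        std::upper_bound(boundaries.begin(), boundaries.end(), ticket) - boundaries.begin());
}

// Draws a winning ticket uniformly from [0, total tickets) and returns its owner.
template <class Rng>
std::size_t pick(const std::vector<long>& boundaries, Rng& rng) {
    std::uniform_int_distribution<long> dist(0, boundaries.back() - 1);
    return owner_of_ticket(boundaries, dist(rng));
}
```

For the badness values 1, 2, 1000, 1001 the boundaries come out as 1001, 2001, 2003, 2004, so ticket 1000 belongs to item index 0 and ticket 1001 to item index 1 (indices are 0-based here, unlike the table). If you don't need control over the lookup, std::discrete_distribution from <random> accepts the scores as weights directly.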
For comparison, here is the selection chance in case the multiplicative inverse scheme is used.
| Item | win chance |
| 1 | 1/1.501999001 ~ 0.665779404 |
| 2 | 0.5/1.501999001 ~ 0.332889702 |
| 3 | 0.001/1.501999001 ~ 0.000665779 |
| 4 | 0.000999001/1.501999001 ~ 0.000665114 |
You can notice that in the additive inverse scheme, 1 unit of badness consistently corresponds to around a difference of 0.0005 in selection chance.
Whereas in multiplicative inverse scheme, 1 unit of badness results in varying difference of selection chance.

Calculating a boundary around several linked rectangles

I am working on a project where I need to create a boundary around a group of rectangles.
Let's use this picture as an example of what I want to accomplish.
EDIT: Couldn't get the image tag to work properly, so here is the full link:
http://www.flickr.com/photos/21093416#N04/3029621742/
We have rectangles A and C, which are linked by a special link rectangle B. You could think of this as two nodes in a graph (A,C) and the edge between them (B). That means the rectangles have pointers to each other in the following manner: A->B, A<-B->C, C->B
Each rectangle has four vertices stored in an array where index 0 is bottom left, and index 3 is bottom right.
I want to "traverse" this linked structure and calculate the vertices making up the boundary (red line) around it. I already have some small ideas around how to accomplish this, but want to know if some of you more mathematically inclined have some neat tricks up your sleeves.
The reason I post this here is just that someone might have solved a similar problem before and have some ideas I could use. I don't expect anyone to sit down and think this through long and hard. I'm going to work on a solution in parallel as I wait for answers.
Any input is greatly appreciated.
Using the example, where rectangles are perpendicular to each other and can therefore be represented by four values (two x coordinates and two y coordinates):
1 2 3 4 5 6
1 +---+---+
| |
2 + A +---+---+
| | B |
3 + + +---+---+
| | | | |
4 +---+---+---+---+ +
| |
5 + C +
| |
6 +---+---+
1) collect all the x coordinates (both left and right) into a list, then sort it and remove duplicates
1 3 4 5 6
2) collect all the y coordinates (both top and bottom) into a list, then sort it and remove duplicates
1 2 3 4 6
3) create a 2D array sized by the number of gaps between the unique x coordinates * the number of gaps between the unique y coordinates. It only needs one bit per cell, so in C++ a vector<bool> will likely give you a very memory-efficient version of this
4 * 4
4) paint all the rectangles into this grid
1 3 4 5 6
1 +---+
| 1 | 0 0 0
2 +---+---+---+
| 1 | 1 | 1 | 0
3 +---+---+---+---+
| 1 | 1 | 1 | 1 |
4 +---+---+---+---+
0 0 | 1 | 1 |
6 +---+---+
5) for each cell in the grid, for each edge, if the cell beside it in that cardinal direction is not painted, draw the boundary line for that edge
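The five steps above could be sketched like this. It assumes axis-aligned rectangles given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2 and a non-empty input; the type and function names are made up. Instead of drawing each boundary edge, this sketch just adds up their lengths, which is enough to check the result.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Rect { int x1, y1, x2, y2; };  // assumes x1 < x2 and y1 < y2

// Steps 1 and 2: sort a coordinate list and remove duplicates.
std::vector<int> unique_sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

struct Grid {
    std::vector<int> xs, ys;  // unique sorted coordinates
    std::vector<bool> cells;  // row-major, one bit per gap cell
    int cols() const { return static_cast<int>(xs.size()) - 1; }
    int rows() const { return static_cast<int>(ys.size()) - 1; }
    // Cells outside the grid count as not painted.
    bool painted(int col, int row) const {
        if (col < 0 || row < 0 || col >= cols() || row >= rows()) return false;
        return cells[static_cast<std::size_t>(row * cols() + col)];
    }
};

// Position of a coordinate in a sorted, duplicate-free list.
int index_of(const std::vector<int>& sorted, int value) {
    return static_cast<int>(
        std::lower_bound(sorted.begin(), sorted.end(), value) - sorted.begin());
}

// Steps 3 and 4: build the grid and paint every rectangle into it.
Grid paint(const std::vector<Rect>& rects) {
    std::vector<int> xs, ys;
    for (const Rect& r : rects) {
        xs.push_back(r.x1); xs.push_back(r.x2);
        ys.push_back(r.y1); ys.push_back(r.y2);
    }
    Grid g;
    g.xs = unique_sorted(xs);
    g.ys = unique_sorted(ys);
    g.cells.assign(static_cast<std::size_t>(g.cols() * g.rows()), false);
    for (const Rect& r : rects)
        for (int row = index_of(g.ys, r.y1); row < index_of(g.ys, r.y2); ++row)
            for (int col = index_of(g.xs, r.x1); col < index_of(g.xs, r.x2); ++col)
                g.cells[static_cast<std::size_t>(row * g.cols() + col)] = true;
    return g;
}

// Step 5: an edge is on the boundary when the cell on its other side is not painted.
int boundary_length(const Grid& g) {
    int total = 0;
    for (int row = 0; row < g.rows(); ++row) {
        for (int col = 0; col < g.cols(); ++col) {
            if (!g.painted(col, row)) continue;
            const int w = g.xs[static_cast<std::size_t>(col) + 1] - g.xs[static_cast<std::size_t>(col)];
            const int h = g.ys[static_cast<std::size_t>(row) + 1] - g.ys[static_cast<std::size_t>(row)];
            if (!g.painted(col, row - 1)) total += w;  // edge towards smaller y
            if (!g.painted(col, row + 1)) total += w;  // edge towards larger y
            if (!g.painted(col - 1, row)) total += h;  // edge towards smaller x
            if (!g.painted(col + 1, row)) total += h;  // edge towards larger x
        }
    }
    return total;
}
```

For the three example rectangles this reproduces the 4 * 4 grid shown above and a total boundary length of 20. To get the actual outline, you would emit a line segment at each of the four checks instead of adding its length.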
In the question, the rectangles are described as being four vectors where each represents a corner. If each rectangle can be at arbitrary and different rotation from others, then the approach I've outlined above won't work. The problem of finding the path around a complex polygon is regularly solved by vector graphics rasterizers, and a good approach to solving the problem is using a library such as Cairo to do the work for you!
The generalized solution to this problem is to implement boolean operations in terms of a scanline. You can find a brief discussion here to get you started. From the text:
"The basis of the boolean algorithms is scanlines. For the basic principles the book: Computational Geometry an Introduction by Franco P. Preparata and Michael Ian Shamos is very good."
I own this book, though it's at the office now, so I can't look up the page numbers you should read, though chapter 8, on the geometry of rectangles is probably the best starting point.
Calculate the sum of the boundaries of all 3 rectangles separately
Calculate the overlapping rectangle of A and B, and subtract its boundary from the sum
Do the same for the overlapping rectangle of B and C
(to get the overlapping rectangle of A and B, take the middle 2 X positions together with the middle 2 Y positions)
Example (x1,y1) - (x2,y2):
Rectangle A: (1,1) - (3,4)
Rectangle B: (3,2) - (5,4)
Rectangle C: (4,3) - (6,6)
Calculation:
10 + 8 + 10 = 28
X coords ordered = 1,3,3,5 middle two are 3 and 3
Y coords ordered = 1,2,4,4 middle two are 2 and 4
so: (3,2) - (3,4) : boundary = 4
X coords ordered = 3,4,5,6 middle two are 4 and 5
Y coords ordered = 2,3,4,6 middle two are 3 and 4
so: (4,3) - (5,4) : boundary = 4
28 - 4 - 4 = 20
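The same calculation as a short sketch. It assumes axis-aligned rectangles that touch or overlap their neighbour in the chain, and that A and C do not overlap each other; otherwise the "middle two" trick does not give a real overlap. The names perimeter, overlap and chain_boundary are made up.

```cpp
#include <algorithm>
#include <array>

struct Rect { int x1, y1, x2, y2; };  // assumes x1 <= x2 and y1 <= y2

int perimeter(const Rect& r) {
    return 2 * ((r.x2 - r.x1) + (r.y2 - r.y1));
}

// Overlap of two touching or overlapping rectangles: the middle two of the
// four sorted X values together with the middle two of the four sorted Y values.
Rect overlap(const Rect& a, const Rect& b) {
    std::array<int, 4> xs = {a.x1, a.x2, b.x1, b.x2};
    std::array<int, 4> ys = {a.y1, a.y2, b.y1, b.y2};
    std::sort(xs.begin(), xs.end());
    std::sort(ys.begin(), ys.end());
    return Rect{xs[1], ys[1], xs[2], ys[2]};
}

// Boundary length of the chain A - B - C.
int chain_boundary(const Rect& a, const Rect& b, const Rect& c) {
    return perimeter(a) + perimeter(b) + perimeter(c)
         - perimeter(overlap(a, b)) - perimeter(overlap(b, c));
}
```

A shared edge shows up as a degenerate overlap rectangle with zero width, such as (3,2) - (3,4); its perimeter of 4 is twice the edge length, which is the right amount to subtract because both rectangles counted that edge.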
This is my example visualized:
1 2 3 4 5 6
1 +---+---+
| |
2 + A +---+---+
| | B |
3 + + +---+---+
| | | | |
4 +---+---+---+---+ +
| |
5 + C +
| |
6 +---+---+
A simple trick should be:
Create a region from the first rectangle
Add the other rectangles to the region
Get the boundary of the region (somehow? :P)
After some thinking I might end up doing something like this:
Pseudo code:
LinkRectsConnectedTo(Rectangle rectangle, Edge startEdge) // Edge can be West, North, East, South
    for each edge in rectangle, starting with the edge facing the last rectangle
        add the vertices of the edge to the final boundary polygon
        if edge is connected to another rectangle
            if edge not equals startEdge
                // recurse into the connected rectangle, not the current one
                recursively call LinkRectsConnectedTo(connected rectangle, edge facing back to this rectangle)
Obviously this pseudo code would have to be refined a bit and might not cover all cases, but I think I might have solved my own problem.
I haven't thought this out completely, but I wonder if you couldn't do something like:
Make a list of all the edges.
Get all the edges where P1.X = P2.X
In that list, get the pairs where X are equal
For each pair, replace with one or two edges for the parts where they DON'T overlap
Do something clever to get the edges in the right order
Will your rectangles always be horizontally aligned? If not, you'd need to do the same thing for Y too.
And are they always guaranteed to be touching? If not, the algorithm wouldn't be broken, but the 'right order' wouldn't be definable.