Show that for any AVL tree with height h, all levels until h/2 are complete trees by induction - height

I was given this question on a test: "show by induction, that for a given AVL tree of height h, all levels of the tree until h/2 (round down) are complete binary trees". I wrote down the following answer and would like to know if my argument stands.
base: h= 0-> then level 0/2 is complete binary tree since theres just a root. h=1-> same argument since 1/2 round down is zero
step: assume correctness for all avl trees with level h and show for h+1.
assume we have a AVL tree of height h+1. remove the root and receive two AVL trees of height h or one h and one h-1.
divide into two cases: h is even, h is odd.
h is odd: h/2=(h-1)/2 (round down) so we get that the same levels on the two sub trees are complete by induction, add back the root and we get that all level h+1/2 are complete.
h is even: if the trees are of different height then h/2= (h-1)/2 + 1 (round down). so since the taller tree is a complete binary tree until h/2 it stands that he is complete binary tree until level h-1/2. add back the root and we get h+1/2 is complete binary tree.
I'd like to know how close this is to a correct answer.

The proof you have presented is indeed correct once you add the missing parentheses and make a few other small improvements. The changes are marked in bold:
base: h = 0 -> then level 0/2 is a complete binary tree since **there's** just a root. h = 1 -> same argument since 1/2 round down is zero.
step: assume correctness for all AVL trees with **height** h and show for h+1. Assume we have an AVL tree of height h+1. Remove the root and receive two AVL trees of height h, or one of height h and one of height h-1. Divide into two cases: h is even, h is odd.
h is odd: h/2 = (h-1)/2 (round down), so we get that the same levels on the two subtrees are complete by induction; add back the root and we get that all levels up to **(h+1)/2** are complete.
h is even: if the trees are of different height then h/2 = (h-1)/2 + 1 (round down). So since the taller tree is a complete binary tree until h/2, it stands that **it** is a complete binary tree until level **(h-1)/2, which is h/2 - 1 as h is even**. Add back the root and we get that the tree is complete until level **h/2**.
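Not part of the proof, but the claim is easy to exercise on the worst case: the minimal-size AVL tree of height h (the Fibonacci tree), whose root has subtrees of heights h-1 and h-2. A small Python sketch (helper names are my own) that checks level i has 2^i nodes for every i up to floor(h/2):

```python
def min_avl(h):
    # minimal-size AVL tree of height h, nodes as (left, right) tuples
    if h < 0:
        return None
    if h == 0:
        return (None, None)
    return (min_avl(h - 1), min_avl(h - 2))

def level_counts(tree):
    # number of nodes on each level, top-down
    counts, level = [], [tree]
    while level:
        counts.append(len(level))
        level = [child for node in level for child in node
                 if child is not None]
    return counts

for h in range(12):
    counts = level_counts(min_avl(h))
    for i in range(h // 2 + 1):
        assert counts[i] == 2 ** i  # levels 0..floor(h/2) are full
```

Since the minimal AVL tree is the sparsest tree of its height, passing this check for it is at least consistent with the claim holding for all AVL trees of that height.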


Closest point from List for every point of other List

I have a population of so called "Dots" that search for food. Every Dot has a sight_ value, which indicates the range in which it can see food.
The position of each Dot is saved as a pair<uint16_t,uint16_t>. The positions of all food sources are in a vector<pair<uint16_t,uint16_t>>.
Now I want to calculate, for every Dot, the closest food source that this Dot can see. And I don't want to calculate the distance for every combination.
My idea was to create a copy of the food-vector, sort one copy by x and the other by y. Then find the interval [x-sight, x+sight] respectively [y-sight, y+sight] in the vectors and then create the intersection of both.
I've read over set_intersection, but it requires both ranges to be sorted with the same rule.
Any ideas how I could do this? It could also be that my idea is just the wrong approach.
Thanks
IceFreez3r
Edit:
I did some runtime approximations:
Sort Food: n log n
Find Interval for one Coordinate and one Dot: 2 log n (lower and upper bound)
If we assume an equal distribution of food sources, we can calculate the bound that is expected to be closer to the middle first and then search for the second bound in the remaining interval. This would reduce the runtime to: log n + log(n/2) (Just realized this is probably not *that* powerful: log(n/2) =~ log(n) - 1)
Build intersection: #x * #y =~ (n * sight/testgroundsize)^2
Compute exact Distance for every Food in Intersection: n * (sight/testgroundsize)^2
Sum: 2 n log n + 2 * #Dots * (log n + log(n/2) + (n * sight/testgroundsize)^2 + n * (sight/testgroundsize)^2)
Sum with just limiting one coordinate: n log n + #Dots * (log n + log(n/2) + n * sight/testgroundsize)
I did some tests and just calculated the above formulas on the run:
int dots = dots_.size();
// estimated cost of the two-coordinate intersection approach
int sum = 2 * n * log(n) + 2 * dots * (log(n) + log(n/2) + pow(n * (sum_sight / dots) / testground_size_, 2) + n * pow((sum_sight / dots) / testground_size_, 2));
// estimated cost when limiting just one coordinate
int sum2 = n * log(n) + dots * (log(n) + log(n/2) + n * (sum_sight / dots) / testground_size_);
cout << n * dots << endl << sum << endl << sum2 << endl;
It turned out the intersection idea is just bad, while the idea of limiting just one coordinate is at least better than brute force.
I didn't think about the grid idea yet @Daniel Jour
You're stepping into a whole field of interesting approaches to this problem. Terms to Google are binary space partitioning, quadtrees, ... and of course nearest neighbour search.
A relatively simple but effective approach when the dots are spread much farther apart than their "visible range":
Select a value "grid size".
Create a map from grid coordinates to a list/set of entities
For each food source: put them in the map at their grid coordinates
For each dot: put them in the map at their grid coordinates and also in the neighbour grid "cells". The size of the neighbourhood depends on the grid size and the dot's sight value
For each entry in the map which contains at least one dot: Either do this algorithm recursively with a smaller grid size or use the brute force approach: check each dot in that grid cell against each food source in that grid cell.
This is a linear algorithm, compared with the quadratic brute force approach.
Calculation of grid coordinates: grid_x = int(x / grid_size) ... same for other coordinate.
Neighbourhood: steps = ceil(sight_value / grid_size) ... the neighbourhood is a square with side length 2*steps + 1, centred at the dot's grid coordinates
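The recipe above, sketched in Python (the names and the dot/food representations are my own assumptions: dots as (x, y, sight) tuples, food as (x, y) tuples):

```python
import math
from collections import defaultdict

def closest_food(dots, foods, grid_size):
    """For each dot, return the closest food within its sight, or None."""
    cell = lambda x, y: (int(x // grid_size), int(y // grid_size))

    # bucket every food source by its grid cell
    grid = defaultdict(list)
    for fx, fy in foods:
        grid[cell(fx, fy)].append((fx, fy))

    result = {}
    for dx, dy, sight in dots:
        gx, gy = cell(dx, dy)
        steps = math.ceil(sight / grid_size)  # neighbourhood radius in cells
        best, best_d2 = None, sight * sight   # compare squared distances
        for cx in range(gx - steps, gx + steps + 1):
            for cy in range(gy - steps, gy + steps + 1):
                for fx, fy in grid.get((cx, cy), ()):
                    d2 = (fx - dx) ** 2 + (fy - dy) ** 2
                    if d2 <= best_d2:
                        best, best_d2 = (fx, fy), d2
        result[(dx, dy)] = best
    return result

result = closest_food([(0, 0, 5), (100, 100, 1)], [(3, 4), (10, 10)], 4)
print(result[(0, 0)])      # (3, 4): within sight 5
print(result[(100, 100)])  # None: nothing within sight 1
```

With a grid size on the order of the typical sight value, each dot inspects only a small constant number of cells, which is where the near-linear behaviour comes from.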
I believe your approach is incorrect, and this can be verified mathematically. What you can do instead is compute the magnitude of the vector joining the dot and the food source via the Pythagorean theorem, and check that this magnitude is less than the sight limit. Regarding efficiency, the first order of business is to determine whether the proposed approach is actually faster in practice, even if some individual calculations become cheaper; the ideal is a decrease in total time, not merely a refactoring that moves the cost around.
Also note that if a dot's position can be specified relative to any chosen frame of reference, your sorted-copies idea would seem to require n*2 data structures, where n is the number of dots, each holding values sorted relative to that dot, and it is unclear whether this approach would even work, let alone be optimal. You state the constraint that the solution must not compute the distance from each dot to each food source, but to achieve that you must implement other procedures to derive the correct results. Given all this, you may be better off simply calculating the distance in each case. It is also somewhat elegant.

Haar cascade for face detection xml file code explanation OpenCV

I am performing face detection using an OpenCV Haar cascade.
I would like an explanation of the XML code of the Haar cascade that I have included in my program. Can someone help me understand the values presented in the XML file, for instance: weakcount, maxcount, threshold, internal nodes, leaf values, etc.?
I have made use of the haarcascade_frontalface_alt2.xml file. I have already performed face detection. Currently I am working on counting the number of faces detected.
As I understand it, you already know about the haarcascade structure and OpenCV's implementation of it in general. If not, please first look into the OpenCV manual and read about cascades of boosted trees, for example Lienhart's paper.
Now about xml structure itself.
<maxWeakCount>3</maxWeakCount>
This parameter describes the number of simple classifiers (trees) at the stage.
<stageThreshold>3.5069230198860168e-01</stageThreshold>
It is the stage threshold, i.e. the threshold score for exiting the cascade at this stage. During each stage we accumulate a final score from the trees, and when that final score is less than the threshold, we exit the entire cascade and treat the result as a non-object.
<weakClassifiers>
Start of trees parameters in the stage.
<_>
<internalNodes>
0 1 0 4.3272329494357109e-03 -1 -2 1 1.3076160103082657e-02
</internalNodes>
<leafValues>
3.8381900638341904e-02 8.9652568101882935e-01 2.6293140649795532e-01
</leafValues>
</_>
This is a tree description. The internalNodes parameter contains the following:
0 1 or 1 0 defines which child index to go to from the current node. In the first case we go to the left if the feature value is below the threshold and to the right if it is above; in the second case the order is reversed.
feature index
threshold for choosing leaf
there is one more parameter list, -1 -2 1 ... As I see from the OpenCV sources, it is simply a second node with leaf indexes: according to the evaluation code (also from the OpenCV sources), non-positive values terminate the traversal and index the leaves, while positive values point at further internal nodes.
Consider cascade evaluation code:
do
{
    CascadeClassifierImpl::Data::DTreeNode& node = cascadeNodes[root + idx];
    double val = featureEvaluator(node.featureIdx);
    idx = val < node.threshold ? node.left : node.right;
}
while( idx > 0 );
leafValues contains the score of each leaf (here three: the left leaf of the first node and the two leaves of the second node); the score of the leaf the traversal ends in is the tree's contribution to the stage score.
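Put together, the evaluation of the example tree above can be mimicked in a few lines. This is a sketch under the layout described here (each internal node as (left, right, featureIdx, threshold), with a non-positive index -i referring to leaf i), not OpenCV's actual API:

```python
def eval_weak_tree(internal_nodes, leaf_values, feature_value):
    # internal_nodes[k] = (left, right, feature_idx, threshold)
    # an index > 0 points at another internal node; an index <= 0
    # means "stop": the leaf score is leaf_values[-index]
    idx = 0
    while True:
        left, right, feat, thresh = internal_nodes[idx]
        idx = left if feature_value(feat) < thresh else right
        if idx <= 0:
            return leaf_values[-idx]

# the example tree from the xml above
nodes = [(0, 1, 0, 4.3272329494357109e-03),
         (-1, -2, 1, 1.3076160103082657e-02)]
leaves = [3.8381900638341904e-02, 8.9652568101882935e-01,
          2.6293140649795532e-01]

# feature 0 below its threshold -> leaf 0
print(eval_weak_tree(nodes, leaves, lambda f: 0.0))  # first leaf value
# feature 0 above, feature 1 above -> leaf 2
print(eval_weak_tree(nodes, leaves, lambda f: 1.0))  # third leaf value
```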
<_>
  <rects>
    <_>6 3 1 9 -1.</_>
    <_>6 6 1 3 3.</_>
  </rects>
</_>
This is the feature description itself, according to the Haar paradigm: each rect is given as x y width height weight. The feature index from the previous section selects one of these rect groups.

Finding shortest circuit in a graph that visits X nodes at least once

Even though I'm still a beginner, I love solving graph-related problems (shortest path, searches, etc.). Recently I faced a problem like this:
Given a non-directed, weighted (no negative values) graph with N nodes and E edges (a maximum of 1 edge between two nodes, an edge can only be placed between two different nodes) and a list of X nodes that you must visit, find the shortest path that starts from node 0, visits all X nodes and returns to node 0. There's always at least one path connecting any two nodes.
Limits are 1 <= N <= 40 000 / 1 <= X <= 15 / 1 <= E <= 50 000
Here's an example :
The red node ( 0 ) should be the start and finish of the path. You must visit all blue nodes (1,2,3,4) and return. The shortest path here would be :
0 -> 3 -> 4 -> 3 -> 2 -> 1 -> 0 with a total cost of 30
I thought about using Dijkstra to find the shortest path between all X (blue) nodes and then greedily picking the closest unvisited blue node, but it doesn't work (it comes up with 32 instead of 30 on paper). I also noticed later that just finding the shortest paths between all pairs of X nodes would take O(X*N^2) time, which is too much with so many nodes.
The only circuit concept I could find was the Eulerian circuit, which only allows visiting each node once (and I don't need that). Is this solvable with Dijkstra, or is there another algorithm that could solve it?
Here is a solution which is likely to be fast enough:
1) Run a shortest-path search from every blue node (this can be done in O(X * E log N)) to compute pairwise distances.
2) Build a new graph with the zero vertex and the blue vertices only (X + 1 vertices). Add edges using the pairwise distances computed in the first step.
3) The new graph is small enough to use the dynamic programming solution for TSP (it has O(X^2 * 2^X) time complexity).
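A sketch of those three steps in Python (the function names are my own; edges are given as (u, v, w) triples and `required` is the list of blue nodes):

```python
import heapq

def dijkstra(adj, src, n):
    # standard Dijkstra with a binary heap: O(E log N)
    dist = [float('inf')] * n
    dist[src] = 0
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def shortest_circuit(n, edges, required):
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # step 1: pairwise distances between node 0 and the blue nodes
    nodes = [0] + [v for v in required if v != 0]
    k = len(nodes)
    dists = [dijkstra(adj, s, n) for s in nodes]
    D = [[dists[i][nodes[j]] for j in range(k)] for i in range(k)]

    # steps 2-3: Held-Karp DP on the small complete graph
    # dp[mask][i] = min cost of a route starting at 0, visiting the set
    # mask of blue nodes, and currently ending at nodes[i]
    INF = float('inf')
    dp = [[INF] * k for _ in range(1 << k)]
    dp[1][0] = 0
    for mask in range(1 << k):
        for i in range(k):
            if dp[mask][i] == INF or not (mask >> i) & 1:
                continue
            for j in range(k):
                if (mask >> j) & 1:
                    continue
                nm = mask | (1 << j)
                if dp[mask][i] + D[i][j] < dp[nm][j]:
                    dp[nm][j] = dp[mask][i] + D[i][j]
    full = (1 << k) - 1
    return min(dp[full][i] + D[i][0] for i in range(k))

# toy example: chain 0-1-2-3 plus a heavy 0-3 edge
edges = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (0, 3, 10)]
print(shortest_circuit(4, edges, [1, 2, 3]))  # 12
```

Note that the metric closure built in step 2 automatically allows revisiting intermediate nodes (as in the 0 -> 3 -> 4 -> 3 -> ... example), since each edge of the new graph is a shortest path in the original one.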

My neural net learns sin x but not cos x

I have built my own neural net and I have a weird problem with it.
The net is a quite simple feed-forward 1-N-1 net with back-propagation learning. Sigmoid is used as the activation function.
My training set is generated with random values between [-PI, PI] and their [0,1]-scaled sine values (this is because the "sigmoid net" produces only values between [0,1], while the unscaled sine function produces values between [-1,1]).
With that training set, and the net set to 1-10-1 with a learning rate of 0.5, everything works great and the net learns the sine function as it should. BUT... if I do everything exactly the same way for the COSINE function, the net won't learn it. Not with any hidden layer size or learning rate.
Any ideas? Am I missing something?
EDIT: My problem seems to be similar to what can be seen with this applet. It won't seem to learn the sine function unless something "easier" is taught to the weights first (like 1400 cycles of a quadratic function). All the other settings in the applet can be left as they initially are. So in the case of sine or cosine it seems that the weights need some boosting in at least partially the right direction before a solution can be found. Why is this?
I'm struggling to see how this could work.
You have, as far as I can see, 1 input, N nodes in 1 layer, then 1 output. So there is no difference between any of the nodes in the hidden layer of the net. Suppose you have an input x and a set of weights w_i. Then the output node y will have the value:
y = Σ_i w_i x = x · Σ_i w_i
So this is always linear.
In order for the nodes to be able to learn differently, they must be wired differently and/or have access to different inputs. So you could supply inputs of the value, the square root of the value (giving some effect of scale), etc and wire different hidden layer nodes to different inputs, and I suspect you'll need at least one more hidden layer anyway.
The neural net is not magic. It produces a set of specific weights for a weighted sum. Since you can derive a set of weights to approximate a sine or cosine function, that must inform your idea of what inputs the neural net will need in order to have some chance of succeeding.
An explicit example: the Taylor series of the exponential function is:
exp(x) = 1 + x/1! + x^2/2! + x^3/3! + x^4/4! ...
So if you supplied 6 input nodes with 1, x, x^2, etc., then a neural net that just passed each input to one corresponding node, multiplied it by its weight, and fed all those outputs to the output node would be capable of the 6-term Taylor expansion of the exponential:
 in     hid   out
  1  --  h0 --\
  x  --  h1 ---\
 x^2 --  h2 ----\
 x^3 --  h3 ----- y
 x^4 --  h4 ----/
 x^5 --  h5 ---/
Not much of a neural net, but you get the point.
Further down the wikipedia page on Taylor series, there are expansions for sin and cos, which are given in terms of odd powers of x and even powers of x respectively (think about it, sin is odd, cos is even, and yes it is that straightforward), so if you supply all the powers of x I would guess that the sin and cos versions will look pretty similar with alternating zero weights. (sin: 0, 1, 0, -1/6..., cos: 1, 0, -1/2...)
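To make that concrete, the truncated Taylor expansions really are just fixed weighted sums over power inputs. A small sketch using the alternating-zero weight vectors mentioned above:

```python
import math

def weighted_powers(x, weights):
    # exactly what a linear "net" over inputs 1, x, x^2, ... computes:
    # a weighted sum of powers of x
    return sum(w * x**i for i, w in enumerate(weights))

# sine uses odd powers, cosine uses even powers (first 8 terms)
sin_w = [0, 1, 0, -1/6, 0, 1/120, 0, -1/5040]
cos_w = [1, 0, -1/2, 0, 1/24, 0, -1/720, 0]

for x in (-1.0, 0.3, 1.2):
    assert abs(weighted_powers(x, sin_w) - math.sin(x)) < 1e-3
    assert abs(weighted_powers(x, cos_w) - math.cos(x)) < 1e-3
```

So with power inputs, both functions are within easy reach of a plain weighted sum on [-PI/2, PI/2] or so; the interesting question is which inputs your 1-N-1 net actually has to work with.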
I think you can always compute sine and then compute cosine externally. Your concern here is why the neural net is not learning the cosine function when it can learn the sine function. Assuming this artifact is not caused by your code, I would suggest the following:
It definitely looks like an error in the learning algorithm. It could be because of your starting point. Try starting with weights that give the correct result for the first input and then march forward.
Check whether there is heavy bias in your learning (more positive than negative updates).
Since cosine can be computed as the sine of (90 degrees minus the angle), you could find the weights for sine and then recompute the weights for cosine in one step.

count of distinct acyclic paths from A[a,b] to A[c,d]?

I'm writing a Sokoban solver for fun and practice. It uses a simple algorithm (something like BFS with a bit of difference).
Now I want to estimate its running time (big-O and big-Omega), but to do that I need to know how to calculate the count of acyclic paths from one vertex to another in a network.
Actually, I want an expression that calculates the count of valid paths between two vertices of an m*n matrix of vertices.
a valid path:
visits each vertex 0 or 1 times.
has no circuits.
for example this is a valid path:
alt text http://megapic.ir/images/f1hgyp5yxcu8887kfvkr.png
but this is not:
alt text http://megapic.ir/images/wnnif13ir5gaqwvnwk9d.png
What is needed is a method to find the count of all acyclic paths between the two vertices a and b.
Comments on solving methods and tricks are welcome.
Not a solution, but maybe you can think this idea a bit further. The problem is that you'll also need to calculate the longest possible path to get all paths. The longest-path problem is NP-complete for general graphs, so this will take a very long time even for relatively small graphs (8x8 and greater).
Imagine the start-vertex is in the top, left corner and the end vertex is in the lower, right corner of the matrix.
For a 1x2 matrix there is only 1 possible path
2x2 matrix => 2 * 1 paths => 2
3x2 matrix => 2 * 2 paths => 4
3x3 matrix => 2 * 4 + 2 * 2 paths => 12
3x4 matrix => 2 * 12 + 12 + 2 paths => 38
Every time, I combined the results from the previous calculations to get the current number of paths. There could be a closed formula for such a planar graph, maybe there is even a lot of theory for that, but I am too stupid for that ...
You can use the following Java (sorry, I am not a C++ expert :-/) snippet to calculate possible paths for larger matrices:
public class Main {
    int xSize;
    int ySize;
    boolean[][] visited;

    public static void main(String[] args) {
        new Main(3, 2).start();
    }

    public Main(int maxX, int maxY) {
        xSize = maxX;
        ySize = maxY;
        visited = new boolean[xSize][ySize];
    }

    public void start() {
        // the path starts in the top left corner
        int paths = nextCell(0, 0);
        System.out.println(paths);
    }

    public int nextCell(int x, int y) {
        // the path should end in the lower right corner
        if (x == xSize - 1 && y == ySize - 1)
            return 1;
        if (x < 0 || y < 0 || x >= xSize || y >= ySize || visited[x][y]) {
            return 0;
        }
        // mark this cell, try all four directions, then unmark (backtracking)
        visited[x][y] = true;
        int c = 0;
        c += nextCell(x + 1, y);
        c += nextCell(x - 1, y);
        c += nextCell(x, y + 1);
        c += nextCell(x, y - 1);
        visited[x][y] = false;
        return c;
    }
}
=>
4x4 => 184
5x5 => 8512
6x6 => 1262816
7x7 (even this simple case takes a lot of time!) => 575780564
This means you could (only theoretically) compute all possible paths from any position of a MxM matrix to the lower, right corner and then use this matrix to quickly look up the number of paths. Dynamic programming (using previous calculated results) could speed up things a bit.
The general problem of counting the number of simple paths in a graph is #P-complete. Some #P-complete problems have fully polynomial randomized approximation schemes, and some don't, but you claim not to be interested in an approximation. Perhaps there's a way to take advantage of the grid structure, as there is for computing the Tutte polynomial, but I don't have any ideas for how to do this.
There is a similar but less general problem on project Euler: http://projecteuler.net/index.php?section=problems&id=237
I think some of the solutions described in the forum there can be extended to solve your general case. It's a pretty difficult problem though, especially for your general case.
To get access to their forums, you first need to solve the problem. I won't post the answer here, nor link to a certain site that lists the answer, a site that you can easily find on google by searching for something really obvious.
This is an open question in mathematics, with direct applications to chemistry and physics in modelling polymer bonds. Some of the earliest work on it was done during the Manhattan Project (the WWII nuclear bomb effort).
It is better known as the Self Avoiding Walk problem.
I spent a summer at my university's mathematics department researching a Monte Carlo algorithm called the pivot algorithm to approximate the parameters of the asymptotic fit for the number of self-avoiding walks of a given length n.
Please refer to Gordon Slade's excellent book titled "The Self Avoiding Walk" for extensive coverage of the types of techniques used to approach this problem thus far.
This is a very complex problem and I wonder what your motivation may be for considering it. Perhaps you can find a simpler model for what you want, because Self Avoiding Walks are not simple at all.
Would a matrix showing the edges work? Consider building a matrix showing where the edges are, i.e. M[a,b] = 1 <=> a->b is an edge in the graph, 0 otherwise. Now raise this matrix to various powers to count how many ways there are to get between vertices using n steps, and then sum them to get the result. This is just one idea for framing the problem; there may be other ways.
I wonder if this belongs on MathOverflow, as another idea
True, once you have a zero matrix you can stop exponentiating; in your case there aren't many places to go after 3, and the paths from 1 to 3 would be the direct one and the one that goes through 2, so only a few matrices need to be added together before the all-zero one is found.
I'd think there should be a way to compute a bound, say n^(n+1) where n is the number of vertices in the graph, that would indicate a stopping point, since by that point every vertex will have been visited. I'm not sure how to get the cyclic paths out of the problem, though; or can one assume that the graph is free of cycles?
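As a concrete check of the matrix-power idea: entry [i][j] of A^n counts the walks of length n from i to j, and walks may revisit vertices, so this is not the same as counting acyclic paths. A small sketch on the 3-vertex path graph 0 - 1 - 2:

```python
def matmul(a, b):
    # plain list-of-lists matrix product
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

# adjacency matrix of the path graph 0 - 1 - 2
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

A2 = matmul(A, A)
A3 = matmul(A2, A)

print(A2[0][2])  # 1: the single length-2 walk 0-1-2
print(A3[0][1])  # 2: walks 0-1-0-1 and 0-1-2-1 both revisit a vertex,
                 # while there is only one simple path from 0 to 1
```

So summing powers of the adjacency matrix overcounts relative to simple paths; that is exactly the "cyclic paths" issue raised above.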