Vector on upper half of hemisphere - C++

I have a normal vector N, which defines the upper half of a hemisphere, and a function that creates random points P on the hemisphere.
Now I want to know whether a randomly chosen point is on the upper half. Is it safe to assume that if the length of N+P is greater than or equal to 1, P is on the upper half, or is there a better way to calculate this in glm?

@Raxvan gave a perfectly valid answer on how to do it properly: use the dot product and check that it is non-negative.
Answering your original idea that you also re-stated in the comments:
if the length of N+P is greater than or equal to 1, P is on the upper half
this test is not correct. Yes, it returns "true" for all the correct points, but it does not filter out all the incorrect points. For example, consider N = (0,0,1) (i.e. the vector along the Z-axis) and P = (0.99, 0, -0.14) (a vector just a bit below the XY-plane, far out along the X-axis). Obviously P is not in the "upper hemisphere", but N + P is (0.99, 0, 0.86) and its length is clearly more than 1.
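For reference, a minimal sketch of the dot-product test with glm (assuming N is the hemisphere's pole direction; whether you count the boundary case dot(N, P) == 0 as "upper" is up to you):

#include <glm/glm.hpp>

// Returns true if P lies on the hemisphere whose pole is N,
// i.e. the angle between N and P is at most 90 degrees.
bool onUpperHalf(const glm::vec3& N, const glm::vec3& P)
{
    return glm::dot(N, P) >= 0.0f;
}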

Ternary Search to find the point in an array where the difference is minimum

Let A be an array of n positive integers.
How can I find some index k of A such that:
left = A[0] + A[1] + ... + A[k]
right = A[k+1] + A[k+2] + ... + A[n]
have the minimum absolute difference (that is, abs(left - right) is minimum) ?
As the absolute value of this difference is parabolic (it decreases until the minimum difference and then increases, like a U), I heard that Ternary Search is used to find values in functions like this, but I don't know how to implement it; I've searched the internet and didn't find any uses of Ternary Search over parabolic functions.
EDIT: suppose I can get the sum of any interval in O(1), and I need something faster than O(n); otherwise I wouldn't need Ternary Search.
Let left(k) represent the sum of the values in the array, from A[0] through A[k]. It is trivial to prove that:
left(k+1)=left(k)+A[k+1]
That is, if you have already computed left for a given k, then left for k+1 is computed by adding the next element to left.
In other words:
If you iterate over the array, from element #0 to element #n-1 (where n is the size of the array), you can compute the running total for left simply by adding the next element in the array to left.
This might seem to be obvious and self-evident, but it helps to state this formally in order for the next step in the process to become equally obvious.
In the same fashion, given right(k) representing the sum of the values in the array starting with element #k, until the last element in the array, you can also prove the following:
right(k+1)=right(k)-A[k]
So, you can find the k with the minimum difference between left(k) and right(k+1) (I'm using slightly different notation than your question, because it is more convenient here) as follows: start with the sum total of all values in the array as right(0) and A[0] as left(0), compute right(1), then iterate from the beginning of the array to the end, updating both left and right on the fly at each step and computing the difference between the left and right values. Finding where the difference is the minimum becomes trivial.
I can't think of any other way to do this, in less than O(n):
1) Computing the sum total of all values in the array, for the initial value of right(0) is O(n).
2) The iteration over the right is, of course, O(n).
I don't believe that a logarithmic binary search will work here, since the values abs(left(k)-right(k)) themselves are not going to be in sorted order.
Incidentally, with this approach, you can also find the minimum difference when the array contains negative values too. The only difference is that since the difference is no longer parabolic, you simply have to iterate over the entire array, and just keep track of where abs(left-right) is the smallest.
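For concreteness, a minimal C++ sketch of this single-pass scan (the function name and the use of long long are my own choices, assuming the sums fit in 64 bits and the array has at least two elements):

#include <cstdlib>
#include <vector>

// Returns the index k minimising |left(k) - right(k+1)|,
// where left(k) = A[0]+...+A[k] and right(k+1) = A[k+1]+...+A[n-1].
int bestSplit(const std::vector<long long>& A)
{
    long long right = 0;
    for (std::size_t i = 0; i < A.size(); ++i) right += A[i];   // right(0)

    long long left = 0;
    long long bestDiff = -1;
    int bestK = 0;
    for (int k = 0; k + 1 < (int)A.size(); ++k) {
        left += A[k];        // left(k) = left(k-1) + A[k]
        right -= A[k];       // right(k+1) = right(k) - A[k]
        long long diff = std::llabs(left - right);
        if (bestDiff < 0 || diff < bestDiff) {
            bestDiff = diff;
            bestK = k;
        }
    }
    return bestK;
}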
Trivial approach:
Compute all the sums A[0] + A[1] + ... + A[k] and A[k+1] + A[k+2] + ... + A[n] for any k<=n.
Search for the k minimising abs(left - right) for any k<=n
O(n) in space and time.
Edit:
Computing all the sums can be done in O(n) with an incremental approach.

Largest rectangles in histogram

I'm working on the below algorithm puzzle and here is the detailed problem statement.
Find the largest rectangle in the histogram; for example, given histogram = [2,1,5,6,2,3], the algorithm should return 10.
I am working on the version of the code below. My question is: I think i-nextTop-1 could be replaced by i-top, but in some test cases (e.g. [2,1,2]) they give different results (i-nextTop-1 always produces the correct result). I think logically they should be the same, and I am wondering in what situations i-nextTop-1 is not equal to i-top.
#include <algorithm>
#include <stack>
#include <vector>
using namespace std;

class Solution {
public:
    int largestRectangleArea(vector<int>& height) {
        height.push_back(0);               // sentinel so every bar eventually gets popped
        int result = 0;
        stack<int> indexStack;             // indices of bars, heights increasing bottom to top
        for (int i = 0; i < height.size(); i++) {
            while (!indexStack.empty() && height[i] < height[indexStack.top()]) {
                int top = indexStack.top();
                indexStack.pop();
                int nextTop = indexStack.size() == 0 ? -1 : indexStack.top();
                result = max((i - nextTop - 1) * height[top], result);
            }
            indexStack.push(i);
        }
        return result;
    }
};
The situations where i-nextTop-1 != i-top occur are when the following is true:
nextTop != top-1
This can be seen by simply rearranging terms in the inequality i-nextTop-1 != i-top.
The key to understanding when this occurs lies in the following line within your code, in which you define the value of nextTop:
int nextTop = indexStack.size() == 0 ? -1 : indexStack.top();
Here, you are saying that if indexStack is empty (following the pop() on the previous line of code), then set nextTop to -1; otherwise set nextTop to the current indexStack.top().
So the only times when nextTop == top-1 are when
indexStack is empty and top == 0, or
indexStack.top() == top - 1.
In those cases, the two methods will always agree. In all other situations, they will not agree, and will produce different results.
You can see what is happening by printing the values of i, nextTop, (i - top), (i - nextTop - 1), and result for each iteration at the bottom of the while loop. With i-top substituted for i-nextTop-1, the vector {1, 2, 3, 4, 5} happens to still give the right answer, but {5, 4, 3, 2, 1} does not.
Theory of the Algorithm
The outer for loop iterates through the histogram elements one at a time. Elements are pushed onto the stack from left to right, and upon entry to the while loop the top of stack contains the element just prior to (or just to the left of) the current element. (This is because the current element is pushed onto the stack at the bottom of the for loop, right before looping back to the top.)
An element is popped off the stack within the while loop, when the algorithm has determined that the best possible solution that includes that element has already been considered.
The inner while loop will keep iterating as long as height[i] < height[indexStack.top()], that is, as long as the height of the current element is less than the height of the element on the top of the stack.
At the start of each iteration of the while loop, the elements on the stack represent all of the contiguous elements to the immediate left of the current element, that are larger than the current element.
This allows the algorithm to calculate the area of the largest rectangle to the left of and including the current element. This calculation is done in the following two lines of code:
int nextTop = indexStack.size() == 0 ? -1 : indexStack.top();
result = max((i - nextTop - 1) * height[top], result);
Variable i is the index of the current histogram element, and represents the rightmost edge of the rectangle being currently calculated.
Variable nextTop represents the index just to the left of the leftmost edge of the rectangle.
The expression (i - nextTop - 1) represents the horizontal width of the rectangle. height[top] is the vertical height of the rectangle, so the result is the product of these two terms.
Each new result is the larger of the new calculation and the previous value for result.
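For reference, a small usage sketch of the question's original version (with i - nextTop - 1), assuming the Solution class above is in the same file; the expected output of 10 comes from the problem statement:

#include <iostream>
#include <vector>

int main()
{
    Solution s;
    std::vector<int> histogram = {2, 1, 5, 6, 2, 3};
    std::cout << s.largestRectangleArea(histogram) << std::endl;   // prints 10
    return 0;
}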

Pick a matrix cell according to its probability

I have a 2D matrix of positive real values, stored as follow:
vector<vector<double>> matrix;
Each cell can have a value equal to or greater than 0, and this value represents the likelihood of the cell being chosen. In particular, for example, a cell with a value of 3 is three times as likely to be chosen as a cell with a value of 1.
I need to select N cells of the matrix (0 <= N <= total number of cells) randomly, but according to their probability to be selected.
How can I do that?
The algorithm should be as fast as possible.
I describe two methods, A and B.
A works in time approximately N * number of cells, and uses space O(log number of cells). It is good when N is small.
B works in time approximately (number of cells + N) * O(log number of cells), and uses space O(number of cells). So, it is good when N is large (or even, 'medium') but uses a lot more memory, in practice it might be slower in some regimes for that reason.
Method A:
The first thing you need to do is normalize the entries. (It's not clear to me if you assume they are normalized or not.) That means, sum all the entries and divide by the sum. (This part is potentially slow, so it's better if you assume or require that it already happened.)
Then you sample like this:
Choose a random [i,j] entry of the matrix (by choosing i,j each uniformly randomly from the range of integers 0 to n-1).
Choose a uniformly random real number p in the range [0, 1].
Check if matrix[i][j] > p. If so, return the pair [i][j]. If not, go back to step 1.
Why does this work? The probability that we end at step 3 with any particular output is equal to the probability that [i][j] was selected (which is the same for each entry) times the probability that the number p was small enough. This is proportional to the value matrix[i][j], so the sampling chooses each entry with the correct proportions. It's also possible that at step 3 we go back to the start; does that bias things? Basically, no. The reason is: suppose we arbitrarily choose a number k and consider the distribution of the algorithm conditioned on stopping after exactly k rounds. No matter what value of k we choose, that conditional distribution has to be exactly right by the above argument, since once we eliminate the case that p is too small, the remaining possibilities all keep their correct proportions. Since the distribution is correct for each value of k that we might condition on, and the overall distribution (not conditioned on k) is an average of the distributions for each value of k, the overall distribution is correct as well.
If you want to analyze the number of rounds that are typically needed in a rigorous way, you can do it by analyzing the probability that we actually stop at step 3 in any particular round. Since the rounds are independent, this probability is the same for every round, which means the number of rounds is geometrically distributed. That distribution is tightly concentrated around its mean, and we can determine the mean from that stopping probability.
The probability that we stop at step 3 can be determined by considering the conditional probability that we stop at step 3, given that we chose any particular entry [i][j]. By the formulas for conditional expectation, you get that
Pr[ stop at step 3 ] = sum_{i,j} ( 1/(n^2) * Matrix[i,j] )
Since we assumed the matrix is normalized, this sum reduces to just 1/n^2. So, the expected number of rounds is about n^2 (that is, n^2 up to a constant factor) no matter what the entries in the matrix are. You can't hope to do a lot better than that I think -- that's about the same amount of time it takes to just read all the entries of the matrix, and it's hard to sample from a distribution that you cannot even read all of.
Note: What I described is a way to correctly sample a single element -- to get N elements from one matrix, you can just repeat it N times.
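For concreteness, here is a minimal C++ sketch of method A, assuming the entries have already been normalized to sum to 1 and the matrix is square and non-empty (the function name and the use of std::mt19937 are my own choices):

#include <random>
#include <utility>
#include <vector>

// One sample via rejection: pick a uniform cell, accept it with probability matrix[i][j].
std::pair<size_t, size_t> sampleOne(const std::vector<std::vector<double>>& matrix,
                                    std::mt19937& rng)
{
    const size_t n = matrix.size();                        // assumes an n x n matrix
    std::uniform_int_distribution<size_t> cell(0, n - 1);
    std::uniform_real_distribution<double> real(0.0, 1.0);
    while (true) {
        size_t i = cell(rng);                              // step 1: uniform random entry
        size_t j = cell(rng);
        double p = real(rng);                              // step 2: uniform real in [0, 1]
        if (matrix[i][j] > p) return std::make_pair(i, j); // step 3: accept, else retry
    }
}

To draw N cells, call this N times, as the note above says.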
Method B:
Basically you just want to compute a histogram and sample inversely from it, so that you know you get exactly the right distribution. Computing the histogram is expensive, but once you have it, getting samples is cheap and easy.
In C++ it might look like this:
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

typedef unsigned int uint;
typedef std::pair<uint, uint> upair;
typedef std::map<double, upair> histogram_type;

// Matrix is the input matrix and N the number of samples, as in the question.
std::vector<upair> sample(const std::vector<std::vector<double>>& Matrix, uint N)
{
    // Make histogram: map each cumulative sum to the cell that produced it
    histogram_type histogram;
    double cumulative = 0.0;
    for (uint i = 0; i < Matrix.size(); ++i) {
        for (uint j = 0; j < Matrix[i].size(); ++j) {
            cumulative += Matrix[i][j];
            histogram[cumulative] = std::make_pair(i, j);
        }
    }

    std::vector<upair> result;
    for (uint k = 0; k < N; ++k) {
        // Do a sample (this should essentially never repeat; for best results use
        // std::mt19937 or boost::mt19937 to draw a real in [0,1) instead of rand()).
        while (true) {
            double p = cumulative * (rand() / (double)RAND_MAX);
            histogram_type::iterator it = histogram.lower_bound(p);
            if (it != histogram.end()) {
                result.push_back(it->second);
                break;
            }
        }
    }
    return result;
}
Here the time to make the histogram is something like number of cells * O(log number of cells) since inserting into the map takes time O(log n). You need an ordered data structure in order to get cheap lookup N * O(log number of cells) later when you do repeated sampling. Possibly you could choose a more specialized data structure to go faster, but I think there's only limited room for improvement.
Edit: As @Bob__ points out in the comments, in method (B) as written there is potentially going to be some error due to floating-point round-off if the matrices are quite large, even using type double, at this line:
cumulative += Matrix[i][j];
The problem is that if cumulative is much larger than Matrix[i][j], beyond what the floating-point precision can handle, then each time this statement is executed you may incur a small error, and these errors accumulate into significant inaccuracy.
As he suggests, if that happens, the most straightforward fix is to sort the values Matrix[i][j] first and accumulate them from smallest to largest. You could even do this in the general implementation to be safe -- the sort doesn't cost more time asymptotically than building the histogram already does.
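A hedged sketch of that fix (sorting the cells by value before accumulating; the function name and the triple layout are my own choices):

#include <algorithm>
#include <map>
#include <utility>
#include <vector>

typedef unsigned int uint;
typedef std::pair<uint, uint> upair;

// Build the cumulative histogram after sorting cells by value, so that
// small values are added first and round-off error stays small.
std::map<double, upair> makeHistogramSorted(const std::vector<std::vector<double>>& Matrix)
{
    std::vector<std::pair<double, upair>> cells;
    for (uint i = 0; i < Matrix.size(); ++i)
        for (uint j = 0; j < Matrix[i].size(); ++j)
            cells.push_back(std::make_pair(Matrix[i][j], std::make_pair(i, j)));

    std::sort(cells.begin(), cells.end());   // sorts by the value component first

    std::map<double, upair> histogram;
    double cumulative = 0.0;
    for (std::size_t k = 0; k < cells.size(); ++k) {
        cumulative += cells[k].first;
        histogram[cumulative] = cells[k].second;
    }
    return histogram;
}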

Efficient data structure for sparse data lookup

Situation:
Given some points with coordinate (x, y)
Range 0 < x < 100,000,000 and 0 < y < 100,000,000
I have to find the smallest square which contains at least N points on its edges or inside it.
I used a vector to store the coordinates and searched all squares with side lengths from minLength up to maxLength (applying brute force over the relevant space):
struct Point
{
    int x;
    int y;
};

vector<Point> P;
int minLength = sqrt(N) - 1;
int maxLength = 0;
// bigx   = largest x coordinate of any point
// bigy   = largest y coordinate of any point
// smallx = smallest x coordinate of any point
// smally = smallest y coordinate of any point
maxLength = (bigx - smallx) < (bigy - smally) ? (bigx - smallx) : (bigy - smally);
For each candidate square, I traversed the complete vector to check whether at least N points are on its edges or inside it.
This was quite time-inefficient.
Q1. What data structure should I use to improve time efficiency without changing the algorithm I used?
Q2. Is there a more efficient algorithm for this problem?
The smallest such square has points on 2 opposite edges - if not, you could shrink the square by 1 and still contain the same number of points. That means the possible coordinates of the edges are limited to those of the input points. The input points are probably not on the corners, though. (For a minimum rectangle, there would be points on all 4 edges, as you can shrink one dimension without altering the other.)
The next thing to realize is that each point divides the plane into 4 quadrants, and each quadrant contains a number of points. (These can add up to more than the total number of points, as the quadrants have a one-pixel overlap.) Let's say that NW(p) is the number of points to the northwest of point p, i.e. those that have x>=px and y>=py. Then the number of points in a square is NW(bottomleft) + NW(topright) - NW(bottomright) - NW(topleft).
It's fairly easy to calculate NW(p) for all input points. Sort them by x, and for equal x by y. The most northwestern point has NW(p)==0. The next point can have NW(p)==1 if it's to the southeast of the first point, else it has NW(p)==0. It's also useful to keep track of SW(p) in this stage, as you're working through the points from west to east and they're therefore not sorted north to south. Having calculated NW(p), you can determine the number of points in a square S in O(1).
Recall that the square size is restricted by the need to have points on opposite edges. Assume the points are on the left (western) and right edge - you still have the points sorted by x order. Start by assuming the left edge is at your leftmost x coordinate, and see what the right edge must be to contain N points. Now shift the left edge to the next x coordinate and find a new right edge (and thus a new square). Do this until the right edge of the square is the rightmost point.
It's also possible that the square is constrained in the y direction. Just sort the points in the y direction and repeat, then choose the smaller square of the two outcomes.
Since you're running linearly through the points in x and y direction, that part is just O(N) and the dominant factor is the O(N log N) sort.
Look at http://en.wikipedia.org/wiki/Space_partitioning for algorithms that use the Divide-and-Conquer technique to solve this. This is definitely solvable in Polynomial time.
Another variant algorithm could run along the following lines.
Generate a Voronoi diagram on the points to get neighbour information. [ O(n log(n)) ]
Now use Dynamic Programming; the DP will be similar to the problem of finding the maximum subarray in a 2D array. Here, instead of the sum of numbers, you keep a count of the points up to each cell.
2.a Essentially a recursion similar to this will hold. [ O(n) ]
Number of points in the square from (0,0) to (x,y)
    = Number of points in the square from (0,0) to (x-1,y)
    + Number of points in the square from (0,0) to (x,y-1)
    - Number of points in the square from (0,0) to (x-1,y-1)
    + (1 if there is a point at (x,y), else 0)
Your recurrence will have to account for all the points in the neighbourhood and to the left and above, instead of just the cells directly above and to the left as written above.
Once the DP table is ready, you can query the number of points in a square in O(1) (see the sketch below).
Another O(n^2) loop over all possible combinations then finds the smallest square.
You can even greedily start from the smallest squares first; that way you can end your search as soon as you find a suitable square.
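A minimal sketch of the O(1) square query via a 2D prefix-count table. This assumes the coordinates have already been mapped down to a small W x H grid (e.g. by coordinate compression), since a dense table over the full 100,000,000 range would not fit in memory; the function names are my own:

#include <vector>

// count[y][x] = number of points in the rectangle (0,0)..(x,y), inclusive.
std::vector<std::vector<long long>> buildCounts(const std::vector<std::vector<int>>& grid)
{
    int H = grid.size(), W = grid[0].size();
    std::vector<std::vector<long long>> count(H, std::vector<long long>(W, 0));
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            count[y][x] = grid[y][x];                      // points at this cell
            if (x > 0) count[y][x] += count[y][x - 1];
            if (y > 0) count[y][x] += count[y - 1][x];
            if (x > 0 && y > 0) count[y][x] -= count[y - 1][x - 1];
        }
    return count;
}

// Points inside the square with corners (x1,y1)..(x2,y2), inclusive, in O(1).
long long pointsInSquare(const std::vector<std::vector<long long>>& count,
                         int x1, int y1, int x2, int y2)
{
    long long r = count[y2][x2];
    if (x1 > 0) r -= count[y2][x1 - 1];
    if (y1 > 0) r -= count[y1 - 1][x2];
    if (x1 > 0 && y1 > 0) r += count[y1 - 1][x1 - 1];
    return r;
}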
An R-tree allows spatial searching, but doesn't have an STL implementation, although SQLite's R*Tree module could be bound to. It can answer queries such as "get all points within range" and "k nearest neighbours".
Finding a region which has the most dense data, is a problem similar to clustering.
Iterate over the points, finding the N nearest entries to each point. Then generate the smallest enclosing circle; its centre would be at roughly the midpoint ((max(x)+min(x))/2, (max(y)+min(y))/2) of those neighbours. A square containing all the neighbours can then be formed, with a side somewhere between sqrt(2)*r and 2r for a circle of radius r.
Time taken: O(X) to build the structure,
O(X N log(X)) to search for the smallest cluster.
Note: There are a bunch of answers for your second question (which will probably reap bigger benefits), but I'm only referring to your first one, i.e. what data to use without changing the algorithm.
There, I think that your choice of a vector is already pretty good, because in general vectors offer the best payload/overhead ratio and also the fastest iteration. In order to find specific bottlenecks, use a profiler; otherwise you are only guessing. With large vectors, there are a few things to avoid though:
Overallocation: this wastes space.
Underallocation: this causes copying when the vector is grown to the necessary size (a single reserve() call, sketched below, avoids this).
Copying.
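A minimal sketch of that reserve() point (the function and parameter names are placeholders of my own):

#include <cstddef>
#include <vector>

struct Point { int x; int y; };

// Reserve capacity once up front so the vector never reallocates while filling.
void loadPoints(std::vector<Point>& P, std::size_t expectedCount)
{
    P.reserve(expectedCount);
    // ... push_back the points here; no reallocation or copying occurs
    //     as long as the final count stays within expectedCount.
}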

Find dominant mode of an unsorted array

Note, this is a homework assignment.
I need to find the mode of an array (positive values) and, secondarily, return that value if its count is greater than sizeof(array)/2 - the dominant value. Some arrays will have neither.
That is simple enough, but there is a constraint: the array must NOT be sorted prior to the determination, and additionally the complexity must be on the order of O(nlogn).
Using this second constraint and the master theorem, we can determine that for a recurrence of the form T(n) = A*T(n/B) + n^D to come out to O(nlogn), we need A=B and log_B(A)=D. Thus, A=B=2 and D=1, i.e. a linear amount of work at each level. This is also convenient since the dominant value must be dominant in the 1st, 2nd, or both halves of an array.
Using T(n) = A*T(n/B) + n^D, we know that the search function will call itself twice at each level (A) and divide the problem set by 2 at each level (B). I'm stuck figuring out how to make my algorithm do the O(n) work at each level.
To make some code of this:
int search(a,b) {
    search(a, a+(b-a)/2);
    search(a+(b-a)/2+1, b);
}
The "glue" I'm missing here is how to combine these divided functions and I think that will implement the n^2 complexity. There is some trick here where the dominant must be the dominant in the 1st or 2nd half or both, not quite sure how that helps me right now with the complexity constraint.
I've written down some examples of small arrays and I've drawn out ways it would divide. I can't seem to go in the correct direction of finding one, single method that will always return the dominant value.
At level 0, the function needs to call itself to search the first half and second half of the array. That needs to recurse, and call itself. Then at each level, it needs to perform O(n) operations. So on an array [2,0,2,0,2] it would split that into a search on [2,0] and a search on [2,0,2] AND perform on the order of 5 operations. A search on [2,0] would call a search on [2] and a search on [0] AND perform on the order of 2 operations. I'm assuming these would need to be a scan of the (sub)array itself. I was planning to use C++ and something from the STL to iterate and count the values. I could create a large array and just update counts by their index.
If some number occurs more than half of the time, it can be found with O(n) time complexity and O(1) space complexity (a majority-vote scan) as follows:
int num = a[0], occ = 1;          // current candidate and its counter
for (int i = 1; i < n; i++) {
    if (a[i] == num) occ++;
    else {
        occ--;
        if (occ < 0) {            // counter exhausted: switch candidate
            num = a[i];
            occ = 1;
        }
    }
}
Since you are not sure whether such a number occurs, all you need to do is apply the above algorithm to get a candidate number first, then iterate over the whole array a second time to count the occurrences of that number and check whether the count is greater than half.
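A self-contained sketch of this two-pass idea (the function name and the use of a vector are my own choices; returning -1 when there is no dominant value is an assumption that works because the original arrays hold positive values):

#include <vector>

// Returns the dominant value (occurring more than n/2 times), or -1 if there is none.
int dominantValue(const std::vector<int>& a)
{
    if (a.empty()) return -1;

    // Pass 1: majority-vote scan to find the only possible candidate.
    int num = a[0], occ = 1;
    for (std::size_t i = 1; i < a.size(); i++) {
        if (a[i] == num) occ++;
        else {
            occ--;
            if (occ < 0) { num = a[i]; occ = 1; }
        }
    }

    // Pass 2: verify that the candidate really occurs more than half the time.
    std::size_t count = 0;
    for (std::size_t i = 0; i < a.size(); i++)
        if (a[i] == num) count++;

    return (count > a.size() / 2) ? num : -1;
}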
If you want to find just the dominant mode of an array, and do it recursively, here's the pseudo-code:
def DominantMode(array):
    # if there is only one element, that's the dominant mode
    if len(array) == 1: return array[0]
    # otherwise, find the dominant mode of the left and right halves
    left = DominantMode(array[0:len(array)//2])
    right = DominantMode(array[len(array)//2:len(array)])
    # if both sides have the same dominant mode, the whole array has that mode
    if left == right: return left
    # otherwise, we have to scan the whole array to determine which one wins
    leftCount = sum(element == left for element in array)
    rightCount = sum(element == right for element in array)
    if leftCount > len(array) // 2: return left
    if rightCount > len(array) // 2: return right
    # if neither wins, just return None
    return None
The above algorithm is O(nlogn) time but only O(logn) space.
If you want to find the mode of an array (not just the dominant mode), first compute the histogram. You can do this in O(n) time (visiting each element of the array exactly once) by storing the histogram in a hash table that maps the element value to its frequency.
Once the histogram has been computed, you can iterate over it (visiting each element at most once) to find the highest frequency. Once you find a frequency larger than half the size of the array, you can return immediately and ignore the rest of the histogram. Since the size of the histogram can be no larger than the size of the original array, this step is also O(n) time (and O(n) space).
Since both steps are O(n) time, the resulting algorithmic complexity is O(n) time.
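A minimal C++ sketch of this histogram approach (the function name and the pair return type are my own choices; it returns the mode together with a flag saying whether it is dominant, and assumes the array is non-empty):

#include <unordered_map>
#include <utility>
#include <vector>

// Returns {mode, isDominant}: the most frequent value and whether it occurs
// more than half the time.
std::pair<int, bool> modeOf(const std::vector<int>& a)
{
    // Step 1: build the histogram in O(n) using a hash table.
    std::unordered_map<int, std::size_t> freq;
    for (std::size_t i = 0; i < a.size(); i++) freq[a[i]]++;

    // Step 2: scan the histogram for the highest frequency.
    int mode = a[0];
    std::size_t best = 0;
    for (std::unordered_map<int, std::size_t>::const_iterator it = freq.begin();
         it != freq.end(); ++it) {
        if (it->second > best) {
            best = it->second;
            mode = it->first;
        }
        if (best > a.size() / 2) break;   // dominant: no need to look further
    }
    return std::make_pair(mode, best > a.size() / 2);
}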