Find median of coordinates to build kd tree (2D) - C++


I have a nearest-neighbor problem in 2D, and I found out that kd-trees are the best solution.
I couldn't find a ready-made implementation for the structure I am working with, so I decided to create my own.
The structure I work with is:
struct Point{
int id;
double x;
double y;
};
I have nearly 100000 points. My question is: how do I find the median point each time I want to partition my points, and how do I define the left and right partitions at the same time?
Another question: is there a more efficient way to proceed (one that takes as little time as possible)?

How to compute median? Is there a more efficient way to proceed?
I will give one answer for both questions: You can use std::nth_element, like this:
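#include <algorithm>
#include <vector>
std::vector<float> v; // data vector (global)
// Orders indices by the values they refer to in v.
bool myfunction(int i, int j) { return v[i] < v[j]; }
// Returns the index of the median of v; nth_element also partitions v_i around it.
int find_median(std::vector<int>& v_i)
{
const auto n = v_i.size() / 2;
std::nth_element(v_i.begin(), v_i.begin() + n, v_i.end(), myfunction);
return v_i[n];
}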
You can also check my question for more.
how to define the left and right partitions at the same time?
Every value less than the median belongs in the left partition, while every value greater than the median belongs in the right partition.
It's up to you to decide where the equal value to the median would go. Just pick left or right and remember your decision.
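To connect this to the kd-tree itself, here is a minimal sketch (not the answerer's code) that applies std::nth_element directly to the question's Point structure, alternating between x and y by depth. The names KdNode and build are hypothetical, and C++14 is assumed for std::make_unique. Because nth_element partitions the range around the median, the left and right subranges come out of the same call.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

struct Point {
    int id;
    double x;
    double y;
};

// One node of a 2-D kd-tree: the median point plus two subtrees.
struct KdNode {
    Point point;
    int axis;  // 0 = split on x, 1 = split on y
    std::unique_ptr<KdNode> left, right;
};

// Builds a kd-tree over pts[first, last), alternating the split axis by depth.
// After std::nth_element, the element at mid is the median on the current axis,
// everything before it is not greater and everything after it is not less,
// so [first, mid) and [mid + 1, last) are the left and right partitions.
std::unique_ptr<KdNode> build(std::vector<Point>& pts, std::size_t first,
                              std::size_t last, int depth) {
    if (first >= last) return nullptr;
    const int axis = depth % 2;
    const std::size_t mid = first + (last - first) / 2;
    std::nth_element(pts.begin() + first, pts.begin() + mid, pts.begin() + last,
                     [axis](const Point& a, const Point& b) {
                         return axis == 0 ? a.x < b.x : a.y < b.y;
                     });
    auto node = std::make_unique<KdNode>();
    node->point = pts[mid];
    node->axis = axis;
    node->left = build(pts, first, mid, depth + 1);
    node->right = build(pts, mid + 1, last, depth + 1);
    return node;
}
```

Each level does linear work on average, so building over ~100000 points costs O(n log n) on average. Note that std::nth_element does not control which side values equal to the median land on; if your search relies on a fixed left-or-right convention, you would need an extra partition step.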

Related

How is binary search applicable here (since the values are not monotonic)?

I am solving a LeetCode problem Search in Rotated Sorted Array, in order to learn Binary Search better. The problem statement is:
There is an integer array nums sorted in ascending order (with distinct values). Prior to being passed to your function, nums is possibly rotated at an unknown pivot index. For example, [0,1,2,4,5,6,7] might be rotated at pivot index 3 and become [4,5,6,7,0,1,2]. Given the array nums after the possible rotation and an integer target, return the index of target if it is in nums, or -1 if it is not in nums.
With some online help, I came up with the solution below, which I mostly understand:
class Solution {
public:
int search(vector<int>& nums, int target) {
int l=0, r=nums.size()-1;
while(l<r) { // 1st loop; how is BS applicable here, since array is NOT sorted?
int m=l+(r-l)/2;
if(nums[m]>nums[r]) l=m+1;
else r=m;
}
// cout<<"Lowest at: "<<r<<"\n";
if(nums[r]==target) return r; //target==lowest number
int start, end;
if(target<=nums[nums.size()-1]) {
start=r;
end=nums.size()-1;
} else {
start=0;
end=r;
}
l=start, r=end;
while(l<r) {
int m=l+(r-l)/2;
if(nums[m]==target) return m;
if(nums[m]>target) r=m;
else l=m+1;
}
return nums[l]==target ? l : -1;
}
};
My question: Are we searching over a parabola in the first while loop, trying to find the lowest point of a parabola, unlike the linear array in traditional binary search? Are we finding the minimum of a convex function? I understand how the values of l, m and r change to reach the right answer, but I do not fully follow why, if nums[m]>nums[r], we are guaranteed that the lowest value is on the right.

You actually skipped something important by “getting help”.
Once, when I was struggling to integrate something tricky for Calculus Ⅰ, I went for help and the advisor said, “Oh, I know how to do this” and solved it. I learned nothing from him. It took me another week of going over it (and other problems) myself to understand it well enough that I could do it myself.
The purpose of these assignments is to solve the problem yourself. Even if your solution is faulty, you have learned more than simply reading and understanding the basics of one example problem someone else has solved.
In this particular case...
Since you already have a solution, let’s take a look at it: Notice that it contains two binary search loops. Why?
As you observed at the beginning, the offset shift makes the array discontinuous (not convex). However, the subarrays on either side of the discontinuity remain monotonic.
Take a moment to convince yourself that this is true.
Knowing this, what would be a good way to find and determine which of the two subarrays to search?
Hints:
A binary search as ( n ⟶ ∞ ) is O(log n)
O(log n) ≡ O(2 log n)
I should also point out that the prompt gives as an example an arithmetic progression with a common difference of 1, but the prompt itself imposes no such restriction. All it says is that you start with a strictly increasing sequence (no duplicate values). You could have as input [19 74 512 513 3 7 12].
Does the supplied solution handle this possibility?
Why or why not?
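For reference while working through the hints, here is a sketch that isolates the first loop of the supplied solution and spells out, in comments, the invariant it relies on. Note that it compares nums[m] with nums[r] rather than with its neighbors, so it is not looking for the bottom of a convex curve. findMinIndex is a hypothetical name; the loop body is the one from the question.

```cpp
#include <vector>

// Returns the index of the smallest element of a rotated, strictly
// increasing array. Invariant: the minimum always lies in nums[l..r].
//  - If nums[m] > nums[r], the values cannot increase all the way from m to r,
//    so the drop (and therefore the minimum) lies strictly right of m.
//  - Otherwise nums[m..r] is increasing, so nums[m] is the smallest value in
//    that stretch and the minimum is at m or to its left.
int findMinIndex(const std::vector<int>& nums) {
    int l = 0, r = static_cast<int>(nums.size()) - 1;
    while (l < r) {
        int m = l + (r - l) / 2;
        if (nums[m] > nums[r]) l = m + 1;
        else r = m;
    }
    return l;
}
```

Only comparisons between elements are used, so the argument does not depend on the step between consecutive values.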

(effectively) storing a polynomial dynamically

What I am trying to accomplish is to store a polynomial of unknown size using arrays.
What I have seen on the internet is an array where each cell contains a coefficient and the cell index is the degree, but that is not efficient: for a polynomial like 6x^14+x+5, the cells from 2 to 13 would all hold zeros. I've already looked at some solutions with vectors and linked lists, but is there any other way to tackle this problem effectively, without using std::vector or std::list?

Unless there is a compelling reason to act otherwise (this is a programming assignment where you are required to use C-style arrays), you should use a std::vector from the standard library. Libraries are there for a reason: to make your life easier. The overhead is probably insignificant in the context of your program.
You mention that storing a polynomial (such as 4*x^5 + x - 1) in an std::vector with the indices representing the power (such as [-1, 1, 0, 0, 0, 4]) is inefficient. This is true, but unless you are storing polynomials of degree greater than 1000, this waste is entirely insignificant. For "sparse" polynomials, of high degree but with few coefficients, you could consider using a vector of pairs, with the first value of each pair storing the power and the second value storing the coefficient.
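A minimal sketch of that vector-of-pairs idea, assuming double coefficients and that each power appears at most once; SparsePoly, Term and evaluate are illustrative names, not from any library:

```cpp
#include <cmath>
#include <utility>
#include <vector>

// A sparse polynomial stored as (power, coefficient) pairs; only non-zero
// terms are kept, so 6x^14 + x + 5 needs three pairs instead of 15 slots.
using Term = std::pair<int, double>;  // first = power, second = coefficient
using SparsePoly = std::vector<Term>;

// Evaluates the polynomial at x by summing coefficient * x^power per term.
double evaluate(const SparsePoly& p, double x) {
    double sum = 0.0;
    for (const Term& t : p) sum += t.second * std::pow(x, t.first);
    return sum;
}
```

For 6x^14+x+5 you would write SparsePoly p = {{14, 6.0}, {1, 1.0}, {0, 5.0}};. Keeping the pairs sorted by power makes adding two polynomials a linear merge of two sorted lists.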

A sparse polynomial can be represented with a map, where a zero coefficient is represented by a nonexistent key. Here is an example of such a class:
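#include <map>
//example of sparse integer polynomial
class SparsePolynomial{
public:
// int& operator[](int degree); // possible alternative, discussed below
int get(int degree); // returns 0 when the degree is absent
void update(int degree, int val); // erases the entry when val is 0
private:
std::map<int,int> coeff; // degree -> non-zero coefficient
};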
Whenever you get or update the coefficient for a degree, the map is first checked for that key. Every time a coefficient is updated, the new value is checked for zero, and zero entries are erased. Hence, the size of the map always stays minimal.
We can replace these two methods with operator[]. However, in that case, we would not be able to check for zero during an update operation, thus the storage would not be as efficient as using two separate methods for access and update.
int SparsePolynomial::get(int degree){
std::map<int,int>::iterator it = coeff.find(degree); // single lookup
if (it == coeff.end()){
return 0; // absent key means a zero coefficient
}
return it->second;
}
void SparsePolynomial::update(int degree, int val){
if (val == 0){
std::map<int,int>::iterator it = coeff.find(degree);
if (it!=coeff.end()){
coeff.erase(it);
}
}else{
coeff[degree]=val;
}
}
While this method gives us more efficient storage, access and update take more time than with a vector. However, in the case of a sparse polynomial, the difference can be small. Finding an element in a std::map of size N takes O(log N) time, compared with O(1) indexing into a vector. Suppose you have a sparse polynomial of degree d with N non-zero coefficients. If N is much smaller than d, the access and update time will be small enough not to notice.
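To illustrate why the cost depends on N rather than d, here is a sketch of multiplying two polynomials held in the same std::map<int,int> degree-to-coefficient form, written as a free function rather than a member of the class above. multiply is a hypothetical name, and C++17 is assumed for structured bindings. Only pairs of non-zero terms are visited, and cancelled terms are erased so the map stays minimal.

```cpp
#include <map>

// Multiplies two sparse polynomials stored as degree -> coefficient maps.
// Work is proportional to (non-zero terms of a) * (non-zero terms of b),
// times a log factor for the map, regardless of the degrees involved.
std::map<int, int> multiply(const std::map<int, int>& a,
                            const std::map<int, int>& b) {
    std::map<int, int> result;
    for (const auto& [degA, coeffA] : a)
        for (const auto& [degB, coeffB] : b)
            result[degA + degB] += coeffA * coeffB;
    // Remove terms that cancelled out, keeping the "no zero entries" rule.
    for (auto it = result.begin(); it != result.end();) {
        if (it->second == 0) it = result.erase(it);
        else ++it;
    }
    return result;
}
```

For example, multiplying x^100 + 1 by x^100 - 1 touches four term pairs and yields a two-entry map for x^200 - 1.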

Is there a find () of a map to use a comparator with parameters?

To explain what I want, suppose there is a map:
std::map<Point, SomeClass> hm_map;
Is there a way to pass a comparator with a parameter (the radius around the Point I am interested in) to a map's find(), so that find() returns the set of matching pairs? I think I have chosen the wrong container for this.
Edit:
comparator::distance = someNumber;
setOfProperPairs = hmap.find (key, comparator);
where
struct comparator
{
static double distance;
bool operator()(Point ptg, Point p) const
{ return ptg.Hit (p, distance); }
};
Do you know which container to use for this?

std::map supports one-dimensional sorted data.
If you want geometric sorting in two or more dimensions, there is no std support for that.
You will want a quadtree (or an octree, or higher-dimensional analogues), an R-tree, a kd-tree, or similar.
They are a bit tricky to code.
Now, if you know the radius you care about before you build your structure, you can hack a simpler implementation. Create a square n-dimensional grid whose cell side length is about 1/2 to 2/3 of that radius.
Store data in a multimap from grid cell to exact location and data.
Now when doing a lookup, figure out which grid cell the center is in, work out which grid cells could contain hits, and search through those cells, doing a final check on each location to see if it is a hit.
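A minimal sketch of that grid idea in 2D, assuming the query radius is known up front and the cell size is chosen as roughly half of it. GridIndex, its int payload and the local Point type are hypothetical stand-ins for the asker's Point and SomeClass:

```cpp
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct Point { double x; double y; };

// Uniform grid: each cell (integer coordinates) maps to the points inside it.
class GridIndex {
public:
    explicit GridIndex(double cell) : cell_(cell) {}

    void insert(const Point& p, int data) {
        cells_.emplace(key(p.x, p.y), std::make_pair(p, data));
    }

    // Returns the data of every stored point within `radius` of `center`.
    std::vector<int> query(const Point& center, double radius) const {
        std::vector<int> hits;
        // A point within radius can be at most ceil(radius / cell) cells away.
        const long reach = static_cast<long>(std::ceil(radius / cell_));
        const auto c = key(center.x, center.y);
        for (long gx = c.first - reach; gx <= c.first + reach; ++gx) {
            for (long gy = c.second - reach; gy <= c.second + reach; ++gy) {
                auto range = cells_.equal_range({gx, gy});
                for (auto it = range.first; it != range.second; ++it) {
                    const Point& p = it->second.first;
                    // Final exact check: being in a nearby cell only means "maybe".
                    if (std::hypot(p.x - center.x, p.y - center.y) <= radius)
                        hits.push_back(it->second.second);
                }
            }
        }
        return hits;
    }

private:
    std::pair<long, long> key(double x, double y) const {
        return {static_cast<long>(std::floor(x / cell_)),
                static_cast<long>(std::floor(y / cell_))};
    }

    double cell_;
    std::multimap<std::pair<long, long>, std::pair<Point, int>> cells_;
};
```

With the cell size set to r/2, each query scans a 5x5 block of cells. A std::unordered_multimap with a custom hash for the cell key would also work; std::multimap is used here only because it needs no extra hash code.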

Finding all possible pairs of subsets using recursion

I am given
struct point
{
int x;
int y;
};
and the table of points:
point tab[MAX];
The program should return the minimal distance between the centers of gravity of any possible pair of subsets of tab. A subset can be of any size (of course >=1 and < MAX).
I am obliged to write this program using recursion.
So my function will return int, because I have to return an int.
I declared a global variable min (because during the recursion I have to compare some values with this min):
int min = 0;
My function should certainly take the number of elements added so far, the sum of the Y coordinates and the sum of the X coordinates:
int return_min_distance(int sY, int sX, int number, bool iftaken[])
I would be glad for any further help.
I thought about passing another table of bools as a parameter to record whether I took each value from the table or not. Still, my problem is how to implement this; I do not even know how to start.

I think you need a function that can iterate through all subsets of the table, starting with either nothing or an existing iterator. The code then gets easy:
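int min_distance = INT_MAX; // INT_MAX comes from <climits>
// SubsetIterator and subsetDistance are helpers you would write yourself;
// hasNext() is assumed to step to the next subset and return false when done.
SubsetIterator si1(0, tab);
while (si1.hasNext())
{
SubsetIterator si2(&si1, tab); // starts just after si1's current subset
while (si2.hasNext())
{
int d = subsetDistance(tab, si1.subset(), si2.subset());
if (d < min_distance)
{
min_distance = d;
}
}
}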
The SubsetIterators can be simple MAX-bit binary counters (counting up to 2^MAX - 1), where a 1 bit indicates membership in the subset. Yes, it's an O(N^2) algorithm in the number of subsets N = 2^MAX, but I think it has to be.
The trick is incorporating recursion. Sorry, I just don't see how it helps here. If I can think of a way to use it, I'll edit my answer.
Update: I thought about this some more, and while I still can't see a use for recursion, I found a way to make the subset processing easier. Rather than run through the entire table for every distance computation, the SubsetIterators could store precomputed sums of the x and y values for easy distance computation. Then, on every iteration, you subtract the values that are leaving the subset and add the values that are joining. Simple bit operations reveal these (XOR the old and new bitmaps to find the changed members, then AND with the new bitmap to see which of them are joining). To be even more efficient, you could step through the membership bitmaps in Gray-code order instead of ordinary binary counting. This guarantees that at each iteration exactly one value enters or leaves the subset. Minimal work.
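A sketch of the Gray-code idea, assuming small inputs (fewer than 32 points), that the pair may be any two distinct non-empty subsets, and a double result (the question asks for int). Like the answer, it is iterative rather than recursive; subsetCentroids, minCentroidDistance and Centroid are hypothetical names. Each step flips exactly one membership bit, so the running sums are updated in O(1).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

struct point { int x; int y; };
struct Centroid { double x; double y; };

// Visits every non-empty subset of pts in Gray-code order and records its
// center of gravity, updating the running sums by one point per step.
std::vector<Centroid> subsetCentroids(const std::vector<point>& pts) {
    const std::size_t n = pts.size();  // assumed n < 32
    std::vector<Centroid> out;
    long long sx = 0, sy = 0;          // running coordinate sums
    int count = 0;                     // running subset size
    std::uint32_t prevGray = 0;
    for (std::uint32_t k = 1; k < (1u << n); ++k) {
        const std::uint32_t gray = k ^ (k >> 1);
        const std::uint32_t changed = gray ^ prevGray;  // exactly one bit set
        int bit = 0;
        while (!((changed >> bit) & 1u)) ++bit;
        const int sign = (gray & changed) ? +1 : -1;    // joining or leaving
        sx += sign * pts[bit].x;
        sy += sign * pts[bit].y;
        count += sign;                                  // never 0 since gray != 0
        prevGray = gray;
        out.push_back({double(sx) / count, double(sy) / count});
    }
    return out;
}

// Smallest distance between the centroids of any two distinct subsets.
double minCentroidDistance(const std::vector<point>& pts) {
    const std::vector<Centroid> c = subsetCentroids(pts);
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < c.size(); ++i)
        for (std::size_t j = i + 1; j < c.size(); ++j)
            best = std::min(best, std::hypot(c[i].x - c[j].x, c[i].y - c[j].y));
    return best;
}
```

The pairwise loop still compares every pair of subsets, so the total cost remains quadratic in the 2^n subsets; only the centroid computation became cheap.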

C++ algorithm to find 'maximal difference' in an array

I am asking for your ideas regarding this problem:
I have one array A, with N elements of type double (or alternatively integer). I would like to find an algorithm with complexity less than O(N^2) to find:
max A[i] - A[j]
For 1 < j <= i < n. Please notice that there is no abs(). I thought of:
dynamic programming
dichotomic method, divide and conquer
some treatment after a sort keeping track of indices
Do you have any comments or ideas? Could you point me to some good references for practicing and getting better at solving such algorithm questions?

Make three sweeps through the array. First, sweep up from j=2, filling an auxiliary array a with the minimal element so far. Then sweep down from the top, i=n-1, filling (also from the top down) another auxiliary array, b, with the maximal element so far (from the top). Finally, sweep both auxiliary arrays together, looking for the maximal difference b[i]-a[i].
That will be the answer. O(n) in total. You could say it's a dynamic programming algorithm.
edit: As an optimization, you can eliminate the third sweep and the second array, and find the answer in the second sweep by maintaining two loop variables, max-so-far-from-the-top and max-difference.
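A sketch of these sweeps in C++, merging the second and third sweeps as the edit suggests. It assumes 0-based indices over the whole (non-empty) array and allows j == i as in the question, so the result is never negative; maxDifference is a hypothetical name:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns max over j <= i of a[i] - a[j]; assumes a is non-empty.
double maxDifference(const std::vector<double>& a) {
    const std::size_t n = a.size();
    // Sweep 1 (bottom up): prefixMin[i] = min(a[0..i]).
    std::vector<double> prefixMin(n);
    prefixMin[0] = a[0];
    for (std::size_t i = 1; i < n; ++i)
        prefixMin[i] = std::min(prefixMin[i - 1], a[i]);
    // Sweep 2 (top down): keep max(a[i..n-1]) and the best difference so far,
    // which replaces the second auxiliary array and the third sweep.
    double maxFromTop = a[n - 1];
    double best = maxFromTop - prefixMin[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        maxFromTop = std::max(maxFromTop, a[i]);
        best = std::max(best, maxFromTop - prefixMin[i]);
    }
    return best;
}
```

For {2, 3, 10, 6, 4, 8, 1} this returns 8 (10 - 2). For a decreasing array it returns 0, because j == i is allowed.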
As for "pointers" about how to solve such problems in general, you usually try some general methods just like you wrote - divide and conquer, memoization/dynamic programming, etc. First of all look closely at your problem and concepts involved. Here, it's maximum/minimum. Take these concepts apart and see how these parts combine in the context of the problem, possibly changing order in which they're calculated. Another one is looking for hidden order/symmetries in your problem.
Specifically, fixing an arbitrary inner point k along the list reduces this problem to finding the difference between the minimal element among all j such that 1<j<=k and the maximal element among all i such that k<=i<n. You see divide-and-conquer here, as well as taking apart the concepts of max/min (i.e. their progressive calculation), and the interaction between the parts. The hidden order is revealed (k goes along the array), and memoization helps save the interim results for max/min values.
Fixing an arbitrary point k can be seen as solving a smaller sub-problem first ("for a given k..."), then checking whether anything about that choice is special or whether it can be generalized, i.e. abstracted over.
There is a technique of trying to formulate and solve a bigger problem first, such that the original problem is part of the bigger one. Here, we think of finding all the differences, one for each k, and then finding the maximal one among them.
The double use of interim results (used both in the comparison for a specific point k and in calculating the next interim result, each in its own direction) usually means considerable savings. So:
divide-and-conquer
memoization / dynamic programming
hidden order / symmetries
taking concepts apart - seeing how the parts combine
double use - find parts with double use and memoize them
solving a bigger problem
trying arbitrary sub-problem and abstracting over it

This should be possible in a single iteration. max(a[i] - a[j]) for 1 < j <= i should be the same as max[i=2..n](a[i] - min[j=2..i](a[j])), right? So you'd have to keep track of the smallest a[j] while iterating over the array, looking for the largest a[i] - min(a[j]). That way you only have one iteration and j will be less than or equal to i.

You just need to go over the array, find the max and min, then take the difference, so the worst case is linear time. If the array is sorted, you can find the difference in constant time. Or am I missing something?

Java implementation runs in linear time
public class MaxDiference {
public static void main(String[] args) {
System.out.println(betweenTwoElements(2, 3, 10, 6, 4, 8, 1));
}
private static int betweenTwoElements(int... nums) {
int maxDifference = nums[1] - nums[0];
int minElement = nums[0];
for (int i = 1; i < nums.length; i++) {
if (nums[i] - minElement > maxDifference) {
maxDifference = nums[i] - minElement;
}
if (nums[i] < minElement) {
minElement = nums[i];
}
}
return maxDifference;
}
}
