Find the summation of forces between all possible pairs of points? - c++

There are n points, each having two attributes:
1. Position (a coordinate along an axis)
2. Attraction value (an integer)
The attraction force between two points A & B is given by:
Attraction_force(A, B) = (distance between them) * max(Attraction_val_A, Attraction_val_B)
Find the summation of the forces between all possible pairs of points.
I tried calculating and adding the forces between all the pairs:
for (int i = 0; i < n - 1; i++) {
    for (int j = i + 1; j < n; j++) {
        force += abs(P[i].pos - P[j].pos) * max(P[i].attraction_val, P[j].attraction_val);
    }
}
Example:
Points:      P1 P2 P3
Position:    2  3  4
Attraction:  4  5  6
Force = abs(2 - 3) * max(4, 5) + abs(2 - 4) * max(4, 6) + abs(3 - 4) * max(5, 6) = 5 + 12 + 6 = 23
But this takes O(n^2) time, and I can't think of a way to reduce it further!

Scheme of a solution:
1. Sort all points by their attraction value and process them one by one, starting with the one with the lowest attraction.
2. For each point you have to quickly calculate the sum of distances to all previously added points. That can be done using any online Range Sum Query solution, like a segment tree or a BIT (Fenwick tree). The key idea is that all points to the left are interchangeable for this purpose: their count and the sum of their coordinates are enough to calculate the sum of distances to them (and symmetrically for the points to the right).
3. For each newly added point, multiply that sum of distances (obtained in step 2) by the point's attraction value and add it to the answer.
Intuitive observations that I made in order to invent this solution:
We have two "bad" functions here (somewhat "discrete"): max and the absolute value (in the distance).
We can get rid of max by sorting our points and processing them in a specific order.
We can get rid of the absolute value if we process points to the left and points to the right separately.
After all these transformations, we have to calculate something which, after some simple algebraic transformations, turns into an online RSQ problem.
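For illustration, here is a minimal C++ sketch of that scheme (my own code, not the original poster's), using two Fenwick trees over compressed positions: one for point counts and one for coordinate sums:
#include <bits/stdc++.h>
using namespace std;

struct Fenwick {
    vector<long long> t;
    Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, long long v) { for (++i; i < (int)t.size(); i += i & -i) t[i] += v; }
    long long sum(int i) const { // prefix sum over [0, i]
        long long s = 0;
        for (++i; i > 0; i -= i & -i) s += t[i];
        return s;
    }
};

int main() {
    // The example from the question: positions 2 3 4, attraction values 4 5 6.
    vector<long long> pos = {2, 3, 4}, att = {4, 5, 6};
    int n = pos.size();

    // Compress positions so they can index the Fenwick trees.
    vector<long long> xs = pos;
    sort(xs.begin(), xs.end());
    xs.erase(unique(xs.begin(), xs.end()), xs.end());

    // Process points in increasing order of attraction value, so the
    // current point always supplies the max() against every earlier point.
    vector<int> order(n);
    iota(order.begin(), order.end(), 0);
    sort(order.begin(), order.end(), [&](int a, int b) { return att[a] < att[b]; });

    Fenwick cnt((int)xs.size()), coord((int)xs.size());
    long long force = 0;
    for (int id : order) {
        int k = lower_bound(xs.begin(), xs.end(), pos[id]) - xs.begin();
        long long leftCnt = cnt.sum(k), leftSum = coord.sum(k);
        long long allCnt = cnt.sum((int)xs.size() - 1), allSum = coord.sum((int)xs.size() - 1);
        // Sum of |pos[id] - p| over all previously processed points p.
        long long dist = leftCnt * pos[id] - leftSum
                       + (allSum - leftSum) - (allCnt - leftCnt) * pos[id];
        force += dist * att[id];
        cnt.add(k, 1);
        coord.add(k, pos[id]);
    }
    cout << force << "\n"; // 23 for this example
}
Each point is charged for exactly the pairs in which it supplies the max, so every pair is counted once, in O(n log n) total.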

An O(N^2) algorithm is optimal, because you need the actual distance between all possible pairs.


Given n points, how can I find the number of pairs of points with a given distance

I have an input of n unique points (X,Y) that are between 0 and 2^32 inclusive. The coordinates are integers.
I need to create an algorithm that finds the number of pairs of points with a distance of exactly 2018.
I have thought of checking with every other point but it would be O(n^2) and I have to make it more efficient. I also thought of using a set or a vector and sort it using a comparator based on the distance with the origin point but it wouldn't help at all.
So how can I do it efficiently?
There is one Pythagorean triple with a hypotenuse of 2018: 1118^2 + 1680^2 = 2018^2.
Since all coordinates are integers, the only possible absolute differences between the coordinates (both X and Y) of the two points are 0, 1118, 1680, and 2018.
Finding all pairs of points with a given difference between X (or Y) coordinates is a simple n log n operation.
Numbers other than 2018 might need a bit more work, because they might be members of more than one Pythagorean triple (for example, 2015 is the hypotenuse of 3 triples). If the number is not given as a constant but provided at run time, you will have to generate all triples with this hypotenuse. This may require some O(sqrt(N)) effort (N is the hypotenuse, not the number of points). One can find a recipe on the Math Stack Exchange, e.g. here (there are many others).
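For the constant 2018, a hedged C++ sketch of the whole counting step might look as follows (the sample points are made up; a hash set would work in place of std::set):
#include <cstdint>
#include <iostream>
#include <set>
#include <vector>
using namespace std;

int main() {
    vector<pair<int64_t, int64_t>> pts = {{0, 0}, {1118, 1680}, {2018, 0}};
    set<pair<int64_t, int64_t>> lookup(pts.begin(), pts.end());

    // All (dx, dy) with dx^2 + dy^2 = 2018^2 and dx > 0, plus (0, 2018),
    // so each unordered pair is probed from exactly one of its endpoints.
    const vector<pair<int64_t, int64_t>> offsets = {
        {2018, 0}, {0, 2018},
        {1118, 1680}, {1118, -1680},
        {1680, 1118}, {1680, -1118},
    };

    long long count = 0;
    for (const auto& [x, y] : pts)
        for (const auto& [dx, dy] : offsets)
            if (lookup.count({x + dx, y + dy}))
                ++count;

    cout << count << "\n"; // 2 for the sample points above
}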
You could try using a quadtree. First, insert your points into the quadtree. You should specify a lower limit for the cell size of e.g. 2048, which is a power of 2. Then iterate through the points and calculate distances to the points in the same cell and in the adjacent cells. That way you should be able to decrease the number of distance calculations drastically.
The main difficulty will probably be implementing the tree structure. You also have to find a way to find adjacent cells (you must include the possibility to traverse upwards in the tree).
The complexity of this is probably O(n*log(n)) in the best case, but don't pin me down on that.
One additional word on the distance calculation: You are probably much faster if you don't do
dx = p1x - p2x;
dy = p1y - p2y;
if ( sqrt(dx*dx + dy*dy) == 2018 ) {
...
}
but
dx = p1x - p2x;
dy = p1y - p2y;
if ( dx*dx + dy*dy == 2018*2018 ) {
...
}
Squaring is faster than taking the square root, so just compare the square of the distance with the square of 2018.

Given N lines on a Cartesian plane. How to find the bottommost intersection of lines efficiently?

I have N distinct lines on a Cartesian plane. Since the slope-intercept form of a line is y = mx + c, the slope and y-intercept of each line are given. I have to find the y-coordinate of the bottommost intersection of any two lines.
I have implemented a O(N^2) solution in C++ which is the brute-force approach and is too slow for N = 10^5. Here is my code:
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<pair<int, int>> lines(n);
    for (int i = 0; i < n; ++i) {
        int slope, y_intercept;
        cin >> slope >> y_intercept;
        lines[i].first = slope;
        lines[i].second = y_intercept;
    }
    double min_y = 1e9;
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            if (lines[i].first == lines[j].first) // since lines are distinct, two lines with the same slope never intersect
                continue;
            double x = (double) (lines[j].second - lines[i].second) / (lines[i].first - lines[j].first); // x-coordinate of intersection point
            double y = lines[i].first * x + lines[i].second; // y-coordinate of intersection point
            min_y = min(y, min_y);
        }
    }
    cout << min_y << endl;
}
How to solve this efficiently?
In case you are considering solving this by means of Linear Programming (LP), it can be done efficiently, since the solution which minimizes or maximizes the objective function always lies at an intersection of the constraint equations. I will show you how to model this problem as a maximization LP. Suppose you have N = 2 first-degree equations to consider:
y = 2x + 3
y = -4x + 7
then you will set up your simplex tableau like this:
x0 x1 x2 x3 b
-2 1 1 0 3
4 1 0 1 7
where column x0 holds the negated coefficient of "x" in the original first-degree functions, column x1 holds the coefficient of "y", which is generally +1, columns x2 and x3 form the N-by-N identity matrix (they are the slack variables), and column b holds the value of the independent term. In this case, the constraints are subject to the <= operator.
Now, the objective function should be:
x0 x1 x2 x3
1 1 0 0
To solve this LP, you may use the "simplex" algorithm which is generally efficient.
Furthermore, the result will be an array representing the assigned values to each variable. In this scenario the solution is:
x0 x1 x2 x3
0.6666666667 4.3333333333 0.0 0.0
The pair (x0, x1) represents the point which you are looking for, where x0 is its x-coordinate and x1 is its y-coordinate. There are other possible outcomes; for example, there could exist no solution. You can find out more in books such as "Linear Programming and Extensions" by George Dantzig.
Keep in mind that the simplex algorithm only works for positive values of x0, x1, ..., xn. This means that before applying the simplex, you must make sure the optimum point which you are looking for is not outside of the feasible region.
EDIT 2:
I believe making the problem feasible could be done easily, in O(N), by shifting the original functions into a new position: add a big constant to the independent term of each function. Check my comment below. (EDIT 3: this implies it won't work for every possible scenario, though it's quite easy to implement. If you want an exact answer for any possible scenario, check the following explanation on how to convert the infeasible quadrants into the feasible one and back.)
EDIT 3:
A better method to address this problem, one capable of precisely finding the minimum point even if it lies on the negative side of either x or y: convert all of the other 3 quadrants into quadrant 1.
Consider the following generic first degree function template:
f(x) = mx + k
Consider the following generic cartesian plane point template:
p = (p0, p1)
Converting a function and a point from y-negative quadrants to y-positive:
y_negative_to_y_positive( f(x) ) = -mx - k
y_negative_to_y_positive( p ) = (p0, -p1)
Converting a function and a point from x-negative quadrants to x-positive:
x_negative_to_x_positive( f(x) ) = -mx + k
x_negative_to_x_positive( p ) = (-p0, p1)
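As a tiny illustration of those conversions (my own sketch, not part of the original answer), using a (slope, intercept) representation in C++:
struct Line  { double m, k; }; // f(x) = m*x + k
struct Point { double x, y; };

// Reflect across the x-axis: y = m*x + k becomes y = -m*x - k.
Line  y_negative_to_y_positive(Line f)  { return {-f.m, -f.k}; }
Point y_negative_to_y_positive(Point p) { return {p.x, -p.y}; }

// Reflect across the y-axis: y = m*x + k becomes y = -m*x + k.
Line  x_negative_to_x_positive(Line f)  { return {-f.m, f.k}; }
Point x_negative_to_x_positive(Point p) { return {-p.x, p.y}; }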
Summarizing:
Quadrant     Sign of (x, y)    Converting f(x) or p to Q1
Quadrant 1   (+, +)            f(x)
Quadrant 2   (-, +)            x_negative_to_x_positive( f(x) )
Quadrant 3   (-, -)            y_negative_to_y_positive( x_negative_to_x_positive( f(x) ) )
Quadrant 4   (+, -)            y_negative_to_y_positive( f(x) )
Now convert the functions from quadrants 2, 3 and 4 into quadrant 1. Run simplex 4 times: once on the original quadrant 1 and three more times on the converted quadrants 2, 3 and 4. For the cases originating from a y-negative quadrant, you will need to model your simplex as a minimization instance with negative slack variables, which turns your constraints into the >= format. I will leave the details of modelling the same problem as a minimization task to you.
Once you have the results of each quadrant, you will have at hand at most 4 points (you might find out, for example, that there is no point in a specific quadrant). Convert each of them back to its original quadrant, reversing the original conversion in an analogous manner.
Now you may freely compare the 4 points with each other and decide which one is the one you need.
EDIT 1:
Note that you may have the quantity N of first-degree functions as large as you wish.
Other methods for solving this problem could be better.
EDIT 3: Check out the complexity of simplex. In the average case scenario, it works efficiently.
Cheers!

Finding median of a set of circular data

I would like to write a C++ function which finds the median of an array of circular data.
For example, consider readings from a compass, where the values are assumed to be in [0,360). Though 1 & 359 appear to be far apart, they are very close due to the circular nature of the reading.
Finding the median of N elements of ordinary (linear) data goes as follows:
1. Sort the N elements (ascending or descending order).
2. If N is odd, the median is the (N+1)/2 th element of the sorted array.
3. If N is even, the median is the average of the N/2 th and (N/2)+1 th elements of the sorted array.
However, the wrap-around problem in circular data takes the problem to a different dimension and makes the solution non-trivial.
A similar question to find mean from circular data is explained here How do you calculate the average of a set of circular data?
The suggestion in the above link is to find the unit vector corresponding to each angle and average those. However, the median requires sorting the data, and sorting vectors doesn't make any sense in this context. Hence I don't think we can use the proposed scheme to find the median!
I've actually given this topic way more thought than is healthy so I'll share my thoughts and findings here. Maybe someone will have a similar problem and find this useful.
I haven't used C++ in many years so please forgive me if I write all the code in C#. I believe a fluent C++ speaker can pretty easily translate the algorithms.
Circular mean
First, let's define the circular mean. It's calculated by converting your points to radians, where your period (256, 360 or whatever - the value that is interpreted to be the same as zero) is scaled to 2*pi. You then calculate the sine and cosine of those radian values. Those are the y and x coordinates of your values on a unit circle. You then sum up all the sines and cosines and calculate atan2. This gives you the average angle, which can be easily converted back to your data point by dividing with the scaling factor.
var scalingFactor = 2 * Math.PI / period;
var sines = 0.0;
var cosines = 0.0;
foreach (var value in inputs)
{
    var radians = value * scalingFactor;
    sines += Math.Sin(radians);
    cosines += Math.Cos(radians);
}
var circularMean = Math.Atan2(sines, cosines) / scalingFactor;
if (circularMean >= 0)
    return circularMean;
else
    return circularMean + period;
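Since the question asks for C++, a direct translation of the snippet above might look like this (a sketch; the function name and signature are mine):
#include <cmath>
#include <vector>

double circularMean(const std::vector<double>& inputs, double period) {
    const double PI = std::acos(-1.0);
    double scalingFactor = 2 * PI / period;
    double sines = 0.0, cosines = 0.0;
    for (double value : inputs) {
        double radians = value * scalingFactor;
        sines += std::sin(radians);
        cosines += std::cos(radians);
    }
    double mean = std::atan2(sines, cosines) / scalingFactor;
    return mean >= 0 ? mean : mean + period;
}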
Marginal circular median
The simplest approach to a circular median is just a modified way of handling the circular mean.
The circular median can be calculated in a similar way, by just finding the median of the sines and cosines instead of the sums, and calculating the atan2 of that. This way, you are finding the marginal median of the circle points and taking its angle as a result.
var scalingFactor = 2 * Math.PI / period;
var sines = new List<double>();
var cosines = new List<double>();
foreach (var value in inputs)
{
    var radians = value * scalingFactor;
    sines.Add(Math.Sin(radians));
    cosines.Add(Math.Cos(radians));
}
var circularMedian = Math.Atan2(Median(sines), Median(cosines)) / scalingFactor;
if (circularMedian >= 0)
    return circularMedian;
else
    return circularMedian + period;
This approach is O(n), robust to outliers and very simple to implement. It may suit your purposes well enough, but it has a problem: rotating the input points will give you different results. Depending on the distribution of your input data, it may or may not be a problem.
Circular arc median
To understand this other approach, you need to stop thinking of means and medians in terms of "this is how it's calculated", but in terms of what the resulting values actually represent.
For non-cyclic data, you get the mean by summing up all the values and dividing by the number of elements. What this number represents, though, is the value with the minimal sum of all squared distances to data elements. (I hear statisticians call this value the L2 estimate of location, but a statistician should probably confirm or deny this.)
Likewise for median. You get it by finding the data element that would end up in the middle if all data were sorted (ideally, using an O(n) selection algorithm, like nth_element in C++). What this number is, though, is a value that has the minimal sum of all absolute (non-squared!) distances to data elements. (Supposedly, this value is called an L1 estimate of location.)
Sorting circular data doesn't give you a middle, so the usual way of thinking about medians doesn't work. But you can still find the point that minimizes the sum of absolute (arc) distances to all data points. Here's the algorithm I came up with; it runs in O(n) time, assuming the input data is normalized to >= 0 and < period, and sorted. (If you need to do this sorting as part of your calculation, the runtime is O(n log n).)
It works by going through all the data points and keeping track of the sum of distances. When you shift from one data point to the next, a distance D to its right, the sum of distances to all the left points increases by D*LeftCount and the sum of distances to all the right points decreases by D*RightCount. Then, if some of the left points have actually become right points, because their leftward distance exceeds period/2, you subtract their previous distance and add the new, correct one.
For comparing the current sum to the best sum, I added a bit of tolerance to guard against inexact floating point arithmetic.
There may be multiple or infinitely many points that satisfy the minimum distances condition. With non-circular medians with even number of values, the median can be any value between the two central values. It's usually taken to be the average of those two central values, so I took the similar approach with this median algorithm. I find all data points that minimize the distances and then just calculate the circular mean of those points.
// Requires a sorted list with values normalized to [0,period).
// Doing an initialization pass:
// * candidate is the lowest number
// * finding the index where the circle with this candidate starts
// * calculating the score for this candidate - the sum of absolute distances
// * counting the number of values to the left of the candidate
int i;
var candidate = list[0];
var distanceSum = 0.0;
for (i = 1; i < list.Count; ++i)
{
    if (list[i] >= candidate + period / 2)
        break;
    distanceSum += list[i] - candidate;
}
var leftCount = list.Count - i;
var circleStart = i;
if (circleStart == list.Count)
    circleStart = 0;
else
    for (; i < list.Count; ++i)
        distanceSum += candidate + period - list[i];
var previousCandidate = candidate;
var bestCandidates = new List<double> { candidate };
var bestDistanceSum = distanceSum;
var equalityTolerance = period * 1e-10;
for (i = 1; i < list.Count; ++i)
{
    candidate = list[i];
    // A formula for correcting the distance given the movement to the right.
    // It doesn't take into account that some values may have wrapped to the other side of the circle.
    ++leftCount;
    distanceSum += (2 * leftCount - list.Count) * (candidate - previousCandidate);
    // Counting all the values that wrapped to the other side of the circle
    // and correcting the sum of distances from the candidate.
    if (i <= circleStart)
        while (list[circleStart] < candidate + period / 2)
        {
            --leftCount;
            distanceSum += 2 * (list[circleStart] - candidate) - period;
            ++circleStart;
            if (circleStart == list.Count)
            {
                circleStart = 0;
                break; // Letting the next loop continue.
            }
        }
    if (i > circleStart)
        while (list[circleStart] < candidate - period / 2)
        {
            --leftCount;
            distanceSum += 2 * (list[circleStart] - candidate) + period;
            ++circleStart;
        }
    // Comparing the current sum to the best one, using the given tolerance.
    if (distanceSum <= bestDistanceSum + equalityTolerance)
    {
        if (distanceSum >= bestDistanceSum - equalityTolerance)
        {
            // The numbers are close, so using their average as the next best.
            bestDistanceSum = (bestCandidates.Count * bestDistanceSum + distanceSum) / (bestCandidates.Count + 1);
        }
        else
        {
            // The new number is significantly better, clearing.
            bestDistanceSum = distanceSum;
            bestCandidates.Clear();
        }
        bestCandidates.Add(candidate);
    }
    previousCandidate = candidate;
}
if (bestCandidates.Count == 1)
    return bestCandidates[0];
else
    return CircularMean(bestCandidates, period);
Geometric circular median
There is an inconsistency in the previous algorithm in how the median is defined relative to the circular mean. The circular mean minimizes the sum of squared Euclidean distances between points on the circle. In other words, it looks at the straight lines connecting points on the circle, cutting through the circle.
The arc median, as I calculate it above, looks at the arc distances: how far the points are to each other by moving on the perimeter of the circle, not by taking a straight line between them.
I have thought about how to address this issue, if it bothers you, but I haven't really done any experiments so I can't claim the following method works. In short, I believe you could use a modification of the Iteratively reweighted least squares algorithm (IRLS), which is what is usually used to calculate geometric medians.
The idea is to pick a starting value (for instance, the circular mean or the arc median presented above) and calculate the Euclidean distance to each point: D_i = sqrt(dx_i^2 + dy_i^2). The circular mean minimizes the squares of those distances, so the weight of each point should cancel out the square and reduce it to just D: W_i = D_i / D_i^2, which is just W_i = 1 / D_i.
With these weights, calculate the weighted circular mean (same as the circular mean, but multiply each sine and cosine by the weight of that point before summing them up) and repeat the process. Repeat until enough iterations have passed or until the result stops changing much.
The problem with this algorithm is that it has a division by zero if the current solution falls exactly on a data point. Even if the distance isn't exactly zero, the solution will stop moving if you hit close enough to the point because the weight will become enormous compared to all the other ones. This can be fixed by adding a small fixed offset to the distance before dividing by it. This will make the solution suboptimal, but at least it won't stop on a wrong point.
It will still take a number of iterations to dig itself out of a wrong point unless the offset is relatively large, and the final solution gets worse as the offset grows. So the best way is probably to start with a fairly large offset and then progressively shrink it on each iteration.
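For what it's worth, here is a rough C++ sketch of that IRLS idea; the iteration count and the halving offset schedule are arbitrary assumptions of mine, not tested values:
#include <cmath>
#include <iostream>
#include <vector>
using namespace std;

double weightedCircularMean(const vector<double>& data, double period,
                            const vector<double>& w) {
    const double PI = acos(-1.0);
    double scale = 2 * PI / period;
    double s = 0, c = 0;
    for (size_t i = 0; i < data.size(); ++i) {
        s += w[i] * sin(data[i] * scale);
        c += w[i] * cos(data[i] * scale);
    }
    double mean = atan2(s, c) / scale;
    return mean >= 0 ? mean : mean + period;
}

double geometricCircularMedian(const vector<double>& data, double period) {
    const double PI = acos(-1.0);
    double scale = 2 * PI / period;
    vector<double> w(data.size(), 1.0);   // equal weights: start from the circular mean
    double est = weightedCircularMean(data, period, w);
    double eps = period / 10;             // start with a large offset, then shrink it
    for (int it = 0; it < 50; ++it, eps /= 2) {
        double ex = cos(est * scale), ey = sin(est * scale);
        for (size_t i = 0; i < data.size(); ++i) {
            double dx = cos(data[i] * scale) - ex;
            double dy = sin(data[i] * scale) - ey;
            // 1 / (D + eps): the offset guards against division by zero.
            w[i] = 1.0 / (sqrt(dx * dx + dy * dy) + eps);
        }
        est = weightedCircularMean(data, period, w);
    }
    return est;
}

int main() {
    vector<double> compass = {355, 5, 15, 350, 10}; // degrees, clustered around north
    cout << geometricCircularMedian(compass, 360) << "\n";
}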
Two properties of the median allow inventing two distinct algorithms for finding it.
1) The median minimizes the sum of absolute distances to all other elements -- an O(n^2) algorithm:
for (i = 0; i < N; i++)
{
    sum = 0;
    for (j = 0; j < N; j++)
    {
        int d = abs(item[i] - item[j]) % 360;
        sum += min(d, 360 - d); // circular distance, accounting for wrap-around
    }
    if (sum < best_so_far) { best_so_far = sum; index = i; }
}
2) The median satisfies that half of the items are less than it and half are greater:
sort the items
locate the first set of items (i = 0...I), satisfying either that
    I <= N/2, or item[I] > item[i] + 180
if the condition for the median is not satisfied, advance either i or I
This requires O(N*log N) for the sorting and O(N) for the subsequent scan.
Of course, in cyclical data every item (and every point in between data points) can be a proper candidate for the median.
For definition and discussion of circular median see
N.I. Fisher's 'Statistical Analysis of Circular Data', Cambridge Univ. Press 1993
and the discussion surrounding equations 2.32 and 2.33. For multi-modal or isotropic data a unique median may not exist.
Find an axis that divides the data into 2 equal groups and choose the end of the axis at the smaller value of the angle. If the sample size is odd the median will be a data point, otherwise it will be the midpoint of 2 data points.
There are packages in other languages (e.g. R, MatLab) that would help provide test values for any function you write.
e.g.
https://www.rdocumentation.org/packages/circular/versions/0.4-93
See in particular median.circular and medianHL.circular
or
Berens, Philipp. ‘CircStat: A MATLAB Toolbox for Circular Statistics’. Journal of Statistical Software 31, no. 1 (23 September 2009): 1–21. https://doi.org/10.18637/jss.v031.i10.
and see circ_median
With your vector of angular data points (i.e. a vector of numbers from 0 to 359), create two new vectors; I'll call them x and y. These two new vectors are the sine and cosine, respectively, of your angular data points.
That is, x[n] = cos(data[n]) and y[n] = sin(data[n]), where data is your angular data vector and n is however many data points there are.
Next, add up all the values in the x vector to get a single value, call it say sum_x, and add up all the values in the y vector to get another single value, call it sum_y.
Now you can take the inverse tangent (e.g. atan(sum_y/sum_x)) to get a new value. And this value is very meaningful. This value is basically telling you which direction your data is "pointing", i.e. where the majority of your data exists. NOTE: You must be careful of dividing by 0 (when sum_x = 0) and of the indeterminate form (when both sum_x = 0 and sum_y = 0). The indeterminate form just means your data is evenly distributed, in which case the median is meaningless; and when sum_x = 0 but sum_y != 0, it is effectively atan(inf) or atan(-inf), both of which are known.
EDIT:
My previous answer needed some tweaking after this point.
From here, it is easy. Take the value you got in the previous step (atan(sum_y/sum_x)) and add 180 degrees to it. This is your reference point, where your data starts and ends. From here, you can sort your angular data with this reference point as both the starting and ending point, and find the median of that data.
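A short C++ sketch of this recipe (my own illustration; the function name and the normalization details are assumptions):
#include <algorithm>
#include <cmath>
#include <vector>

double referencedMedian(std::vector<double> data /* degrees in [0,360) */) {
    const double PI = std::acos(-1.0);
    double sy = 0, sx = 0;
    for (double a : data) { sy += std::sin(a * PI / 180); sx += std::cos(a * PI / 180); }
    // Reference point: 180 degrees away from the direction the data "points" to.
    double ref = std::atan2(sy, sx) * 180 / PI + 180;
    for (double& a : data) a = std::fmod(a - ref + 720, 360); // cut the circle at ref
    std::sort(data.begin(), data.end());
    size_t n = data.size();
    double med = (n % 2) ? data[n / 2] : (data[n / 2 - 1] + data[n / 2]) / 2;
    return std::fmod(med + ref + 720, 360); // rotate back to the original frame
}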
It is not possible to canonically extend the concept of median to circular data. For the sake of simplicity, let's consider numbers in [0, 10) and, as an example, the (already ordered) set { 1 3 5 7 8 }. Depending on how you rotate the array, you get different values for the median:
1 3 5 7 8 -> 5
3 5 7 8 1 -> 7
5 7 8 1 3 -> 8
...etc...
and any is as good as the other.
I am not claiming that it is not possible to define a median on circular data. I am just claiming that the "normal" median cannot be extended to that case in a meaningful way without adding additional constraints or making an arbitrary choice.

C++: Finding all combinations of array items divisible into two groups

I believe this is more of an algorithmic question but I also want to do this in C++.
Let me illustrate the question with an example.
Suppose I have N objects (not programming objects), each with a different weight. And I have two vehicles to carry them. Each vehicle is big enough to carry all the objects by itself. The two vehicles have their own mileage and different levels of fuel in the tank. Also, the mileage depends on the weight carried.
The objective is to bring these N objects as far as possible. So I need to distribute the N objects in a certain way between the two vehicles. Note that I do not need to bring them the 'same' distance, but rather as far as possible. For example, I'd rather have the two vehicles go 5 km and 6 km than have one go 2 km and the other 7 km.
I cannot think of a theoretical closed-form calculation to determine which weights to load into each vehicle, because remember that I need to carry all of the N objects, which is a fixed set.
So as far as I can think, I need to try all the combinations.
Could someone advice of an efficient algorithm to try all the combinations?
For example I would have the following:
int weights[5] = {1,4,2,7,5}; // can be more values than 5
float vehicleONEMileage(int totalWeight);
float vehicleTWOMileage(int totalWeight);
How could I efficiently try all the combinations of weights[] with the two functions?
The two functions can be assumed to be linear, i.e. the return values of the two mileage functions are linear in the weight, with (different) negative slopes and (different) offsets.
So what I need to find is something like:
MAX(MIN(vehicleONEMileage(x), vehicleTWOMileage(sum(weights) - x)));
Thank you.
This should be on the cs or the math site.
Simplification: Instead of an array of objects, let's say we can distribute weight linearly.
The function we want to optimize is the minimum of both travel distances. Finding the maximum of the minimum is the same as finding the maximum of the product (Without proof. But to see this, think of the relationship between perimeter and area of rectangles. The rectangle with the biggest area given a perimeter is a square, which also happens to have the largest minimum side length).
In the following, we will scale the sum of all weights to 1. So, a distribution like (0.7, 0.3) means that 70% of all weight is loaded on vehicle 1. Let's call the load of vehicle 1 x; the load of vehicle 2 is then 1 - x.
Given the two linear functions f = a x + b and g = c x + d, where f is the mileage of vehicle 1 when loaded with weight x, and g the same for vehicle 2, we want to maximize
(a*x+b)*(c*(1-x)+d)
Let's ask Wolfram Alpha to do the hard work for us: www.wolframalpha.com/input/?i=derive+%28%28a*x%2Bb%29*%28c*%281-x%29%2Bd%29%29
It tells us that there is an extremum at
x_opt = (a * c + a * d - b * c) / (2 * a * c)
That's all you need to solve your problem efficiently.
The complete algorithm:
find a, b, c, d
b = vehicleONEMileage(0)
a = (vehicleONEMileage(1) - b) * sum_of_all_weights
same for c and d
calculate x_opt as above.
if x_opt < 0, load all weight onto vehicle 2
if x_opt > 1, load all weight onto vehicle 1
else, try to load tgt_load = x_opt*sum_of_all_weights onto vehicle 1, the rest onto vehicle 2.
The rest is a knapsack problem. See http://en.wikipedia.org/wiki/Knapsack_problem#0.2F1_Knapsack_Problem
How to apply this? Use the dynamic programming algorithm described there twice.
for maximizing a load up to tgt_load
for maximizing a load up to (sum_of_all_weights - tgt_load)
The first one, if loaded onto vehicle one, gives you a distribution with slightly less than expected on vehicle one.
The second one, if loaded onto vehicle two, gives you a distribution with slightly more than expected on vehicle two.
One of those is the best solution. Compare them and use the better one.
I leave the C++ part to you. ;-)
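Since the answer leaves the C++ part open, here is a rough sketch under the stated assumptions (integer weights, linear mileage functions). Instead of running the knapsack DP twice around tgt_load, it uses a subset-sum bitset and evaluates every reachable split, which is equivalent for small totals; the mileage functions are made-up placeholders:
#include <bitset>
#include <iostream>
#include <numeric>
#include <vector>
using namespace std;

// Illustrative linear mileage functions (negative slope, positive offset).
float vehicleONEMileage(int w) { return 10.0f - 0.5f * w; }
float vehicleTWOMileage(int w) { return 12.0f - 0.7f * w; }

int main() {
    vector<int> weights = {1, 4, 2, 7, 5};
    int total = accumulate(weights.begin(), weights.end(), 0);

    // All subset sums reachable with the given weights (0/1 knapsack feasibility).
    const int MAXW = 1024; // assumed upper bound on the total weight
    bitset<MAXW> reachable;
    reachable[0] = 1;
    for (int w : weights)
        reachable |= reachable << w;

    // Evaluate min(f1(x), f2(total - x)) over every reachable split; keep the best.
    float best = -1e9f;
    int bestLoad = 0;
    for (int x = 0; x <= total; ++x) {
        if (!reachable[x]) continue;
        float d = min(vehicleONEMileage(x), vehicleTWOMileage(total - x));
        if (d > best) { best = d; bestLoad = x; }
    }
    cout << "load " << bestLoad << " on vehicle one, distance " << best << "\n";
}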
I can suggest the following solution:
The total number of combinations is 2^(number of weights). Using bit logic we can loop through all the combinations and calculate maxDistance. The bits of the combination value show which weight goes to which vehicle.
Note that the algorithm's complexity is exponential and that int has a limited number of bits!
float maxDistance = 0.f;
for (int combination = 0; combination < (1 << ARRAYSIZE(weights)); ++combination)
{
    int weightForVehicleONE = 0;
    int weightForVehicleTWO = 0;
    for (int i = 0; i < ARRAYSIZE(weights); ++i)
    {
        if (combination & (1 << i)) // bit is set to 1: weight goes to vehicleTWO
        {
            weightForVehicleTWO += weights[i];
        }
        else // bit is set to 0: weight goes to vehicleONE
        {
            weightForVehicleONE += weights[i];
        }
    }
    maxDistance = max(maxDistance, min(vehicleONEMileage(weightForVehicleONE), vehicleTWOMileage(weightForVehicleTWO)));
}

Generate Non-Degenerate Point Set in 2D - C++

I want to create a large set of random points in the 2D plane that are non-degenerate (no 3 points of the whole set lie on one straight line). I have a naive solution which generates a random float pair P_new(x, y) and checks, for every pair of points (P1, P2) generated so far, whether P1, P2 and P_new lie on one line. This takes O(n^2) checks for each new point added to the list, making the whole complexity O(n^3), which is very slow if I want to generate more than 4000 points (it takes more than 40 minutes).
Is there a faster way to generate these set of non-degenerate points?
Instead of checking the points' collinearity on each cycle iteration, you could compute and compare the coefficients of the linear equations. These coefficients should be stored in a container with quick search. I would consider std::set, but unordered_map could fit as well and could lead to even better results.
To sum it up, I suggest the following algorithm:
1. Generate a random point p;
2. Compute the coefficients of the lines crossing p and the existing points (I mean the usual A, B & C). Here you need to do n computations;
3. Try to find the newly computed values in the previously computed sets. This step requires at most n*log(n^2) operations;
4. If the search comes up empty, accept p and add its coefficients to the corresponding sets. Each insertion costs about O(log(n)) as well.
The whole complexity is reduced to O(n^2*log(n)).
This algorithm requires storing an additional n^2*sizeof(Coefficient) of memory. But this seems to be OK if you are only trying to compute 4000 points.
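A hedged C++ sketch of this idea, assuming integer grid coordinates so that line coefficients can be normalized exactly (with float inputs you would need a tolerance scheme instead of exact set lookups); all names are illustrative:
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <numeric>
#include <set>
#include <tuple>
#include <vector>
using namespace std;

using LineCoeffs = tuple<int64_t, int64_t, int64_t>; // A*x + B*y + C = 0, normalized

LineCoeffs lineThrough(int64_t x1, int64_t y1, int64_t x2, int64_t y2) {
    int64_t A = y2 - y1, B = x1 - x2, C = -(A * x1 + B * y1);
    int64_t g = gcd(gcd(llabs(A), llabs(B)), llabs(C));
    if (g) { A /= g; B /= g; C /= g; }
    if (A < 0 || (A == 0 && B < 0)) { A = -A; B = -B; C = -C; } // canonical sign
    return {A, B, C};
}

int main() {
    const int n = 100, range = 1000000; // rand() is just for illustration
    vector<pair<int64_t, int64_t>> pts;
    set<LineCoeffs> lines; // lines through every accepted pair so far

    while ((int)pts.size() < n) {
        int64_t x = rand() % range, y = rand() % range;
        bool ok = true;
        for (auto [px, py] : pts) { // n lookups, O(log n) each
            if (px == x && py == y) { ok = false; break; } // reject duplicates
            if (lines.count(lineThrough(x, y, px, py))) { ok = false; break; }
        }
        if (!ok) continue; // the new point lies on a line through an old pair
        for (auto [px, py] : pts) // record the lines formed with the new point
            lines.insert(lineThrough(x, y, px, py));
        pts.push_back({x, y});
    }
    cout << "generated " << pts.size() << " points in general position\n";
}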
An O(n^2 log n) algorithm can easily be constructed in the following way:
For each point P in the set:
Sort the other points by polar angle around P (using the cross product as the comparison function; a standard idea, see the 2D convex hull gift-wrapping algorithm, for example). In this step you should consider only points Q that satisfy
Q.x > P.x || (Q.x == P.x && Q.y >= P.y)
Iterate over the sorted list; points with equal angle lie on the same line through P.
Sorting is done in O(n log n) and the scan is O(n). This gives O(n^2 log n) for detecting degenerate points.
Determining whether a set of points is degenerate is a 3SUM-hard problem. (The very first problem listed there is determining whether three lines contain a common point; the equivalent problem under projective duality is whether three points lie on a common line.) As such, it's not reasonable to hope that a generate-and-test solution will be significantly faster than n^2.
What are your requirements for the distribution?
generate a random point Q
for each previous point P calculate (dx, dy) = P - Q
and B = (abs(dx) > abs(dy) ? dy/dx : dx/dy)
sort the list of points P by their B value, so that points that form a line with Q will be in nearby positions in the sorted list
walk over the sorted list, checking whether Q forms a line with the current P value being considered and some next values that are nearer than a given distance
Perl implementation:
#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

use Math::Vector::Real;
use Math::Vector::Real::Random;
use Sort::Key::Radix qw(nkeysort);

use constant PI => 3.14159265358979323846264338327950288419716939937510;

@ARGV <= 2 or die "Usage:\n  $0 [n_points [tolerance]]\n\n";

my $n_points  = shift // 4000;
my $tolerance = shift // 0.01;

$tolerance = $tolerance * PI / 180;
my $tolerance_arctan = 3 / 2 * $tolerance;
# I got to that relation using not so basic maths in a hurry.
# It may be wrong!

my $tolerance_sin2 = sin($tolerance) ** 2;

sub cross2d {
    my ($p0, $p1) = @_;
    $p0->[0] * $p1->[1] - $p1->[0] * $p0->[1];
}

sub line_p {
    my ($p0, $p1, $p2) = @_;
    my $a0 = $p0->abs2 || return 1;
    my $a1 = $p1->abs2 || return 1;
    my $a2 = $p2->abs2 || return 1;
    my $cr01 = cross2d($p0, $p1);
    my $cr12 = cross2d($p1, $p2);
    my $cr20 = cross2d($p2, $p0);
    $cr01 * $cr01 / ($a0 * $a1) < $tolerance_sin2 or return;
    $cr12 * $cr12 / ($a1 * $a2) < $tolerance_sin2 or return;
    $cr20 * $cr20 / ($a2 * $a0) < $tolerance_sin2 or return;
    return 1;
}

my ($c, $f1, $f2, $f3) = (0, 1, 1, 1);

my @p;
GEN: for (1..$n_points) {
    my $q = Math::Vector::Real->random_normal(2);
    $c++;
    $f1 += @p;
    my @B = map {
        my ($dx, $dy) = @{$_ - $q};
        abs($dy) > abs($dx) ? $dx / $dy : $dy / $dx;
    } @p;
    my @six = nkeysort { $B[$_] } 0..$#B;
    for my $i (0..$#six) {
        my $B0 = $B[$six[$i]];
        my $pi = $p[$six[$i]];
        for my $j ($i + 1..$#six) {
            last if $B[$six[$j]] - $B0 > $tolerance_arctan;
            $f2++;
            my $pj = $p[$six[$j]];
            if (line_p($q - $pi, $q - $pj, $pi - $pj)) {
                $f3++;
                say "BAD: $q $pi-$pj";
                redo GEN;
            }
        }
    }
    push @p, $q;
    say "GOOD: $q";
    my $good = @p;
    my $ratiogood = $good/$c;
    my $ratio12 = $f2/$f1;
    my $ratio23 = $f3/$f2;
    print STDERR "gen: $c, good: $good, good/gen: $ratiogood, f2/f1: $ratio12, f3/f2: $ratio23 \r";
}
print STDERR "\n";
The tolerance indicates the acceptable error, in degrees, when considering whether three points are in a line, measured as π - max_angle(Q, Pi, Pj).
It does not take into account the numerical instabilities that can happen when subtracting vectors (i.e |Pi-Pj| may be several orders of magnitude smaller than |Pi|). An easy way to eliminate that problem would be to also require a minimum distance between any two given points.
Setting tolerance to 1e-6, the program just takes a few seconds to generate 4000 points. Translating it to C/C++ would probably make it two orders of magnitude faster.
An O(n) solution:
Pick a random number r from [0, 1).
The point added to the cloud is then P(cos(2 × π × r), sin(2 × π × r)).
Since a straight line intersects a circle in at most two points, no three distinct points generated this way can ever be collinear.
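A minimal C++ sketch of that trick (the point count and seed are arbitrary choices of mine):
#include <cmath>
#include <iostream>
#include <random>

int main() {
    const double PI = std::acos(-1.0);
    std::mt19937 rng(12345); // fixed seed, arbitrary
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    for (int i = 0; i < 4000; ++i) {
        double r = uni(rng);
        // Every generated point lies on the unit circle, so no three are collinear.
        std::cout << std::cos(2 * PI * r) << " " << std::sin(2 * PI * r) << "\n";
    }
}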