Generate Non-Degenerate Point Set in 2D - C++

I want to create a large random point cloud in the 2D plane that is non-degenerate (no 3 points of the whole set lie on a straight line). I have a naive solution which generates a random float pair P_new(x, y) and checks, for every pair of points (P1, P2) generated so far, whether (P1, P2, P_new) lie on the same line. This takes O(n^2) checks for each new point added to the list, making the whole complexity O(n^3), which is very slow if I want to generate more than 4000 points (it takes more than 40 minutes).
Is there a faster way to generate such a set of non-degenerate points?

Instead of checking the possible points' collinearity on each iteration, you could compute and compare the coefficients of the linear equations. These coefficients should be stored in a container with fast lookup. I would consider std::set, but unordered_map could fit as well and could lead to even better results.
To sum it up, I suggest the following algorithm:
Generate a random point p;
Compute the coefficients of the lines crossing p and each existing point (the usual A, B and C). Here you need to do n computations;
Try to find the newly computed values in the previously computed set. This step requires at most n*log(n^2) operations;
If the search comes up empty, accept the new point and insert its coefficients into the corresponding set. The cost of this is about O(log n) per insertion.
The whole complexity is reduced to O(n^2 * log n).
This algorithm requires storing an additional n^2 * sizeof(Coefficient) of memory, but that seems fine if you only want to compute 4000 points.
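For illustration, here is a minimal C++ sketch of this idea (my own illustration, not the answerer's code). It assumes integer coordinates so the canonical (A, B, C) triples can be compared exactly in a std::set; with floating-point coordinates you would need a tolerance instead of exact lookups:

#include <cstdlib>
#include <iostream>
#include <numeric>   // std::gcd (C++17)
#include <set>
#include <tuple>
#include <vector>

struct Point { long long x, y; };

// Canonical coefficients of the line A*x + B*y + C = 0 through p and q,
// reduced by their gcd and sign-normalized so equal lines compare equal.
static std::tuple<long long, long long, long long> lineCoeffs(Point p, Point q) {
    long long A = q.y - p.y;
    long long B = p.x - q.x;
    long long C = -(A * p.x + B * p.y);
    long long g = std::gcd(std::gcd(std::llabs(A), std::llabs(B)), std::llabs(C));
    if (g) { A /= g; B /= g; C /= g; }
    if (A < 0 || (A == 0 && B < 0)) { A = -A; B = -B; C = -C; }
    return {A, B, C};
}

int main() {
    const int n = 4000;
    std::vector<Point> pts;
    std::set<std::tuple<long long, long long, long long>> lines; // lines between accepted points
    while ((int)pts.size() < n) {
        Point p{std::rand() % 1000000, std::rand() % 1000000};
        std::vector<std::tuple<long long, long long, long long>> fresh;
        bool ok = true;
        for (const Point& q : pts) {
            if (p.x == q.x && p.y == q.y) { ok = false; break; } // duplicate point
            auto l = lineCoeffs(p, q);
            // If this line already exists, p is collinear with two accepted points.
            if (lines.count(l)) { ok = false; break; }
            fresh.push_back(l);
        }
        if (!ok) continue;
        pts.push_back(p);
        lines.insert(fresh.begin(), fresh.end());
    }
    std::cout << "generated " << pts.size() << " points\n";
}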

An O(n^2 log n) algorithm can be easily constructed in the following way:
For each point P in the set:
Sort the other points by polar angle around P (using the cross product as the comparison function, a standard idea; see the 2D convex-hull gift-wrapping algorithm, for example). In this step you should consider only points Q that satisfy
Q.x > P.x || Q.y >= P.y
Iterate over the sorted list; points with equal angle lie on the same line through P.
Sorting is done in O(n log n) and the scan in step 2 is O(n). This gives O(n^2 log n) for detecting and removing degenerate points.
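A C++ sketch of this test (my illustration, assuming integer coordinates). Instead of the half-plane condition above, it flips each direction vector into a canonical half-plane, which serves the same purpose of treating opposite directions as one line:

#include <algorithm>
#include <vector>

struct Pt { long long x, y; };

static long long cross(Pt a, Pt b) { return a.x * b.y - a.y * b.x; }

// Returns true if any three points are collinear: O(n^2 log n) overall.
bool hasCollinearTriple(const std::vector<Pt>& pts) {
    const size_t n = pts.size();
    for (size_t i = 0; i < n; ++i) {
        std::vector<Pt> dirs;
        dirs.reserve(n - 1);
        for (size_t j = 0; j < n; ++j) {
            if (j == i) continue;
            Pt d{pts[j].x - pts[i].x, pts[j].y - pts[i].y};
            // Flip into a canonical half-plane so opposite directions
            // (which define the same line through pts[i]) compare equal.
            if (d.x < 0 || (d.x == 0 && d.y < 0)) { d.x = -d.x; d.y = -d.y; }
            dirs.push_back(d);
        }
        std::sort(dirs.begin(), dirs.end(),
                  [](Pt a, Pt b) { return cross(a, b) > 0; }); // sort by angle
        for (size_t k = 1; k < dirs.size(); ++k)
            if (cross(dirs[k - 1], dirs[k]) == 0)
                return true; // equal directions: pts[i] plus two others on one line
    }
    return false;
}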

Determining whether a set of points is degenerate is a 3SUM-hard problem. (The very first problem listed is determining whether three lines contain a common point; the equivalent problem under projective duality is whether three points belong to a common line.) As such, it is not reasonable to hope that a generate-and-test solution will be significantly faster than n^2.
What are your requirements for the distribution?

Generate a random point Q.
For each previous point P calculate (dx, dy) = P - Q
and B = (abs(dx) > abs(dy) ? dy/dx : dx/dy).
Sort the list of points P by their B value, so that points that form a line with Q end up in nearby positions inside the sorted list.
Walk over the sorted list, checking whether Q forms a line with the current P value being considered and the next values whose B is closer than a given tolerance.
Perl implementation:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Math::Vector::Real;
use Math::Vector::Real::Random;
use Sort::Key::Radix qw(nkeysort);

use constant PI => 3.14159265358979323846264338327950288419716939937510;

@ARGV <= 2 or die "Usage:\n $0 [n_points [tolerance]]\n\n";

my $n_points = shift // 4000;
my $tolerance = shift // 0.01;

$tolerance = $tolerance * PI / 180;
my $tolerance_arctan = 3 / 2 * $tolerance;
# I got to that relation using not so basic maths in a hurry.
# It may be wrong!

my $tolerance_sin2 = sin($tolerance) ** 2;

sub cross2d {
    my ($p0, $p1) = @_;
    $p0->[0] * $p1->[1] - $p1->[0] * $p0->[1];
}

sub line_p {
    my ($p0, $p1, $p2) = @_;
    my $a0 = $p0->abs2 || return 1;
    my $a1 = $p1->abs2 || return 1;
    my $a2 = $p2->abs2 || return 1;
    my $cr01 = cross2d($p0, $p1);
    my $cr12 = cross2d($p1, $p2);
    my $cr20 = cross2d($p2, $p0);
    $cr01 * $cr01 / ($a0 * $a1) < $tolerance_sin2 or return;
    $cr12 * $cr12 / ($a1 * $a2) < $tolerance_sin2 or return;
    $cr20 * $cr20 / ($a2 * $a0) < $tolerance_sin2 or return;
    return 1;
}

my ($c, $f1, $f2, $f3) = (0, 1, 1, 1);
my @p;
GEN: for (1..$n_points) {
    my $q = Math::Vector::Real->random_normal(2);
    $c++;
    $f1 += @p;
    my @B = map {
        my ($dx, $dy) = @{$_ - $q};
        abs($dy) > abs($dx) ? $dx / $dy : $dy / $dx;
    } @p;
    my @six = nkeysort { $B[$_] } 0..$#B;
    for my $i (0..$#six) {
        my $B0 = $B[$six[$i]];
        my $pi = $p[$six[$i]];
        for my $j ($i + 1..$#six) {
            last if $B[$six[$j]] - $B0 > $tolerance_arctan;
            $f2++;
            my $pj = $p[$six[$j]];
            if (line_p($q - $pi, $q - $pj, $pi - $pj)) {
                $f3++;
                say "BAD: $q $pi-$pj";
                redo GEN;
            }
        }
    }
    push @p, $q;
    say "GOOD: $q";
    my $good = @p;
    my $ratiogood = $good/$c;
    my $ratio12 = $f2/$f1;
    my $ratio23 = $f3/$f2;
    print STDERR "gen: $c, good: $good, good/gen: $ratiogood, f2/f1: $ratio12, f3/f2: $ratio23 \r";
}
print STDERR "\n";
The tolerance indicates the acceptable error, in degrees, when deciding whether three points are on a line, measured as π - max_angle(Q, Pi, Pj).
It does not take into account the numerical instabilities that can happen when subtracting vectors (i.e. |Pi - Pj| may be several orders of magnitude smaller than |Pi|). An easy way to eliminate that problem would be to also require a minimum distance between any two given points.
Setting tolerance to 1e-6, the program just takes a few seconds to generate 4000 points. Translating it to C/C++ would probably make it two orders of magnitude faster.
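For reference, here is a rough C++ translation of the same idea (a sketch, not a tuned port; the constants and the slope key mirror the Perl code above):

#include <algorithm>
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

struct V { double x, y; };
static V sub(V a, V b) { return {a.x - b.x, a.y - b.y}; }
static double cross(V a, V b) { return a.x * b.y - a.y * b.x; }
static double abs2(V a) { return a.x * a.x + a.y * a.y; }

int main() {
    const int nPoints = 4000;
    const double pi = std::acos(-1.0);
    const double tol = 1e-6 * pi / 180.0;     // tolerance of 1e-6 degrees
    const double tolArctan = 1.5 * tol;       // same heuristic as the Perl code
    const double tolSin2 = std::sin(tol) * std::sin(tol);

    std::mt19937 rng(42);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::vector<V> p;

    while ((int)p.size() < nPoints) {
        V q{gauss(rng), gauss(rng)};
        // Slope-like key: points nearly collinear with q get nearly equal keys.
        std::vector<double> B(p.size());
        for (size_t k = 0; k < p.size(); ++k) {
            V d = sub(p[k], q);
            B[k] = std::abs(d.y) > std::abs(d.x) ? d.x / d.y : d.y / d.x;
        }
        std::vector<size_t> idx(p.size());
        for (size_t k = 0; k < idx.size(); ++k) idx[k] = k;
        std::sort(idx.begin(), idx.end(),
                  [&](size_t a, size_t b) { return B[a] < B[b]; });

        bool good = true;
        for (size_t i = 0; good && i < idx.size(); ++i) {
            for (size_t j = i + 1; j < idx.size(); ++j) {
                if (B[idx[j]] - B[idx[i]] > tolArctan) break;
                V a = sub(q, p[idx[i]]);
                V b = sub(q, p[idx[j]]);
                V c = sub(p[idx[i]], p[idx[j]]);
                double ab = cross(a, b), bc = cross(b, c), ca = cross(c, a);
                if (ab * ab < tolSin2 * abs2(a) * abs2(b) &&
                    bc * bc < tolSin2 * abs2(b) * abs2(c) &&
                    ca * ca < tolSin2 * abs2(c) * abs2(a)) {
                    good = false;             // q is (nearly) on the line p[i]-p[j]
                    break;
                }
            }
        }
        if (good) p.push_back(q);
    }
    std::cout << "generated " << p.size() << " points\n";
}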

An O(n) solution:
Pick a random number r from 0..1.
The point added to the cloud is then P(cos(2 × π × r), sin(2 × π × r)). All points lie on a circle, and since a straight line intersects a circle in at most two points, no three distinct points generated this way are collinear.
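A minimal sketch of this trick (it assumes the random r values are distinct; in practice collisions of random doubles are vanishingly rare, but duplicates would have to be rejected):

#include <cmath>
#include <random>
#include <utility>
#include <vector>

int main() {
    std::mt19937 rng(12345);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const double pi = std::acos(-1.0);
    std::vector<std::pair<double, double>> pts;
    for (int i = 0; i < 4000; ++i) {
        double r = u(rng);                       // distinct r values assumed
        pts.push_back({std::cos(2 * pi * r), std::sin(2 * pi * r)});
    }
    return 0;
}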

Related

Find the nearest point in each quadrant in a cartesian 2D space

I have N points in a 2D cartesian space loaded in a boost::rtree.
Given a random point P(x, y) not in the tree, I need to find an effective way to identify the nearest point for each of the four quadrants generated by the local csys centered in P and parallel to the main csys.
As shown in the image (not included here), given the red point I need to find the four purple points.
I tried this naive approach:
namespace bg = boost::geometry;
namespace bgi = boost::geometry::index; // alias used by the queries below
typedef bg::model::box<point> box;
vector<item> result_s;
vector<item> result_p;
int xres = 10;  /* this is a fixed amount that is loosely related to the points distribution */
int yres = 10;  /* as for xres */
int range = 10;
int maxp = 30;
/*
 * .. filling the tree
 */
box query_box2(point(lat, lon), point(lat-range*yres, lon+range*xres));
rtree.query(bgi::intersects(query_box2) && bgi::nearest(p, maxp), std::back_inserter(result_p));
if (result_p.size() > 0) result_s.push_back(result_p[0]);
result_p.clear();
box query_box1(point(lat, lon), point(lat+range*yres, lon+range*xres));
rtree.query(bgi::intersects(query_box1) && bgi::nearest(p, maxp), std::back_inserter(result_p));
if (result_p.size() > 0) result_s.push_back(result_p[0]);
result_p.clear();
box query_box3(point(lat, lon), point(lat+range*yres, lon-range*xres));
rtree.query(bgi::intersects(query_box3) && bgi::nearest(p, maxp), std::back_inserter(result_p));
if (result_p.size() > 0) result_s.push_back(result_p[0]);
result_p.clear();
box query_box4(point(lat, lon), point(lat-range*yres, lon-range*xres));
rtree.query(bgi::intersects(query_box4) && bgi::nearest(p, maxp), std::back_inserter(result_p));
if (result_p.size() > 0) result_s.push_back(result_p[0]);
result_p.clear();
if (result_s.size() > 3)
    cout << "OK!" << endl;
else
    cout << "KO" << endl;
but it often ends up with an empty result (KO).
Any suggestion or pointer would be very appreciated.
Thanks.
I would do an iterated nearest query.
It will produce the nearest points ordered by ascending distance.
You can cancel it after you have received at least one point in each quadrant.
In principle the time complexity of this approach is MUCH lower because it involves only a single query.
Worst-case behaviour would iterate all points in the tree, e.g.
if one quadrant doesn't contain any points, or
when all the points in one quadrant are actually closer than the closest point in another quadrant.
It seems the former might not be possible in your model (?) and the latter is statistically unlikely with normal distributions. You'd have to check your domain's expected point distributions.
Or, and this always applies: MEASURE and compare the effective performance.
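A sketch of the iterated query (my illustration; it assumes a Boost.Geometry rtree recent enough to support qbegin/qend incremental queries, and it assigns points on a quadrant boundary arbitrarily):

#include <array>
#include <boost/geometry.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <boost/optional.hpp>

namespace bg = boost::geometry;
namespace bgi = boost::geometry::index;
typedef bg::model::point<double, 2, bg::cs::cartesian> point;
typedef bgi::rtree<point, bgi::quadratic<16>> rtree_t;

// Returns the nearest stored point in each of the four quadrants around p.
std::array<boost::optional<point>, 4> nearestPerQuadrant(const rtree_t& rt, const point& p) {
    std::array<boost::optional<point>, 4> best;
    int found = 0;
    // qbegin/qend iterate the nearest-neighbour query incrementally, ordered
    // by ascending distance, so we can stop as soon as all quadrants are filled.
    for (auto it = rt.qbegin(bgi::nearest(p, (unsigned)rt.size()));
         it != rt.qend() && found < 4; ++it) {
        int q = (bg::get<0>(*it) >= bg::get<0>(p) ? 1 : 0)
              + (bg::get<1>(*it) >= bg::get<1>(p) ? 2 : 0); // quadrant index 0..3
        if (!best[q]) { best[q] = *it; ++found; }
    }
    return best;
}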
Use a modified distance function. More precisely, use four.
The main idea is to use a distance such that
d(v1,v2) = infinity if v2.x < v1.x
d(v1,v2) = infinity if v2.y < v1.y
d(v1,v2) = (v1.x-v2.x)²+(v1.y-v2.y)² otherwise
If you search for the nearest point with this distance, it must be in the top right quadrant.
You'll need to extend this logic to minDist when searching the tree.
The benefit is that it can stop searching a quadrant when it has found a point. Pages that overlap the "axes" may be expanded twice though.
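A sketch of such a distance for the top-right quadrant (quadrantDistSq is a hypothetical helper, just to illustrate the idea; the other three quadrants mirror the comparisons):

#include <limits>

struct Vec { double x, y; };

// Squared distance that is infinite outside the top-right quadrant of v1.
double quadrantDistSq(const Vec& v1, const Vec& v2) {
    if (v2.x < v1.x || v2.y < v1.y)
        return std::numeric_limits<double>::infinity();
    double dx = v1.x - v2.x, dy = v1.y - v2.y;
    return dx * dx + dy * dy;
}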

How to insert "k" number of points between two points in C++?

I have two curves with an equal number of data points. I want to connect the corresponding points on the curves with "k" equally spaced points to form a straight line.
[Image: how it should look]
I have tried to use the following formula to calculate both x's and y's lying on the path between the points:
for (int j = 1; j <= num_k; j++) {
    for (int i = 2; i <= (num_points-1); i++) {
        x[i][j] = x[i][1] * (1. - j/num_k) + x[i][num_points] * j/num_k;
        y[i][j] = y[i][1] * (1. - j/num_k) + y[i][num_points] * j/num_k;
    }
}
The data points of the curves are stored in the first and last columns of the 2D arrays x and y.
num_k is the number of intervals I want. num_points is the number of points on both curves.
But this is not giving me the result that I need: it gives me points, but they are not between the two input points as expected. Am I using the right technique, or is there something else I should be using? Also, are there any special cases?
Thanks!!
(1. - j/num_k) will almost always evaluate to 1, because j/num_k is done using integer math, which means it is zero except on the last iteration.
Use (1. - double(j)/num_k) instead.
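The same loop with that fix applied (a sketch keeping the question's arrays and bounds as-is):

for (int j = 1; j <= num_k; j++) {
    for (int i = 2; i <= (num_points-1); i++) {
        double t = double(j) / num_k;   // floating-point division
        x[i][j] = x[i][1] * (1. - t) + x[i][num_points] * t;
        y[i][j] = y[i][1] * (1. - t) + y[i][num_points] * t;
    }
}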

Finding median of a set of circular data

I would like to write a C++ function which finds the median of an array of circular data.
For example, consider the readings from a compass, which are assumed to be in [0, 360). Though 1 and 359 appear to be far apart, they are very close due to the circular nature of the reading.
Finding the median of N elements of ordinary data works as follows:
1. Sort the N elements (in ascending or descending order).
2. If N is odd, the median is the (N+1)/2-th element of the sorted array.
3. If N is even, the median is the average of the N/2-th and (N/2+1)-th elements of the sorted array.
However, the wrap-around in circular data takes the problem to a different dimension and makes the solution non-trivial.
A similar question about finding the mean of circular data is explained here: How do you calculate the average of a set of circular data?
The suggestion in the above link is to find the unit vector corresponding to each angle and average those. However, the median requires sorting the data, and sorting vectors makes no sense in this context. Hence I don't think we can use the proposed scheme to find the median!
I've actually given this topic way more thought than is healthy so I'll share my thoughts and findings here. Maybe someone will have a similar problem and find this useful.
I haven't used C++ in many years so please forgive me if I write all the code in C#. I believe a fluent C++ speaker can pretty easily translate the algorithms.
Circular mean
First, let's define the circular mean. It is calculated by converting your points to radians, where your period (256, 360 or whatever, the value that is interpreted to be the same as zero) is scaled to 2*pi. You then calculate the sine and cosine of those radian values; those are the y and x coordinates of your values on a unit circle. You then sum up all the sines and cosines and calculate atan2. This gives you the average angle, which can easily be converted back to your data point by dividing by the scaling factor.
var scalingFactor = 2 * Math.PI / period;
var sines = 0.0;
var cosines = 0.0;
foreach (var value in inputs)
{
    var radians = value * scalingFactor;
    sines += Math.Sin(radians);
    cosines += Math.Cos(radians);
}
var circularMean = Math.Atan2(sines, cosines) / scalingFactor;
if (circularMean >= 0)
    return circularMean;
else
    return circularMean + period;
Marginal circular median
The simplest approach to a circular median is just a modified way of handling the circular mean.
The circular median can be calculated in a similar way, by just finding the median of the sines and cosines instead of the sums, and calculating the atan2 of that. This way, you are finding the marginal median of the circle points and taking its angle as a result.
var scalingFactor = 2 * Math.PI / period;
var sines = new List<double>();
var cosines = new List<double>();
foreach (var value in inputs)
{
    var radians = value * scalingFactor;
    sines.Add(Math.Sin(radians));
    cosines.Add(Math.Cos(radians));
}
var circularMedian = Math.Atan2(Median(sines), Median(cosines)) / scalingFactor;
if (circularMedian >= 0)
    return circularMedian;
else
    return circularMedian + period;
This approach is O(n), robust to outliers and very simple to implement. It may suit your purposes well enough, but it has a problem: rotating the input points will give you different results. Depending on the distribution of your input data, it may or may not be a problem.
Circular arc median
To understand this other approach, you need to stop thinking of means and medians in terms of "this is how it's calculated", but in terms of what the resulting values actually represent.
For non-cyclic data, you get the mean by summing up all the values and dividing by the number of elements. What this number represents, though, is the value with the minimal sum of all squared distances to data elements. (I hear statisticians call this value the L2 estimate of location, but a statistician should probably confirm or deny this.)
Likewise for median. You get it by finding the data element that would end up in the middle if all data were sorted (ideally, using an O(n) selection algorithm, like nth_element in C++). What this number is, though, is a value that has the minimal sum of all absolute (non-squared!) distances to data elements. (Supposedly, this value is called an L1 estimate of location.)
Sorting circular data doesn't help you find a middle, so the usual way of thinking about medians doesn't work, but you can still find this point that minimizes the sum of absolute distances from all data points. Here's the algorithm that I came up with, that runs in O(n) time assuming the input data is normalized to >= 0 and < period, and then sorted. (If you need to do this sorting as part of your calculation, then the runtime is O(n log n).)
It works by going through all the data points and keeping track of the sum of distances. When you shift to the next data point to the right, by a distance D, the sum of distances to all the left points increases by D*LeftCount and the sum of distances to all the right points decreases by D*RightCount. Then, if some of the left points have now actually become right points, because their left distance is larger than period/2, you subtract their previous distance and add the new, correct one.
For comparing the current sum to the best sum, I added a bit of tolerance to guard against inexact floating point arithmetic.
There may be multiple or infinitely many points that satisfy the minimum distances condition. With non-circular medians with even number of values, the median can be any value between the two central values. It's usually taken to be the average of those two central values, so I took the similar approach with this median algorithm. I find all data points that minimize the distances and then just calculate the circular mean of those points.
// Requires a sorted list with values normalized to [0,period).
// Doing an initialization pass:
// * candidate is the lowest number
// * finding the index where the circle with this candidate starts
// * calculating the score for this candidate - the sum of absolute distances
// * counting the number of values to the left of the candidate
int i;
var candidate = list[0];
var distanceSum = 0.0;
for (i = 1; i < list.Count; ++i)
{
    if (list[i] >= candidate + period / 2)
        break;
    distanceSum += list[i] - candidate;
}
var leftCount = list.Count - i;
var circleStart = i;
if (circleStart == list.Count)
    circleStart = 0;
else
    for (; i < list.Count; ++i)
        distanceSum += candidate + period - list[i];

var previousCandidate = candidate;
var bestCandidates = new List<double> { candidate };
var bestDistanceSum = distanceSum;
var equalityTolerance = period * 1e-10;

for (i = 1; i < list.Count; ++i)
{
    candidate = list[i];

    // A formula for correcting the distance given the movement to the right.
    // It doesn't take into account that some values may have wrapped to the other side of the circle.
    ++leftCount;
    distanceSum += (2 * leftCount - list.Count) * (candidate - previousCandidate);

    // Counting all the values that wrapped to the other side of the circle
    // and correcting the sum of distances from the candidate.
    if (i <= circleStart)
        while (list[circleStart] < candidate + period / 2)
        {
            --leftCount;
            distanceSum += 2 * (list[circleStart] - candidate) - period;
            ++circleStart;
            if (circleStart == list.Count)
            {
                circleStart = 0;
                break; // Letting the next loop continue.
            }
        }
    if (i > circleStart)
        while (list[circleStart] < candidate - period / 2)
        {
            --leftCount;
            distanceSum += 2 * (list[circleStart] - candidate) + period;
            ++circleStart;
        }

    // Comparing current sum to the best one, using the given tolerance.
    if (distanceSum <= bestDistanceSum + equalityTolerance)
    {
        if (distanceSum >= bestDistanceSum - equalityTolerance)
        {
            // The numbers are close, so using their average as the next best.
            bestDistanceSum = (bestCandidates.Count * bestDistanceSum + distanceSum) / (bestCandidates.Count + 1);
        }
        else
        {
            // The new number is significantly better, clearing.
            bestDistanceSum = distanceSum;
            bestCandidates.Clear();
        }
        bestCandidates.Add(candidate);
    }

    previousCandidate = candidate;
}

if (bestCandidates.Count == 1)
    return bestCandidates[0];
else
    return CircularMean(bestCandidates, period);
Geometric circular median
There is an inconsistency in the previous algorithm, in the way the median is defined in relation to the circular mean. The circular mean minimizes the sum of squared euclidean distances between points on a circle. In other words, it looks at the straight lines connecting points on a circle, cutting through the circle.
The arc median, as I calculate it above, looks at the arc distances: how far the points are to each other by moving on the perimeter of the circle, not by taking a straight line between them.
I have thought about how to address this issue, if it bothers you, but I haven't really done any experiments so I can't claim the following method works. In short, I believe you could use a modification of the Iteratively reweighted least squares algorithm (IRLS), which is what is usually used to calculate geometric medians.
The idea is to pick a starting value (for instance, the circular mean or the arc median presented above) and calculate the euclidean distance to each point: Di = sqrt(dxi^2 + dyi^2). The circular mean minimizes the squares of those distances, so the weight of each point should cancel out the square and reduce it to just D: Wi = Di / Di^2, which is just Wi = 1 / Di.
With these weights, calculate the weighted circular mean (same as the circular mean, but multiply each sine and cosine by the weight of that point before summing them up) and repeat the process. Repeat until enough iterations have passed or until the result stops changing much.
The problem with this algorithm is that it has a division by zero if the current solution falls exactly on a data point. Even if the distance isn't exactly zero, the solution will stop moving if you hit close enough to the point because the weight will become enormous compared to all the other ones. This can be fixed by adding a small fixed offset to the distance before dividing by it. This will make the solution suboptimal, but at least it won't stop on a wrong point.
It will still take some number of iterations to dig itself out of that wrong point unless the offset is relatively large, and the final solution is worse the bigger the offset is. So the best way would probably be to start with a fairly large offset and then progressively making it smaller for each next iteration.
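Since the answer stresses this method is untested, the following is only a rough C++ sketch of the idea, with a distance offset eps that starts fairly large and shrinks on each iteration, as suggested:

#include <cmath>
#include <vector>

// Hypothetical helper: data values in [0, period); returns the IRLS estimate.
double geometricCircularMedian(const std::vector<double>& data, double period,
                               int iterations = 30) {
    const double pi = std::acos(-1.0);
    const double scale = 2 * pi / period;
    // Start from the circular mean direction.
    double cx = 0, cy = 0;
    for (double v : data) { cx += std::cos(v * scale); cy += std::sin(v * scale); }
    double eps = 1e-2;                         // fairly large offset to start with
    for (int it = 0; it < iterations; ++it, eps *= 0.7) {
        double len = std::hypot(cx, cy);
        double ax = len > 0 ? cx / len : 1.0;  // current estimate on the unit circle
        double ay = len > 0 ? cy / len : 0.0;
        double wx = 0, wy = 0;
        for (double v : data) {
            double px = std::cos(v * scale), py = std::sin(v * scale);
            double d = std::hypot(px - ax, py - ay) + eps; // offset avoids 1/0
            wx += px / d;                      // weight 1/d cancels the square
            wy += py / d;
        }
        cx = wx; cy = wy;
    }
    double m = std::atan2(cy, cx) / scale;
    return m >= 0 ? m : m + period;
}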
Two properties of the median allow inventing two distinct algorithms for finding it.
1) The median minimizes the sum of absolute distances to all other elements. O(n^2) algorithm:
for (i = 0; i < N; i++)
{
    sum = 0;
    for (j = 0; j < N; j++)
    {
        int d = abs(item[i] - item[j]) % 360;
        sum += (d > 180) ? 360 - d : d; // circular distance
    }
    if (sum < best_so_far) { best_so_far = sum; index = i; }
}
2) The median satisfies the property that half of the items are less than it and half are greater:
sort the items;
locate the first window of items (i = 0 ... I) satisfying either I <= N/2, or item[I] > item[i] + 180;
if the condition for the median is not satisfied, advance either i or I.
This requires O(N log N) for sorting and O(N) for the subsequent scan.
Of course, in cyclical data every item (and every value in between data points) can be a proper candidate for the median.
For a definition and discussion of the circular median, see
N.I. Fisher, 'Statistical Analysis of Circular Data', Cambridge Univ. Press, 1993,
and the discussion surrounding equations 2.32 and 2.33. For multi-modal or isotropic data a unique median may not exist.
Find an axis that divides the data into 2 equal groups and choose the end of the axis at the smaller value of the angle. If the sample size is odd the median will be a data point, otherwise it will be the midpoint of 2 data points.
There are packages in other languages (e.g. R, MATLAB) that can provide test values for any function you write.
e.g.
https://www.rdocumentation.org/packages/circular/versions/0.4-93
See in particular median.circular and medianHL.circular
or
Berens, Philipp. ‘CircStat: A MATLAB Toolbox for Circular Statistics’. Journal of Statistical Software 31, no. 1 (23 September 2009): 1–21. https://doi.org/10.18637/jss.v031.i10.
and see circ_median
With your vector of angular data points (i.e. a vector of numbers from 0 to 359), create two new vectors; I'll call them x and y. These two new vectors are the cosine and sine, respectively, of your angular data points.
That is, x[n] = cos(data[n]) and y[n] = sin(data[n]), where data is your angular data vector and n is however many data points there are.
Next, add up all the values in the x vector to get a single value, call it sum_x, and add up all the values in the y vector to get another single value, call it sum_y.
Now you can take the inverse tangent (e.g. atan(sum_y/sum_x)) to get a new value. And this value is very meaningful. This value is basically telling you which direction your data is "pointing", i.e. where the majority of your data exists. NOTE: You must be careful of dividing by 0 (when sum_x = 0) and of the indeterminate form (when both sum_x = 0 and sum_y = 0). The indeterminate form just means your data is evenly distributed, in which case the median is meaningless, and when sum_x = 0 but sum_y != 0, it is effectively atan(inf) or atan(-inf), both of which are known.
EDIT:
My previous answer needed some tweaking after this point.
From here, it is easy. Take the value you got in the previous step (atan(sum_y/sum_x)) and add 180 degrees to it. This is your reference point, where your data starts and ends. From here, you can sort your angular data with this reference point as both the starting and ending point, and find the median of that data.
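A C++ sketch of this shift-then-sort procedure (my illustration; it assumes a non-empty input normalized to [0, period)):

#include <algorithm>
#include <cmath>
#include <vector>

double circularMedianViaShift(std::vector<double> data, double period) {
    const double scale = 2 * std::acos(-1.0) / period;
    // Circular mean, as described above.
    double sx = 0, sy = 0;
    for (double v : data) { sx += std::cos(v * scale); sy += std::sin(v * scale); }
    double mean = std::atan2(sy, sx) / scale;
    if (mean < 0) mean += period;
    double cut = std::fmod(mean + period / 2, period);   // reference point
    for (double& v : data)                               // rotate so cut maps to 0
        v = std::fmod(v - cut + period, period);
    std::sort(data.begin(), data.end());
    size_t n = data.size();
    double m = (n % 2) ? data[n / 2] : (data[n / 2 - 1] + data[n / 2]) / 2;
    return std::fmod(m + cut, period);                   // rotate back
}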
It is not possible to canonically extend the concept of the median to circular data. For the sake of simplicity, let's consider numbers in [0, 10) and, as an example, the (already ordered) set {1 3 5 7 8}. Depending on how you rotate the array, you get different values for the median:
1 3 5 7 8 -> 5
3 5 7 8 1 -> 7
5 7 8 1 3 -> 8
...etc...
and any one is as good as the others.
I am not claiming that it is impossible to define a median on circular data. I am just claiming that the "normal" median cannot be extended to that case in a meaningful way without adding additional constraints or making an arbitrary choice.

Find the summation of forces between all possible pairs of points?

There are n points, each having two attributes:
1. Position (along an axis)
2. Attraction value (an integer)
The attraction force between two points A and B is given by:
Attraction_force(A, B) = (distance between them) * max(Attraction_val_A, Attraction_val_B)
Find the summation of the forces between all possible pairs of points.
I tried calculating and adding the forces between all pairs:
for (int i = 0; i < n-1; i++) {
    for (int j = i+1; j < n; j++) {
        force += abs(P[i].pos - P[j].pos) * max(P[i].attraction_val, P[j].attraction_val);
    }
}
Example:
Points:          P1  P2  P3
Position:         2   3   4
Attraction val:   4   5   6
Force = abs(2 - 3) * max(4, 5) + abs(2 - 4) * max(4, 6) + abs(3 - 4) * max(5, 6) = 23
But this takes O(n^2) time, and I can't think of a way to reduce it further!
Scheme of a solution:
Sort all points by their attraction value and process them one by one, starting with the one with the lowest attraction.
For each point you have to quickly calculate the sum of distances to all previously added points. That can be done using any online Range Sum Query solution, like a segment tree or a BIT (Fenwick tree). The key idea is that all points to the left are interchangeable for this purpose: their count and the sum of their coordinates are enough to compute the sum of distances to them.
For each newly added point, you just multiply that sum of distances (obtained in step 2) by the point's attraction value and add it to the answer.
Intuitive observations that I made in order to invent this solution:
We have two "bad" functions here (somewhat "discrete"): max and modulo (in distance).
We can get rid of max by sorting our points and processing them in a specific order.
We can get rid of modulo if we process points to the left and to the right separately.
After all these transformations, we have to calculate something which, after some simple algebraic transformations, converts to an online RSQ problem.
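A C++ sketch of this scheme using a Fenwick tree (BIT); the point data and the expected result of 23 come from the question's example:

#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

struct Fenwick {
    std::vector<long long> t;
    Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, long long v) { for (++i; i < (int)t.size(); i += i & -i) t[i] += v; }
    long long sum(int i) const {              // prefix sum over ranks [0, i]
        long long s = 0;
        for (++i; i > 0; i -= i & -i) s += t[i];
        return s;
    }
};

int main() {
    // (position, attraction value) pairs from the question's example.
    std::vector<std::pair<long long, long long>> pts = {{2, 4}, {3, 5}, {4, 6}};
    const int n = (int)pts.size();

    // Coordinate ranks, so positions can index the Fenwick trees.
    std::vector<long long> xs(n);
    for (int i = 0; i < n; ++i) xs[i] = pts[i].first;
    std::sort(xs.begin(), xs.end());

    // Process in ascending attraction, so the current value is the pair maximum.
    std::sort(pts.begin(), pts.end(),
              [](const auto& a, const auto& b) { return a.second < b.second; });

    Fenwick cnt(n), coord(n);                 // count and coordinate sum per rank
    long long total = 0, done = 0;
    for (const auto& [pos, attr] : pts) {
        int r = (int)(std::lower_bound(xs.begin(), xs.end(), pos) - xs.begin());
        long long leftCnt = cnt.sum(r), leftSum = coord.sum(r);
        long long rightCnt = done - leftCnt, rightSum = coord.sum(n - 1) - leftSum;
        long long dist = (pos * leftCnt - leftSum) + (rightSum - pos * rightCnt);
        total += attr * dist;                 // attr is the max over each processed pair
        cnt.add(r, 1); coord.add(r, pos); ++done;
    }
    std::cout << total << "\n";               // prints 23 for this example
}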
An algorithm of O(N^2) is optimal, because you need the actual distance between all possible pairs.

Improve minimum distance filter for pointset

I created a minimum distance filter for points.
The function takes a stream of points (x1, y1, x2, y2, ...) and removes those that are closer to another point than a given minimum distance.
void minDistanceFilter(vector<float> &points, float distance = 0.0)
{
    float p0x, p0y;
    float dx, dy, dsq;
    float mdsq = distance*distance; // minimum squared distance
    unsigned i, j, n = points.size();
    for (i = 0; i < n; i += 2)      // points are stored as (x, y) pairs
    {
        p0x = points[i];
        p0y = points[i+1];
        for (j = 0; j < n; j += 2)
        {
            if (i == j) continue;   // skip the point itself
            dx = p0x - points[j];   // delta x (p0x - p1x)
            dy = p0y - points[j+1]; // delta y (p0y - p1y)
            dsq = dx*dx + dy*dy;    // squared distance
            if (dsq < mdsq)
            {
                auto del = points.begin() + j;
                points.erase(del, del + 2); // remove the point's two floats
                n = points.size();          // update n
                if (j < i) i -= 2;          // the current point shifted down
                j -= 2;                     // re-test the element now at j
            }
        }
    }
}
The only problem is that it is very slow, because it tests all points against all points (O(n^2)).
How could it be improved?
kd-trees or range trees could be used for your problem. However, if you want to code from scratch and want something simpler, then you can use a hash table structure. For each point (a,b), hash using the key (round(a/d),round(b/d)) and store all the points that have the same key in a list. Then, for each key (m,n) in your hash table, compare all points in the list to the list of points that have key (m',n') for all 9 choices of (m',n') where m' = m + (-1 or 0 or 1) and n' = n + (-1 or 0 or 1). These are the only points that can be within distance d of your points that have key (m,n). The downside compared to a kd-tree or range tree is that for a given point, you are effectively searching within a square of side length 3*d for points that might have distance d or less, instead of searching within a square of side length 2*d which is what you would get if you used a kd-tree or range tree. But if you are coding from scratch, this is easier to code; also kd-trees and range trees are kinda overkill if you only have one universal distance d that you care about for all points.
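A C++ sketch of that hash-table structure, adapted to the question's filter (my illustration; it uses floor instead of round for the cell key, so with cell size d the 3x3 neighbourhood is enough to catch every point within distance d):

#include <cmath>
#include <unordered_map>
#include <utility>
#include <vector>

struct Cell {
    long long x, y;
    bool operator==(const Cell& o) const { return x == o.x && y == o.y; }
};
struct CellHash {
    std::size_t operator()(const Cell& c) const {
        return std::hash<long long>()(c.x * 73856093LL ^ c.y * 19349663LL);
    }
};

// Keeps only points at least d apart from every previously kept point.
// Expected O(n) overall: each point only checks its 3x3 cell neighbourhood.
std::vector<std::pair<float, float>> minDistanceFilter(
        const std::vector<std::pair<float, float>>& pts, float d) {
    std::unordered_map<Cell, std::vector<std::pair<float, float>>, CellHash> grid;
    std::vector<std::pair<float, float>> kept;
    for (const auto& [x, y] : pts) {
        Cell c{(long long)std::floor(x / d), (long long)std::floor(y / d)};
        bool ok = true;
        for (long long dx = -1; ok && dx <= 1; ++dx)
            for (long long dy = -1; ok && dy <= 1; ++dy) {
                auto it = grid.find(Cell{c.x + dx, c.y + dy});
                if (it == grid.end()) continue;
                for (const auto& [px, py] : it->second)
                    if ((x - px) * (x - px) + (y - py) * (y - py) < d * d) {
                        ok = false;
                        break;
                    }
            }
        if (ok) {
            grid[c].push_back({x, y});
            kept.push_back({x, y});
        }
    }
    return kept;
}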
Look up range trees, e.g. en.wikipedia.org/wiki/Range_tree. You can use this structure to store 2-dimensional points and very quickly find all the points that lie inside a query rectangle. Since you want to find points within a certain distance d of a point (a,b), your query rectangle will need to be [a-d,a+d] x [b-d,b+d], and then you test any points found inside the rectangle to make sure they are actually within distance d of (a,b). A range tree can be built in O(n log n) time and space, and range queries take O(log n + k) time, where k is the number of points found in the rectangle. This seems optimal for your problem.