Optimal query in GeoDjango & PostGIS for locations within a distance in meters - django

I have geometries in a PostGIS database with GeoDjango's default SRID, WGS84, and have found lookups directly in degrees to be much faster than in kilometres, presumably because the database can skip the projections.
Basically, Place.objects.filter(location__distance__lte=(point, D(km=10))) is several orders of magnitude slower than Place.objects.filter(location__dwithin=(point, 10)), because the first query produces a full scan of the table. But sometimes I need to look up places with a distance threshold in kilometres.
Is there a somewhat precise way to convert the 10 km to degrees for the query?
Maybe another equivalent lookup with the same performance that I should be using instead?

You have several approaches to deal with your problem; here are two of them:
If you do not care much about precision, you could use dwithin with a naive metre-to-degree conversion: degrees(x metres) = x / 40000000 * 360. You would get nearly exact results near the equator, but as you move north or south the distance would shrink (we are living on a sphere, after all). Imagine a region that starts as a circle and shrinks to an infinitely narrow ellipse as it approaches one of the poles.
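For example, a minimal sketch of that naive conversion (assuming the Place model and point from the question):
import math

def meters_to_degrees(meters):
    # Naive conversion: treats 40,000,000 m as a full 360-degree circle,
    # so it is only exact along the equator.
    return meters / 40000000.0 * 360.0

# dwithin compares in the units of the field's SRID (degrees for WGS84),
# so the spatial index can be used directly.
Place.objects.filter(location__dwithin=(point, meters_to_degrees(10000)))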
If you care about precision you can use:
import math
from django.contrib.gis.measure import D

max_distance = 10000  # distance in metres
# Widen the circle by 1/cos(latitude) so it stays large enough away from the
# equator; note that cos() expects radians, hence math.radians(point.y).
buffer_width = max_distance / 40000000.0 * 360.0 / math.cos(math.radians(point.y))
buffered_point = point.buffer(buffer_width)
Place.objects.filter(
    location__distance__lte=(point, D(m=max_distance)),
    location__overlaps=buffered_point,
)
The basic idea is to query for all points that are within a circle around your point, measured in degrees. This part is very performant, as the circle is in degrees and the geo index can be used. But the circle is sometimes a bit too big, so we keep the filter in metres to exclude places that may be a bit farther away than the allowed max_distance.

A small update to frankV's answer.
import math
from django.contrib.gis.measure import D

max_distance = 10000  # distance in metres
# Same buffer as above: widen by 1/cos(latitude), converting degrees to radians.
buffer_width = max_distance / 40000000.0 * 360.0 / math.cos(math.radians(point.y))
buffered_point = point.buffer(buffer_width)
Place.objects.filter(
    location__distance__lte=(point, D(m=max_distance)),
    location__intersects=buffered_point,
)
I found that __overlaps doesn't work with PostgreSQL and a point, but __intersects does.
To be sure this actually speeds up your query, check the explain plan (use queryset.query to see the generated SQL).
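For example, a quick way to inspect what will be sent to the database (assuming the queryset from above):
qs = Place.objects.filter(
    location__distance__lte=(point, D(m=max_distance)),
    location__intersects=buffered_point,
)
print(qs.query)  # the generated SQL; prepend EXPLAIN ANALYZE and run it in psql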


Precomputed distances for spectral clustering with scikit-learn

I'm struggling to make sense of the spectral clustering documentation here. Specifically, it says:
If you have an affinity matrix, such as a distance matrix, for which 0 means identical elements, and high values means very dissimilar elements, it can be transformed in a similarity matrix that is well suited for the algorithm by applying the Gaussian (RBF, heat) kernel:
np.exp(- X ** 2 / (2. * delta ** 2))
For my data, I have a complete distance matrix of size (n_samples, n_samples) where large entries represent dissimilar pairs, small values represent similar pairs and zero represents identical entries. (I.e. the only zeros are along the diagonal).
So all I need to do is build the SpectralClustering object with affinity = "precomputed" and then pass the transformed distance matrix to fit_predict.
I'm stuck on the suggested transformation equation, np.exp(- X ** 2 / (2. * delta ** 2)).
What is X here? The (n_samples, n_samples) distance matrix?
If so, what is delta? Is it just X.max() - X.min()?
Calling np.exp(- X ** 2 / (2. * (X.max()-X.min()) ** 2)) seems to do the right thing. I.e. big entries become relatively small, and small entries relatively big, with all the entries between 0 and 1. The diagonal is all 1's, which makes sense, since each point is most affine with itself.
But I'm worried. I think if the author had wanted me to use np.exp(- X ** 2 / (2. * (X.max()-X.min()) ** 2)) he would have told me to use just that, instead of throwing delta in there.
So I guess my question is just this. What's delta?
Yes, X in this case is the matrix of distances. delta is a scale parameter that you can tune as you wish. It controls the "tightness", so to speak, of the distance/similarity relation, in the sense that a small delta increases the relative dissimilarity of faraway points.
Notice that delta plays the inverse role of the gamma parameter of the RBF kernel mentioned earlier in the doc link you give (gamma = 1 / (2 * delta ** 2)): both are free parameters which can be used to tune the clustering results.
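As an illustration, a minimal end-to-end sketch (the random data, the delta heuristic, and the n_clusters value are assumptions, not prescribed values):
import numpy as np
from sklearn.cluster import SpectralClustering

# Build a toy (n_samples, n_samples) distance matrix from random 2D points.
rng = np.random.default_rng(0)
pts = rng.random((20, 2))
X = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

# Gaussian kernel: distances -> similarities in (0, 1], diagonal all 1's.
delta = X.max() - X.min()   # one possible heuristic for the scale parameter
similarity = np.exp(-X ** 2 / (2.0 * delta ** 2))

labels = SpectralClustering(
    n_clusters=3,            # hypothetical; pick for your data
    affinity="precomputed",
).fit_predict(similarity)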

Desert fractal OpenGL

We're trying to generate a 3D world using 2D Perlin noise (with a recursive/fractal technique). We have generated mountains and valleys quite fine, but now we are having problems with deserts and dunes: we have only worked on persistence and octaves, and we aren't able to make the classic shape of the dune. Has anybody already experienced this? Any solution, ideally still using Perlin noise, or other algorithms which allow you to do this?
You could give the Musgrave ridged multifractal a try. It gives nice ridged structures and you can use your existing noise algorithms for it.
The C reference implementation for it is here
Dunes are lopsided: .='\ cross-section... you may want to use an initial shape of that kind.
They are regular, like waves in the sea, not completely noise.
They are elongated towards the wind.
I didn't use the first condition, but I have made great dunes by multiplying two 1D Perlin noises together, or even two sin/parabola functions, where both are aligned to one axis, i.e. Z, and have a small low-frequency sin or noise wobbling them along the X axis, so they aren't aligned.
try this:
dunes = sin(X + perlin1d(Z) * 0.2) * sin(X + perlin1d(Z + 432) * 0.2);
otherwise to test it:
dunes = sin(X + sin(Z) * 0.2) (plus, times, or divided by) sin(X + sin(Z + 432) * 0.2);
The 0.2 makes the dunes 10 times longer than they are wide; the effect is like two straight water waves meeting at almost the same angle, plus an uncertainty variable using noise for the angle.
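A small numpy sketch of that test formula (sin() stands in for the 1D Perlin noise; swap in a real noise function once it works):
import numpy as np

def dunes(x, z):
    # 0.2 stretches the dunes ~10x along Z; 432 is an arbitrary offset so the
    # two wave trains are decorrelated.
    return np.sin(x + np.sin(z) * 0.2) * np.sin(x + np.sin(z + 432) * 0.2)

xs, zs = np.meshgrid(np.linspace(0, 20, 256), np.linspace(0, 20, 256))
height = dunes(xs, zs)  # 256x256 heightfield in [-1, 1]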
Maybe turbulence is already enough for what you need... Try playing with turbulence by using the absolute value of your octaves' return values instead of the raw values. You can also evaluate your noise and your turbulence separately and combine them to mix both effects in some areas.

C++ - Smooth speedup and slowing of objects

I am dealing with some positions of objects in Cocos2dx but this question can apply to virtually every situation in which a smooth start and stop is necessary.
Here's what I am looking for:
Given an origin position at x = 0 and a final position of x = 8, I want it to accelerate slowly, moving faster the farther it gets from the start, and then slow down as it reaches the end. Is there a smoothing algorithm for this?
There are lots of algorithms for this. One idea is to set up a linear interpolation:
x(t) = (1.0 - t) * x0 + t * x1;
If you feed evenly spaced values of t from 0.0 to 1.0, you'll get a smooth, linear animation.
If you want a slow start and slow end, you can use t = sin(theta)/2.0 + 0.5 for theta from -pi/2 to pi/2.
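Sketched in Python for brevity (the arithmetic ports directly to C++):
import math

def lerp(x0, x1, t):
    return (1.0 - t) * x0 + t * x1

def ease_in_out(t):
    # Map linear t in [0, 1] through a sine curve: slow start, slow end.
    theta = -math.pi / 2.0 + t * math.pi    # theta in [-pi/2, pi/2]
    return math.sin(theta) / 2.0 + 0.5      # back into [0, 1]

frames = 30
positions = [lerp(0.0, 8.0, ease_in_out(i / frames)) for i in range(frames + 1)]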
A second-order smooth path has constant acceleration during the first half, then constant deceleration during the second part.
This means you accelerate from x=0 to x=4. The formula for the first half is x(t) = a*t*t, so your choice of acceleration a directly influences the time needed: the halfway point x=4 is reached at time T = sqrt(4/a). If you decelerate at the same rate during the second half, you arrive at x=8 after twice that time, and the formula for the second part is x(t) = 8 - a*(2*T - t)*(2*T - t).
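A sketch of that profile (the value of a is an arbitrary choice):
import math

a = 1.0                 # chosen acceleration
T = math.sqrt(4.0 / a)  # time at which x = 4, the halfway point

def x(t):
    if t <= T:
        return a * t * t                 # constant acceleration: 0 -> 4
    return 8.0 - a * (2.0 * T - t) ** 2  # mirror-image deceleration: 4 -> 8

# x(0) == 0, x(T) == 4, x(2 * T) == 8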

Proper calculation for the first element of an OpenGL projection Matrix?

Almost all the theoretical material I've read about projection matrices has the first element as 2n/(r-l), but most of the open-source implementations I've seen have it as 2n/((t-b)*a), which made sense to me at first, since (r-l) should equal ((t-b)*a). But when I actually run the numbers, something feels off.
If we have a vertical field of view of 65 degrees, a near plane of .1, and an aspect ratio of 4:3, then I seem to get:
2n/(r-l) = .2 / (tan(65*(4/3)*.5) * .2) = 1.0599
but
2n/((t-b)*a) = .2 / (tan(65*.5) * (4/3) * .2) = 1.1773
Why is there a difference between everything I read and everything I see implemented? I didn't notice until I started implementing the analytical inverse, whose first element is (r-l)/2n, which isn't the inverse of these other implementations.
You can't multiply the aspect ratio into the angle. The tangent isn't a linear function: having a 65 degree vertical field of view with a 4:3 aspect does not mean you're going to have an 86.67 degree horizontal FOV, but rather ~80.69 degrees.
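A quick numeric check, using the values from the question (vertical FOV 65 degrees, near plane 0.1, aspect 4:3):
import math

n, vfov, aspect = 0.1, math.radians(65.0), 4.0 / 3.0

t = n * math.tan(vfov / 2.0)   # top;   b = -t
r = t * aspect                 # right; l = -r

# The aspect ratio scales the tangent, not the angle:
hfov = 2.0 * math.atan(math.tan(vfov / 2.0) * aspect)
print(math.degrees(hfov))          # ~80.69, not 65 * 4 / 3 = 86.67

print(2 * n / (2 * r))             # 2n/(r-l)     = 1.1773...
print(2 * n / ((2 * t) * aspect))  # 2n/((t-b)*a) = the same value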

Designing a grid overlay based on longitudes and latitudes

I'm trying to figure out the best way to approach the following:
Say I have a flat representation of the earth. I would like to create a grid that overlays this with each square on the grid corresponding to about 3 square kilometers. Each square would have a unique region id. This grid would just be stored in a database table that would have a region id and then probably the long/lat coordinates of the four corners of the region, right? Any suggestions on how to generate this table easily? I know I would first need to find out the width and height of this "flattened earth" in kms, calculate the number of regions, and then somehow assign the long/lats to each intersection of vertical/horizontal line; however, this sounds like a lot of manual work.
Secondly, once I have that grid table created, I need to design a function that takes a long/lat pair and determines which logical "region" it is in. I'm not sure how to go about this.
Any help would be appreciated.
Thanks.
Assume the Earth is a sphere with radius R = 6371 km.
Start at (lat, long) = (0, 0) deg. Around the equator, 3km corresponds to a change in longitude of
dlong = 3 / (2 * pi * R) * 360
= 0.0269796482 degrees
If we walk around the equator and put a marker every 3km, there will be about (2 * pi * R) / 3 = 13343.3912 of them. "About" because it's your decision how to handle the extra 0.3912.
From (0, 0), we walk north 3 km to (lat, long) = (0.0269796482, 0). We will walk around the Earth again on a path that is locally parallel to the first one. Because it is a little closer to the N Pole, the radius of this circle is a bit smaller than that of the first circle we walked. Let's use lower case r for this radius:
r = R * cos(lat)
taking care to convert the latitude to radians before calling cos. A single 3 km step changes the radius only negligibly, but the shrinkage accumulates as we keep stepping north. After about 57 such rings, at latitude ~1.546 degrees (roughly 172 km north of the equator),
r = 6371 * cos(radians(1.546))
= 6368.68141 km
We calculate dlong again using the smaller radius,
dlong = 3 / (2 * pi * r) * 360
= 0.0269894704 deg
We put down another ring of flags. This time there are about (2 * pi * r) / 3 = 13338.5352 of them. There were 13343 at the equator, but now there are 13338. What's that? Five fewer.
How do we draw a ribbon of squares when there are fewer corners in the top line than in the bottom? In fact, as we walked around the Earth, we'd find that we started off with pretty good squares, but that the shape of the regions sheared out into fairly extreme parallelograms.
We need a different strategy that gives us the same number of corners above and below. If the lower boundary (SW-SE) is 3 km long, then the top should be a little shorter, to make a ribbon of trapeziums.
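A short script reproducing those ring counts (same R and 3 km spacing as above):
import math

R = 6371.0  # Earth radius in km
step = 3.0  # flag spacing in km

def ring(lat_deg):
    r = R * math.cos(math.radians(lat_deg))  # radius of the parallel at lat_deg
    dlong = step / (2 * math.pi * r) * 360   # degrees of longitude per 3 km step
    flags = (2 * math.pi * r) / step         # flags that fit on this ring
    return dlong, flags

print(ring(0.0))    # (0.02698..., 13343.39...) at the equator
print(ring(1.546))  # (0.02699..., 13338.5...)  about five fewer flags
print(ring(60.0))   # only half as many flags as at the equator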
There are many ways to craft a compromise that approximates your ideal square grid. This Wikipedia article on map projections that preserve a metric property links to several dozen such strategies.
The specifics of your app may allow you to simplify things considerably, especially if you don't really need to map the entire globe.
Microsoft has been investing in spatial data types in their SQL Server 2008 offering, which could help you out here: it has data types to represent your flattened-earth regions, operators to determine when a set of coordinates is inside a geometry, and so on. Even if you choose not to use it, consider checking out the following links. The second one in particular has a lot of good background information on the problem and a discussion of some industry-standard data formats for spatial data.
http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx
http://jasonfollas.com/blog/archive/2008/03/14/sql-server-2008-spatial-data-part-1.aspx
First, Paul is right. Unfortunately the earth is round, which really complicates the heck out of this stuff.
I created a grid similar to this for a topographical mapping server many years ago. I just recorded the coordinates of the upper-left corner of each region, and I used UTM coordinates instead of lat/long. Since each region covers 3 square kilometers and UTM is based on meters, it is straightforward to do a range query to discover the right region.
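A sketch of that range-query idea (the cell size and id scheme here are hypothetical, not from the original server):
import math

CELL = math.sqrt(3.0) * 1000.0  # edge in metres of a square with area 3 km^2

def region_id(zone, easting, northing):
    # Snap UTM coordinates to the grid; each (zone, col, row) names one region.
    return (zone, int(easting // CELL), int(northing // CELL))

# All points whose easting and northing fall in the same ~1732 m bucket
# share a region, so the lookup is plain integer arithmetic.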
You do realize that because the earth is a sphere, "3 square km" is going to be a different number of degrees near the poles than near the equator, right? And that at the top and bottom of the map your grid squares will actually represent pie-shaped parts of the world, right?
I've done something similar with my database - I've broken it up into quad cells. So what I did was divide the earth into four quarters (-180,-90)-(0,0), (-180,0)-(0,90) and so on. As I added point entities to my database, if the "cell" got more than X entries, I split the cell into 4. That means that in areas of the world with lots of point entities, I have a lot of quad cells, but in other parts of the world I have very few.
My database for the quad tree looks like:
\d areaids;
            Table "public.areaids"
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 areaid       | integer                     | not null
 supercededon | timestamp without time zone |
 supercedes   | integer                     |
 numpoints    | integer                     | not null
 rectangle    | geometry                    |
Indexes:
    "areaids_pk" PRIMARY KEY, btree (areaid)
    "areaids_rect_idx" gist (rectangle)
Check constraints:
    "enforce_dims_rectangle" CHECK (ndims(rectangle) = 2)
    "enforce_geotype_rectangle" CHECK (geometrytype(rectangle) = 'POLYGON'::text OR rectangle IS NULL)
    "enforce_srid_rectangle" CHECK (srid(rectangle) = 4326)
I'm using PostGIS to help find points in a cell. If I look at a cell, I can tell whether it has been split because supercededon is not null. I can find its children by looking for rows whose supercedes equals its id. And I can dig down from top to bottom until I find the cells that cover the area I'm concerned about by looking for ones with supercededon null and whose rectangle overlaps my area of interest (using the PostGIS '&&' operator).
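A sketch of the splitting rule for such quad cells (plain tuples, assuming (west, south, east, north) rectangles):
def split(cell):
    west, south, east, north = cell
    mid_x, mid_y = (west + east) / 2.0, (south + north) / 2.0
    # The four child quadrants: SW, SE, NW, NE.
    return [(west, south, mid_x, mid_y), (mid_x, south, east, mid_y),
            (west, mid_y, mid_x, north), (mid_x, mid_y, east, north)]

# The whole earth splits into the four quarters mentioned above:
print(split((-180.0, -90.0, 180.0, 90.0)))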
There's no way you'll be able to do this with rectangular cells, but I've just finished an R package dggridR which would make this easy to do using a grid of hexagonal cells. However, the 3km cell requirement might yield so many cells as to overload your machine.
You can use R to generate the grid:
install.packages('devtools')
install.packages('rgdal')
library(devtools)
devtools::install_github('r-barnes/dggridR')
library(dggridR)
library(rgdal)
#Construct a discrete global grid (geodesic) with cells of ~3 km^2
#(a grid this fine produces a very large number of cells)
dggs <- dgconstruct(area=3, metric=TRUE, resround='nearest')
#Get a hexagonal grid for the whole earth based on this dggs
grid <- dgearthgrid(dggs,frame=FALSE)
#Save the grid
writeOGR(grid, "grid_3km_cells.kml", "cells", "KML")
The KML file then contains the ids and edge vertex coordinates of every cell.
My package is based on Kevin Sahr's DGGRID, which can generate this same grid to KML directly, though you'll need to figure out how to compile it yourself.