I'm working on getting all events within 10 miles of the user's location. My models look something like this:
class User(models.Model):
    location = models.PointField()
    ...
class Event(models.Model):
    location = models.PointField()
    ...
In my tests, when I check the distance between the user and an event, I get the value 11.5122663513:
from geopy.distance import vincenty
print(vincenty(request.user.location, event.location).miles)  # 11.5122663513
Yet, when I query for all events within 10 miles of the user's location, that event is returned:
Event.objects.filter(location__distance_lte=(request.user.location, D(mi=10))).count() # 1
Only when I drop the radius to less than 4 miles does the filter take effect:
Event.objects.filter(location__distance_lte=(request.user.location, D(mi=3))).count() # 0
I'm following the docs' example almost exactly, so I don't think my query is the problem.
What could be causing this discrepancy?
This very much depends on what type of database you are using.
Because Cartesian math is much faster than geospatial math, the query likely treats the coordinates as if they were on a plane rather than on a sphere.
The docs explain it this way:
Most people are familiar with using latitude and longitude to
reference a location on the earth’s surface. However, latitude and
longitude are angles, not distances. In other words, while the
shortest path between two points on a flat surface is a straight line,
the shortest path between two points on a curved surface (such as the
earth) is an arc of a great circle. Thus, additional computation
is required to obtain distances in planar units (e.g., kilometers and
miles). Using a geographic coordinate system may introduce
complications for the developer later on. For example, Spatialite does
not have the capability to perform distance calculations between
geometries using geographic coordinate systems, e.g. constructing a
query to find all points within 5 miles of a county boundary stored as
WGS84.
Portions of the earth’s surface may projected onto a two-dimensional,
or Cartesian, plane. Projected coordinate systems are especially
convenient for region-specific applications, e.g., if you know that
your database will only cover geometries in North Kansas, then you may
consider using projection system specific to that region. Moreover,
projected coordinate systems are defined in Cartesian units (such as
meters or feet), easing distance calculations.
Furthermore, this may be influenced by your database choice. If you are using Postgres/PostGIS, it has the following note in the docs:
In PostGIS, ST_Distance_Sphere does not limit the geometry types
geographic distance queries are performed with. However, these
queries may take a long time, as great-circle distances must be
calculated on the fly for every row in the query. This is because the
spatial index on traditional geometry fields cannot be used.
For much better performance on WGS84 distance queries, consider using
geography columns in your database instead because they are able to
use their spatial index in distance queries. You can tell GeoDjango to
use a geography column by setting geography=True in your field
definition.
You can check this yourself by printing out the raw SQL:
qs = Event.objects.filter(location__distance_lte=(request.user.location, D(mi=10)))
print(qs.query)
Depending on your database type, and the amount of data you plan to store, you have a couple options:
Filter the points a second time in Python
Try setting geography=True
Set an explicit SRID
Take a point, buffer it out into a circle with the given radius and then find points within that circle using contains
Use a different database type
If you share the raw query it'll be easier to figure out what is happening.
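As a rough illustration of the first option (filtering a second time in Python), here is a minimal sketch that re-checks candidate points with the haversine formula. The function names and the Earth radius constant are my own assumptions, not part of GeoDjango or geopy, and a spherical haversine distance will differ slightly from geopy's ellipsoidal result.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(origin, candidates, radius_miles):
    """Keep only the (lat, lon) candidates that lie within radius_miles of origin."""
    lat0, lon0 = origin
    return [c for c in candidates
            if haversine_miles(lat0, lon0, c[0], c[1]) <= radius_miles]
```

Note that a GeoDjango Point stores x as longitude and y as latitude, so with real model instances you would pass (point.y, point.x) here.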
I'm familiar with Django, but new to GeoDjango and PostGIS.
I have a problem where I want to find the nearest MultiPolygon to a point. The point can be outside or within the MultiPolygon. Nearest means the nearest boundary point.
I know that I can calculate the distance between two points with from django.contrib.gis.db.models.functions import Distance - but I don't want to use the centroid because it is possible the border of a MultiPolygon is closer to one point than the centroid of another.
I have a model Land with a field surface_area which is a MultiPolygon. I have a point object created with from django.contrib.gis.geos import Point. This is the data I'm using to try and build a query.
Any help with best practices would be appreciated.
GeoDjango's distance lookups utilize the corresponding database's distance function.
Since you are using PostgreSQL with PostGIS, the corresponding function is ST_Distance, which:
For geometry types returns the minimum 2D Cartesian (planar) distance between two geometries, in projected units (spatial ref units).
Therefore you can use the distance lookups for your calculations.
If you want to implement it a bit differently (for example using the bounding boxes of the polygons) you can refer to this Q&A: How do I get the k nearest neighbors for geodjango?
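To make "minimum 2D Cartesian distance" concrete, here is a small pure-Python sketch of the same idea for a point and a single polygon ring. It is only an illustration under my own assumptions (planar coordinates, one ring without holes, hypothetical function names), not how PostGIS implements it. It also shows one detail worth knowing: this kind of distance is 0 when the point lies inside the polygon, so if you need the distance to the boundary for interior points you have to measure against the boundary explicitly.

```python
import math


def _edges(ring):
    """Yield consecutive vertex pairs of a ring, closing it back to the start."""
    return zip(ring, ring[1:] + ring[:1])


def point_segment_distance(px, py, ax, ay, bx, by):
    """Shortest planar distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_ring(px, py, ring):
    """Ray-casting test: True if P is strictly inside the ring."""
    inside = False
    for (ax, ay), (bx, by) in _edges(ring):
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def boundary_distance(px, py, ring):
    """Distance from P to the nearest point on the ring's boundary."""
    return min(point_segment_distance(px, py, ax, ay, bx, by)
               for (ax, ay), (bx, by) in _edges(ring))


def planar_distance(px, py, ring):
    """Minimum distance from P to the filled polygon (0 if P is inside)."""
    return 0.0 if point_in_ring(px, py, ring) else boundary_distance(px, py, ring)
```

With a large square [(0, 0), (10, 0), (10, 10), (0, 10)] and a small square [(12, 4), (13, 4), (13, 5), (12, 5)], the point (10.5, 4.5) is closer to the large square's border than to the small one's, even though the small square's centroid is nearer. In GeoDjango the corresponding query would presumably be something like Land.objects.annotate(distance=Distance('surface_area', point)).order_by('distance').first(), which I have not run against your schema.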
From this question, I'd like to decide whether I should use GeoDjango, or roll my own with Python to filter Points within a certain radius of another Point.
There are two excellent answers that take different approaches to the question of how to perform such a calculation here: Django sort by distance
One of them uses GeoDjango to perform the distance calculation in PostGIS. I'm guessing that the compute would be done on the RDS instance?
The other uses a custom manager to implement the Great Circle distance formula. The compute would obviously be done on the EC2 instance.
I would imagine that the PostGIS implementation is more efficient because it's likely that people much smarter than I have optimized it. To what extent have they optimized it? Is there anything special about their implementation?
Assuming I am correct in assuming GeoDjango performs the distance compute using PostGIS on the RDS instance, I would imagine that RDS is not suited for heavy compute tasks, and may end up being slower or more expensive in the end. Are my assumptions correct?
What if I don't need a precise distance, where an octagon or even a square would suffice? In the case of a square, it would be simply a matter of filtering Points with latitude and longitude within a certain range. Is GeoDjango/PostGIS able to perform estimates like this?
If I do need a precise distance, I could calculate the furthest bounds that can be reached with the given radius, and only perform precise distance calculations on Points within those bounds. Does GeoDjango/PostGIS do this?
I'll try to address your questions:
One of them uses GeoDjango to perform the distance calculation in
PostGIS. I'm guessing that the compute would be done on the RDS
instance?
If you are loading two Django model instances into memory and doing the calculation in Django, such as
model_a = Foo.objects.get(id=1)
model_b = Bar.objects.get(id=1)
distance = model_a.geometry.distance(model_b.geometry)
This will be done in Python, using GEOS.
https://docs.djangoproject.com/en/1.9/ref/contrib/gis/geos/#django.contrib.gis.geos.GEOSGeometry.distance
There are distance lookups on Django, such as
foos = Foo.objects.filter(geometry__distance_lte=(Point(0, 0, srid=4326), D(km=1)))
This calculation will be done by the backend (aka database).
The other uses a custom manager to implement the Great Circle distance
formula. The compute would obviously be done on the EC2 instance.
I would imagine that the PostGIS implementation is more efficient because it's likely that people much smarter than I have optimized it.
To what extent have they optimized it? Is there anything special about
their implementation?
Django has methods to use GCD (great-circle distance) in queries. On PostGIS this requires geography fields rather than geometry fields, and only EPSG:4326 is supported for now. If that's all you need, I bet the PostGIS implementation is good enough for almost all applications (if not all).
Assuming I am correct in assuming GeoDjango performs the distance compute using PostGIS on the RDS instance, I would imagine that RDS is
not suited for heavy compute tasks, and may end up being slower or
more expensive in the end. Are my assumptions correct?
I don't know much about Amazon products, but without an estimate of size (number of rows, types of calculations such as cross-products, etc.), it's hard to help further.
What if I don't need a precise distance, where an octagon or even a square would suffice? In the case of a square, it would be simply a
matter of filtering Points with latitude and longitude within a
certain range. Is GeoDjango/PostGIS able to perform estimates like
this?
What kind of data do you have? There are several components in calculating distances and areas, mainly the spatial reference that you use (datum, ellipsoid, projection).
If you need accurate distance measurements between two distant sides of the globe, the geography type is more precise and will yield good results. If you do that kind of measurement on a Cartesian plane, you will get bad results.
If your data is local, like a few sq km, consider using a more local spatial reference. WGS84 4326 is more suitable for global data. Local spatial references can give you precise results, but in much smaller extents.
If I do need a precise distance, I could calculate the furthest bounds that can be reached with the given radius, and only perform
precise distance calculations on Points within those bounds. Does
GeoDjango/PostGIS do this?
I think you are optimizing too early. I know your question is a bit old, but this is something you should only care about when it starts to hurt. PostGIS and Django have been grinding through a lot of data for a long time for me in a government system that checks land registry parcels and runs tons of queries to check several parameters. It has been working for a few years without a hitch.
I am trying to find a way to parametrize the precision of my homography calculation. I would like to obtain a value that describes the precision of the homography calculation for a measurement taken at a certain position.
I have successfully calculated the homography (with cv::findHomography) and I can use it to map a point on my camera image onto a 2D map (using cv::perspectiveTransform). Now I want to track these objects on my 2D map, and to do this I want to take into account that objects at the back of my camera image have a less precise position on my 2D map than objects all the way at the front.
I have looked at the following example on this website that mentions plane fitting but I don't really understand how to fill the matrices correctly using this method. The visualisation of the result does seem to fit my needs. Is there any way to do this with standard OpenCV functions?
EDIT:
Thanks Francesco for your recommendations. But, I think I am looking for something different than your answer. I am not looking to test the precision of the homography itself, but the relation between the density of measurements in one real camera view and the actual size on a map I create. I want to know that when I am 1 pixel off on my detection in the camera image, how many meters this will be on my map at this point.
I can of course calculate this by taking some pixels around my measurement on my camera image and then using the homography to see how many meters on my map these represent, but I don't want to calculate this every time. What I would like is a formula that tells me the relation between pixels in my image and pixels on my map so I can take this into account for my tracking on the map.
What you are looking for is called "predictive error bars" or "prediction uncertainty". You should definitely consult a good introductory book on estimation theory for details (e.g. this one). But briefly, the predictive uncertainty is the probability that...
A certain pixel p in image 1 is the mapping H(p') of a pixel p' in image 2 under the homography H...
Given the uncertainty in H which is due to the errors in the matched pairs (q0, q0'), (q1, q1'), ..., that have been used to estimate H, ...
But assuming the model is correct, that is, that the true map between images 1 and 2 is, in fact, a homography (although the estimated parameters of the homography itself may be affected by errors).
In order to estimate this probability distribution you'll need a model for the errors in the measurements, and a model for how they propagate through the (homography) model.
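One common way to model that propagation, assuming the pixel error is small, is to linearize the homography at the pixel of interest: the 2x2 Jacobian of the mapping tells you how far the mapped point moves per pixel of error at that position. The sketch below is pure Python with H given as a 3x3 nested list in row-major order; the function names are hypothetical and this is only a first-order approximation, not a full uncertainty model.

```python
import math


def homography_jacobian(H, x, y):
    """2x2 Jacobian of the map (x, y) -> (u, v) defined by homography H at (x, y)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    # Quotient rule applied to u = num_u / w and v = num_v / w.
    return [
        [(H[0][0] - u * H[2][0]) / w, (H[0][1] - u * H[2][1]) / w],
        [(H[1][0] - v * H[2][0]) / w, (H[1][1] - v * H[2][1]) / w],
    ]


def map_units_per_pixel(H, x, y):
    """Map displacement caused by a 1-pixel step in image x and in image y."""
    J = homography_jacobian(H, x, y)
    step_x = math.hypot(J[0][0], J[1][0])  # length of the first column
    step_y = math.hypot(J[0][1], J[1][1])  # length of the second column
    return step_x, step_y
```

For a pure scaling homography the result is the same everywhere, while for a homography with a perspective term the values change with position, which is the front-versus-back effect described in the question. If you have a pixel covariance S, the same Jacobian J gives the first-order map covariance as J S J^T.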
I would like to know what people suggest as efficient ways of doing a spatial query in an Amazon Web Services SimpleDB?
By spatial query I mean finding objects in a given radius of a latitude and longitude.
SimpleDB doesn't currently offer any built-in spatial search operations, but that doesn't mean it can't be done. There are several methods of implementing geospatial searches in non-geospatially aware databases such as SimpleDB, and all of them center around the idea of using the database to retrieve a rough first selection based on a geospatial bounding box and then filtering the returned data in your application using more accurate algorithms such as the Haversine formula.
You could store the latitude and longitude as (zero-padded and normalized) numeric attributes and then perform a double range query (lat >= minLat and lat <= maxLat and lon >= minLon and lon <= maxLon), but since neither of these predicates is selective (each predicate matches a lot of items) it's not ideal (see Tuning Queries).
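For what it's worth, the bounds for such a range query can be derived from the search radius roughly like this. This is a sketch under my own assumptions: a spherical Earth, a search circle that does not reach the poles or cross the 180° meridian, and hypothetical function names. The encode_coord helper shows one possible way to offset and zero-pad values so that string comparison orders them numerically.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees enclosing the circle."""
    delta = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(delta)
    # Longitude lines converge toward the poles, so the box widens with latitude.
    dlon = math.degrees(math.asin(math.sin(delta) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def encode_coord(value, offset):
    """Shift a coordinate to be non-negative and zero-pad it to a fixed width."""
    return format(value + offset, "010.6f")
```

You would call encode_coord(lat, 90) and encode_coord(lon, 180) both when storing items and when building minLat, maxLat, minLon and maxLon for the query.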
A better way would be using GeoHashes.
Geohashes offer properties like arbitrary precision, similar prefixes
for nearby positions, and the possibility of gradually removing
characters from the end of the code to reduce its size (and gradually
lose precision).
As a practical example, the Geohash 6gkzwgjzn820 decodes to the
coordinates -25.382708 and -49.265506, while the Geohash 6gkzwgjz will
decode to -25.383 and -49.266, and if we take a similar position in
the same region, such as -25.427 and -49.315, we can see it being
encoded as 6gkzmg1w (note the similar prefix).
From http://geohash.org/site/tips.html
With your item positions as GeoHashes you could use the like operator to search for a bounding box (where GeoHash like '6gkzmg1w%'), but since the like operator is expensive (Comparison Operators) a better way would be to denormalize the data by storing each GeoHash prefix level (how many depends on your required search precision) as a separate attribute (GeoHash6, GeoHash8, etc.) and then use a simple equality predicate (where GeoHash8 = '6gkzmg1w').
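If you don't have a GeoHash library at hand, the encoding itself is short. Here is a minimal sketch of the standard bit-interleaving scheme plus a helper that builds the denormalized prefix attributes; the function names and the default prefix lengths are my own choices for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair (degrees) as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value = value << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


def prefix_attributes(lat, lon, lengths=(6, 8)):
    """Build attributes such as GeoHash6 and GeoHash8 for one item."""
    full = geohash_encode(lat, lon, max(lengths))
    return {"GeoHash%d" % n: full[:n] for n in lengths}
```

Because each shorter hash is just a prefix of the longer one, you only need to encode once at the highest precision and slice.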
Now on to the downside of GeoHashes. Since you can't assume that a GeoHash is centered within your search box, you have to search all neighboring prefixes as well. The process is excellently described by geohash-js:
Geohash also has the property that as the number of digits decreases
(from the right), accuracy degrades. This property can be used to do
bounding box searches, as points near to one another will share
similar Geohash prefixes.
However, because a given point may appear at the edge of a given
Geohash bounding box, it is necessary to generate a list of Geohash
values in order to perform a true proximity search around a point.
Because the Geohash algorithm uses a base-32 numbering system, it is
possible to derive the Geohash values surrounding any other given
Geohash value using a simple lookup table.
So, for example, 1600 Pennsylvania Avenue, Washington DC resolves to:
38.897, -77.036
Using the geohash algorithm, this latitude and longitude is converted
to: dqcjqcp84c6e
A simple bounding box around this point could be described by
truncating this geohash to: dqcjqc
However, 'dqcjqcp84c6e' is not centered inside 'dqcjqc', and searching
within 'dqcjqc' may miss some desired targets.
So instead, we can use the mathematical properties of the Geohash to
quickly calculate the neighbors of 'dqcjqc'; we find that they are:
'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8'
This gives us a bounding box around 'dqcjqcp84c6e' roughly 2km x 1.5km
and allows for a database search on just 9 keys: SELECT * FROM table
WHERE LEFT(geohash,6) IN ('dqcjqc',
'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8');
Translated to a SimpleDB query that'd be where GeoHash6 in('dqcjqc', 'dqcjqf', 'dqcjqb', 'dqcjr1', 'dqcjq9', 'dqcjqd', 'dqcjr4', 'dqcjr0', 'dqcjq8') and then you'll do your Haversine filtering on the results in order to only get the items that are within your search radius.
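The neighbor calculation can be sketched without the lookup table that geohash-js uses: split the hash into its integer column (longitude) and row (latitude) indices, step one cell in each direction, and interleave the bits again. This is a swapped-in technique for illustration, not the geohash-js implementation, and the function names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def split_bits(geohash):
    """Return (lon_index, lat_index, lon_bits, lat_bits) for a geohash cell."""
    lon = lat = lon_bits = lat_bits = 0
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                lon = (lon << 1) | bit
                lon_bits += 1
            else:
                lat = (lat << 1) | bit
                lat_bits += 1
            is_lon = not is_lon
    return lon, lat, lon_bits, lat_bits


def join_bits(lon, lat, lon_bits, lat_bits):
    """Interleave the column/row indices back into a geohash string."""
    chars = []
    value = 0
    is_lon = True
    for i in range(lon_bits + lat_bits):
        if is_lon:
            lon_bits -= 1
            bit = (lon >> lon_bits) & 1
        else:
            lat_bits -= 1
            bit = (lat >> lat_bits) & 1
        value = (value << 1) | bit
        is_lon = not is_lon
        if i % 5 == 4:
            chars.append(BASE32[value])
            value = 0
    return "".join(chars)


def neighbors(geohash):
    """Return the (up to) 8 geohash cells surrounding the given cell."""
    lon, lat, lon_bits, lat_bits = split_bits(geohash)
    result = []
    for dlat in (-1, 0, 1):
        for dlon in (-1, 0, 1):
            if dlat == 0 and dlon == 0:
                continue
            new_lat = lat + dlat
            if not 0 <= new_lat < (1 << lat_bits):
                continue  # there is no cell beyond the poles
            new_lon = (lon + dlon) % (1 << lon_bits)  # wrap around at 180 degrees
            result.append(join_bits(new_lon, new_lat, lon_bits, lat_bits))
    return result
```

The values for the in(...) list are then [geohash] + neighbors(geohash), taken at whichever prefix length you stored as an attribute.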
I'm going to leave this here because it might help you!
14 years ago we tried to build a geo lookup table of locations within a radius. There were obviously no geospatial indexes or anything like that.
There was literally only standard SQL and Oracle... anyway, we ended up converting all lat/lng into kilometers from a fixed plane field. Essentially what geospatial indexes do these days.
To explain what exactly it does, it turns the world into a flat surface and with a bit of SQL trickery you can even select by radius, you even get the distance from the two points you're selecting. Since it's also raw full integers the queries are blazing fast.
Here is a simple example in PHP, along with an SQL query that looks very complex but is pretty easy once you understand it:
https://gist.github.com/tobsn/899413
Looking to store a circle in a geodjango field so I can use the geodjango query __contains to find out if a point is in the circle (similar to what can be done with a PolygonField).
Currently have it stored as a Decimal radius and GeoDjango Point Field, but need a way to query a list of locations in the DB such that these varying circles (point field and radii) contain my search point (long/lat).
Hope it makes sense.
Technically speaking, PostGIS supports CurvePolygon and CircularString geometry types, which can be used to store curved geometries. For example, a 2-unit radius around x=10, y=10 that has been approximated by a 64-point buffered polygon is:
SELECT ST_AsText(ST_LineToCurve(ST_Buffer(ST_MakePoint(10, 10), 2, 16)));
st_astext
------------------------------------------------
CURVEPOLYGON(CIRCULARSTRING(12 10,8 10,12 10))
(1 row)
However, this approach is not typically used, as there is very limited support for this geometry type (e.g., ST_AsSVG and others won't work). These geometry types will likely cause plenty of grief, and I'd recommend not doing this.
Typically, all geometries are stored as a well supported type: POINT, LINESTRING or POLYGON (with optional MULTI- prefix). With these types, use the ST_DWithin function (e.g., GeoDjango calls this __dwithin, see also this question) to query if another geometry is within a specified distance. For example, if you have a point location, you can see if other geometries are within a certain distance (i.e., radius) from the point.