Geography mode distances in GeoDjango - PostGIS 1.5 - django

I store a PointField field named "coordinates" in a model.
Then, in the command interpreter, I query the instances nearest to a given one and print each instance's name and its distance in km.
[(p.name, p.distance.m) for p in Model_Name.objects.filter(
coordinates__distance_lte=(pnt, 1000000)).distance(pnt)]
The thing is, when the "coordinates" field is of the geometry kind, it works well. But if I add the option "geography=True" to get better precision, it returns a much smaller value, even though I am asking for the distance in km as before.
How can I get correct geography calculations?
Thanks

Appending .distance() to the end of the queryset annotates each object in the GeoQuerySet with a distance object, and you can use that distance object to get the distance in different units.
Because the distance attribute is a Distance object, you can easily
express the value in the units of your choice. For example,
city.distance.mi is the distance value in miles and city.distance.km
is the distance value in kilometers
However, Django 1.9 moved the goalposts: appending .distance() still works, but the recommended way is now to annotate with the Distance function:
from django.contrib.gis.db.models.functions import Distance
[(p.name, p.distance.m) for p in Model_Name.objects.filter(
    coordinates__distance_lte=(pnt, 1000000)
).annotate(distance=Distance('coordinates', pnt))]
Finally, instead of using coordinates__distance_lte there is a much faster method available in postgis and mysql 5.7: dwithin
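The unit attributes quoted above (.m, .km, .mi) can be illustrated without a database. This is only a minimal sketch: SimpleDistance is a hypothetical stand-in written for this example, not Django's own Distance class, and the 250000 m value is made up.

```python
# Hypothetical stand-in for a distance value with unit attributes.
# It is NOT Django's Distance class, only an illustration of the idea.
class SimpleDistance:
    METRES_PER_KM = 1000.0
    METRES_PER_MILE = 1609.344  # international mile

    def __init__(self, metres):
        self._metres = float(metres)

    @property
    def m(self):
        """Distance in metres."""
        return self._metres

    @property
    def km(self):
        """Distance in kilometres."""
        return self._metres / self.METRES_PER_KM

    @property
    def mi(self):
        """Distance in miles."""
        return self._metres / self.METRES_PER_MILE


d = SimpleDistance(250000)  # made-up value in metres
print(d.m, d.km, d.mi)
```

Note that .m gives metres and .km gives kilometres, so a list comprehension that reads p.distance.m prints metres, not km.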

How is postgis treating coordinates sent with different SRID

I am running a django application and I am using the PostGis extension for my db. I am trying to understand better what happens under the hood when I send coordinates, especially because I am working with different coordinate systems which translate to different SRIDs. My question is threefold:
Is django/postgis handling the transformation when creating a Point or Polygon in the DB?
Can I query it back using a different SRID?
Is it advisable to use the default SRID=4326?
Let's say I have a model like this (note I am setting the standard SRID=4326):
from django.contrib.gis.db import models

class MyModel(models.Model):
    name = models.CharField(
        max_length=120,
    )
    point = models.PointField(
        srid=4326,
    )
    polygon = models.PolygonField(
        srid=4326,
    )
Now I am sending different coordinates and polygons with different SRIDs.
I am reading here in the django docs that:
Moreover, if the GEOSGeometry is in a different coordinate system (has a different SRID value) than that of the field, then it will be implicitly transformed into the SRID of the model’s field, using the spatial database’s transform procedure
So if I understand this correctly, this means that when I am sending an API request like this:
import requests

data = {
    "name": "name",
    "point": "SRID=2345;POINT (12.223242267 280.123144553)",
    "polygon": "SRID=5432;POLYGON ((133.2345662 214.1429138285, 123.324244572 173.755820912250072))",
}
response = requests.request("post", url=url, data=data)
will both the polygon and the point be correctly transformed into SRID=4326?
EDIT:
When I send a point with 'SRID=25832;POINT (11.061859 49.460983)' I get 'SRID=4326;POINT (11.061859 49.460983)' from the DB.
When I send a polygon with 'SRID=25832;POLYGON ((123.2796155732267 284.1831980485285, 127.9249715130572 273.7782091450072, 142.2351651215613 280.3825718937042, 137.558146278483 290.279508688337, 123.2796155732267 284.1831980485285))' I get a polygon 'SRID=4326;POLYGON ((4.512360573651161 0.002563158966576373, 4.512402191765552 0.002469312460126783, 4.512530396754145 0.002528880231016955, 4.512488494972807 0.00261814442892858, 4.512360573651161 0.002563158966576373))' from the DB.
Can I query it back using a different SRID?
Unfortunately I haven't found a way to query the same points back in their original SRID. Is this even possible?
And lastly, I am working mostly with coordinates in Europe, but I might occasionally have to include coordinates from all over the world too. Is SRID=4326 a good standard to use?
Thanks a lot for all the help in advance. Really appreciated.
Transforming SRS of geometries is much more than just changing their SRID. So, if for some reason after a transformation the coordinates return with exactly the same values, there was most probably no transformation at all.
This example uses ST_Transform to transform a geometry from 25832 to 4326. See the results yourself:
WITH j (geom) AS (
VALUES('SRID=25832;POINT (11.061 49.463)'::geometry))
SELECT ST_AsEWKT(geom),ST_AsEWKT(ST_Transform(geom,4326)) FROM j;
st_asewkt | st_asewkt
---------------------------------+------------------------------------------------------
SRID=25832;POINT(11.061 49.463) | SRID=4326;POINT(4.511355210946569 0.000446125446657)
(1 row)
The polygon transformation in your question is correct, by the way.
Make sure that Django is really storing the values you mentioned. Send a 25832 geometry and check the SRS directly in the database. If you're only checking through Django, it might be transforming the coordinates back again in the responses, which would explain why you don't see any difference.
To your question:
Is SRID=4326 a good standard to use?
WGS84 is the most used SRS worldwide, so I'd tend to say yes, but it all depends on your use case. If you're uncertain which SRS to use, that may indicate that your use case does not impose any constraint on it. So stick to WGS84, but make sure you don't mix different SRS in your application. Btw: if you try to store geometries in multiple SRS in the same table, PostgreSQL will raise an exception ;)
Further reading: ST_AsEWKT, WGS84
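One cheap way to spot the situation described above, where a geometry is tagged with one SRID but its values look like they belong to another, is to check whether the numbers fall inside the longitude/latitude range. EPSG:25832 is a UTM-based system measured in metres, so values such as 11.06 and 49.46 are suspicious there. This is only a sketch: parse_ewkt_point and looks_like_degrees are hypothetical helpers written for this example, not part of Django or PostGIS, and the regex only handles simple 2D points.

```python
import re

# Hypothetical helpers for illustration; not part of Django or PostGIS.
_EWKT_POINT = re.compile(
    r"^SRID=(\d+);\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$"
)


def parse_ewkt_point(ewkt):
    """Return (srid, x, y) for a simple 2D EWKT point string."""
    match = _EWKT_POINT.match(ewkt.strip())
    if match is None:
        raise ValueError("not a simple EWKT point: %r" % ewkt)
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


def looks_like_degrees(x, y):
    """True if (x, y) would also be a valid longitude/latitude pair."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


srid, x, y = parse_ewkt_point("SRID=25832;POINT (11.061859 49.460983)")
if srid != 4326 and looks_like_degrees(x, y):
    print("Tagged with SRID %d, but the values would also be valid degrees" % srid)
```

A check like this cannot prove anything, since small projected coordinates are legal, but it flags input worth inspecting before it reaches the database.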
First of all, I'm no great expert in GIS (I have created just a few small things with Django and GIS), but...
See this GeoDjango documentation: https://docs.djangoproject.com/en/3.1/ref/contrib/gis/tutorial/#automatic-spatial-transformations . According to it:
When doing spatial queries, GeoDjango automatically transforms geometries if they’re in a different coordinate system. ...
Try in console (./manage.py shell):
from <yourapp>.models import MyModel
obj1 = MyModel.objects.all().first()
print(obj1)
print(obj1.point)
print(dir(obj1.point))
print(obj1.point.srid)
--edit--
You can manually test converting between SRIDs, similarly to this page: https://gis.stackexchange.com/questions/94640/geodjango-transform-not-working
obj1.point.transform(<new-srid>)

Applying word2vec to find all words above a similarity threshold

The command model.most_similar(positive=['france'], topn=100) gives the top 100 most similar words to "france". However, I would like to know if there is a method which will output the most similar words above a similarity threshold to a given word. Is there a method like the following?:
model.most_similar(positive=['france'], threshold=0.9)
No, you'd have to request a large number (or all, with topn=0) then apply the cutoff yourself.
What you request could theoretically be added as an option.
However, the cosine-similarity absolute magnitudes don't necessarily have a stable meaning, like "90% similar" across different model runs. Their distribution can vary based on model training parameters, such as the vector size, and they are most-often interpreted only in ranked-comparison to other pairwise values from the same model.
For example, the composition of the top-100 most-similar words for 'cold' may be very similar in models with different training parameters, but the range of absolute similarity values for the #1 to #100 words can be quite different. So if you were picking an absolute threshold, you'd likely want to vary the cutoff based on observing the model, or along with other model training metaparameters.
Well, let's say you can. Try the following code:
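def find_most_similar(model, wrd, threshold=0.75):
    # Ask for every word in the vocabulary, then keep those above the threshold.
    # model.wv.vocab exists in gensim < 4; gensim 4+ uses model.wv.key_to_index.
    n_words = len(model.wv.vocab)
    similar = model.wv.most_similar(wrd, topn=n_words)
    return [item for item in similar if item[1] > threshold]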
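The same "rank everything, then cut off" idea can be shown without gensim. The sketch below uses plain Python and made-up 3-dimensional vectors; similar_above_threshold and toy_vectors are hypothetical names for this example, and real word2vec vectors would of course come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_above_threshold(vectors, word, threshold):
    """Return (other_word, similarity) pairs above threshold, best first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    kept = [pair for pair in scored if pair[1] > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up vectors, purely illustrative.
toy_vectors = {
    "france": [0.9, 0.1, 0.0],
    "spain": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}
print(similar_above_threshold(toy_vectors, "france", 0.9))
```

As the first answer points out, a cutoff such as 0.9 is only meaningful relative to one particular model, so the threshold is best chosen after looking at that model's similarity values.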

How can I use a decimal field in GeoDjango?

According to https://docs.djangoproject.com/en/dev/ref/contrib/gis/tutorial/#defining-a-geographic-model, lat and lon information is stored in a FloatField in GeoDjango.
If lat and lon are stored in a FloatField, how many significant digits are used to represent a coordinate?
Is there any way of changing the number of digits of a FloatField in GeoDjango?
Can I use a DecimalField in GeoDjango instead of a FloatField? If so, how?
I'm building a service that requires a lot of distance calculations based on coordinates, so I need to reduce the cost of those calculations.
Thanks for reading my poor English.
According to
https://docs.djangoproject.com/en/dev/ref/contrib/gis/tutorial/#defining-a-geographic-model,
lat and lon information is stored in a FloatField in GeoDjango.
An incorrect reference and an incorrect inference. What you are looking at is a tutorial that shows how to import the World Borders shapefile into Django. There, lat and lon are really irrelevant fields; what matters is mpoly. That's probably why lat and lon are stored as floats. But the tutorial could have avoided confusion by not using lat and lon as separate fields.
The real GeoDjango way of storing points is a PointField, which allows you to store additional information about the location, such as the SRID. The exact internal representation varies from RDBMS to RDBMS.
If you need distance calculations, the only way is to use a GeometryField or one of its subclasses (of which PointField is one), so subtopics 1, 2 and 3 in your question aren't really relevant.

Regression Tree Forest in Weka

I'm using Weka and would like to perform regression with random forests. Specifically, I have a dataset:
Feature1,Feature2,...,FeatureN,Class
1.0,X,...,1.4,Good
1.2,Y,...,1.5,Good
1.2,F,...,1.6,Bad
1.1,R,...,1.5,Great
0.9,J,...,1.1,Horrible
0.5,K,...,1.5,Terrific
.
.
.
Rather than learning to predict the most likely class, I want to learn the probability distribution over the classes for a given feature vector. My intuition is that using just the RandomForest model in Weka would not be appropriate, since it would be attempting to minimize its absolute error (maximum likelihood) rather than its squared error (conditional probability distribution). Is that intuition right? Is there a better model to be using if I want to perform regression rather than classification?
Edit: I'm actually thinking now that in fact it may not be a problem. Presumably, classifiers are learning the conditional probability P(Class | Feature1,...,FeatureN) and the resulting classification is just finding the c in Class that maximizes that probability distribution. Therefore, a RandomForest classifier should be able to give me the conditional probability distribution. I just had to think about it some more. If that's wrong, please correct me.
If you want to predict the probabilities for each class explicitly, you need different input data. That is, you would need to replace the value to predict. Instead of one data set with the class label, you would need n data sets (for n different labels) with aggregated data for each unique feature vector. Your data would look something like
Feature1,...,Good
1.0,...,0.5
0.3,...,1.0
and
Feature1,...,Bad
1.0,...,0.8
0.3,...,0.1
and so on. You would need to learn one model for each class and run them separately on any data to be classified. That is, for each label you learn a model to predict a number that is the probability of being in that class, given a feature vector.
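The aggregation step described here, turning labelled rows into one numeric target per class, could look roughly like the sketch below. It is written in Python rather than Weka, the rows are made up, and class_frequencies and target_table are hypothetical helper names; the relative frequency of a label among identical feature vectors is used as the probability estimate, which is one possible reading of "aggregated data".

```python
from collections import Counter, defaultdict


def class_frequencies(rows):
    """Map each unique feature tuple to {label: relative frequency}."""
    counts = defaultdict(Counter)
    for *features, label in rows:
        counts[tuple(features)][label] += 1
    return {
        features: {label: n / sum(c.values()) for label, n in c.items()}
        for features, c in counts.items()
    }


def target_table(rows, label):
    """One regression data set: feature tuple -> estimated P(label)."""
    return {
        features: dist.get(label, 0.0)
        for features, dist in class_frequencies(rows).items()
    }


# Made-up rows in the form (Feature1, Feature2, Class).
rows = [
    (1.0, "X", "Good"),
    (1.0, "X", "Bad"),
    (0.3, "Y", "Good"),
]
print(target_table(rows, "Good"))
print(target_table(rows, "Bad"))
```

Each call to target_table produces the data for one of the n per-class models; with continuous features most vectors are unique, so the estimates would be 0 or 1 unless the features are binned first.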
If you don't need the probabilities to be predicted explicitly, have a look at the Bayesian classifiers in Weka, which make use of probabilities in the models that they learn.

Finding min distance between several points, or merging querysets with extra selects

I have a bunch of objects with location attributes (PointFields). I have two special locations, and I want to know which of those locations each object is closest to and how far that is. That is, I'd like to do something like:
q0 = q.distance(p0).extra(select={'dist_from': p0})
q1 = q.distance(p1).extra(select={'dist_from': p1})
qq = take_obj_with_min_distance(q0, q1)
(The actual query will do some stuff with bboverlaps and location__distance_lt, possibly involve more than two special locations, and possibly objects with multiple location attributes. Nevertheless, I think a solution to the above will handle all that other stuff.)
Afterwards, qq should have the same elements as q, but each element has a distance attribute and a dist_from attribute, where the distance attribute is the minimum of the distance from p0 and the distance from p1, and dist_from is the point with which it achieves that minimum.
Can I do this? Is it healthy for children and other living things?
I considered merging the queries and doing this stuff with a list, but of course you can't merge queries with extra select values (such as are introduced by distance queries). Also, I'll want to filter qq some more afterwards.
This page will give you the required code in a bunch of languages: http://www.codecodex.com/wiki/Calculate_Distance_Between_Two_Points_on_a_Globe
If you don't have too many objects, you might wish to do it in Python, but if you prefer querying the database, it might be best to prepare a procedure or function in SQL.
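For the "do it in Python" route, a minimal sketch could compute the great-circle distance from each object to every special location and keep the smallest. It assumes plain (lat, lon) tuples in degrees rather than PointField values, uses the haversine formula on a spherical Earth, and the names haversine_km and nearest_special as well as all coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_special(location, specials):
    """Return (distance_km, dist_from) for the closest special location."""
    return min((haversine_km(location, p), p) for p in specials)


# Made-up coordinates as (lat, lon).
p0 = (48.8566, 2.3522)
p1 = (52.5200, 13.4050)
objects = {"a": (50.1109, 8.6821), "b": (45.7640, 4.8357)}

for name, loc in objects.items():
    distance, dist_from = nearest_special(loc, [p0, p1])
    print(name, round(distance, 1), dist_from)
```

The result of this approach is a list of Python values rather than a queryset, so any further filtering would have to happen in Python as well, which is the trade-off against the SQL function mentioned above.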