According to https://docs.djangoproject.com/en/dev/ref/contrib/gis/tutorial/#defining-a-geographic-model, lat and lon information is stored at floatfield in geodjango.
If lat and lon are stored in a FloatField, how many significant digits are used to represent a coordinate?
Is there any way to change the number of digits of a FloatField in GeoDjango?
Can I use a DecimalField in GeoDjango instead of a FloatField? If so, how?
I'm building a service that needs many coordinate-based distance calculations, so I need to reduce the cost of those calculations.
Thanks for reading my poor english writing.
According to
https://docs.djangoproject.com/en/dev/ref/contrib/gis/tutorial/#defining-a-geographic-model,
lat and lon information is stored at floatfield in geodjango.
The reference is wrong, and so is the inference. What you are looking at is a tutorial that shows how to import the World Borders shapefile into Django. There, lat and lon are largely irrelevant fields; what matters is mpoly. That's probably why lat and lon are stored as floats there, though the tutorial could have avoided confusion by not using separate lat and lon fields.
The real GeoDjango way of storing points is the PointField, which also lets you store additional information about the location, such as the SRID. The exact internal representation varies from RDBMS to RDBMS.
If you need distance calculations, the only way is to use a GeometryField or one of its subclasses (of which PointField is one), so sub-questions 1, 2 and 3 in your question aren't really relevant.
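On the significant-digits sub-question, a rough sketch may still help. It assumes the FloatField values end up as ordinary Python floats (IEEE 754 doubles); the coordinate and the helper name round_coordinate are made up for illustration.

```python
import sys

# Number of decimal digits that are guaranteed to survive a round trip
# through a Python float (an IEEE 754 double).
print(sys.float_info.dig)

lat = 49.460983  # hypothetical latitude in degrees


def round_coordinate(value, places=6):
    """Round a coordinate to a fixed number of decimal places.

    Six decimal places of a degree of latitude is roughly 0.1 m.
    """
    return round(value, places)


print(round_coordinate(49.4609831234))
```

Note that rounding only changes the stored value; it does not make float arithmetic any cheaper, so it is unlikely to reduce the cost of distance calculations by itself.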
I am running a django application and I am using the PostGis extension for my db. I am trying to understand better what happens under the hood when I send coordinates, especially because I am working with different coordinate systems which translate to different SRIDs. My question is threefold:
Is django/postgis handling the transformation when creating a Point or Polygon in the DB?
Can I query it back using a different SRID?
Is it advisable to use the default SRID=4326?
Let's say I have a model like this (note I am setting the standard SRID=4326):
from django.contrib.gis.db import models

class MyModel(models.Model):
    name = models.CharField(
        max_length=120,
    )
    point = models.PointField(
        srid=4326,
    )
    polygon = models.PolygonField(
        srid=4326,
    )
Now I am sending different coordinates and polygons with different SRIDS.
I am reading here in the django docs that:
Moreover, if the GEOSGeometry is in a different coordinate system (has a different SRID value) than that of the field, then it will be implicitly transformed into the SRID of the model’s field, using the spatial database’s transform procedure
So if I understand this correctly, this means that when I send an API request like this:
import requests

data = {
    "name": "name",
    "point": "SRID=2345;POINT (12.223242267 280.123144553)",
    "polygon": "SRID=5432;POLYGON ((133.2345662 214.1429138285, 123.324244572 173.755820912250072))",
}
response = requests.request("post", url=url, data=data)
Will both the polygon and the point be correctly transformed into SRID=4326?
EDIT:
When I send a point as 'SRID=25832;POINT (11.061859 49.460983)', I get 'SRID=4326;POINT (11.061859 49.460983)' back from the DB. When I send a polygon as
'SRID=25832;POLYGON ((123.2796155732267 284.1831980485285, 127.9249715130572 273.7782091450072, 142.2351651215613 280.3825718937042, 137.558146278483 290.279508688337, 123.2796155732267 284.1831980485285))'
I get the polygon
'SRID=4326;POLYGON ((4.512360573651161 0.002563158966576373, 4.512402191765552 0.002469312460126783, 4.512530396754145 0.002528880231016955, 4.512488494972807 0.00261814442892858, 4.512360573651161 0.002563158966576373))'
back from the DB.
Can I query it back using a different SRID
Unfortunately I haven't found a way to query the same points back to their original SRID. Is this even possible?
And lastly I am working mostly with coordinates in Europe but I also might have to include sporadically coordinates from all over the world too. Is SRID=4326 a good standard to use?
Thanks a lot for all the help in advance. Really appreciated.
Transforming SRS of geometries is much more than just changing their SRID. So, if for some reason after a transformation the coordinates return with exactly the same values, there was most probably no transformation at all.
This example uses ST_Transform to transform a geometry from 25832 to 4326. See the results yourself:
WITH j (geom) AS (
  VALUES ('SRID=25832;POINT (11.061 49.463)'::geometry))
SELECT ST_AsEWKT(geom), ST_AsEWKT(ST_Transform(geom, 4326)) FROM j;

            st_asewkt            |                      st_asewkt
---------------------------------+------------------------------------------------------
 SRID=25832;POINT(11.061 49.463) | SRID=4326;POINT(4.511355210946569 0.000446125446657)
(1 row)
The Polygon transformation in your question is btw correct.
Make sure that django is really storing the values you mentioned. Send a 25832 geometry and directly check the SRS in the database. If you're only checking using django, it might be that it is transforming the coordinates back again in the requests, which might explain you not seeing any difference.
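One thing worth checking before blaming the transformation is whether the input is labelled with the SRID it is really in. EPSG:25832 is a UTM projection in metres, so a point like (11.061859 49.460983) looks much more like lon/lat degrees. The sketch below is only a heuristic; parse_ewkt_point and looks_like_degrees are hypothetical helpers, not part of Django or PostGIS.

```python
import re

# Matches a simple 2D EWKT point such as "SRID=25832;POINT (11.06 49.46)".
EWKT_POINT = re.compile(
    r"^SRID=(?P<srid>\d+);\s*POINT\s*\(\s*"
    r"(?P<x>-?\d+(?:\.\d+)?)\s+(?P<y>-?\d+(?:\.\d+)?)\s*\)$"
)


def parse_ewkt_point(ewkt):
    """Return (srid, x, y) for a simple 2D EWKT point string."""
    match = EWKT_POINT.match(ewkt.strip())
    if match is None:
        raise ValueError(f"not a simple EWKT point: {ewkt!r}")
    return int(match["srid"]), float(match["x"]), float(match["y"])


def looks_like_degrees(x, y):
    """Heuristic only: True if x/y fall inside the lon/lat value range."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


srid, x, y = parse_ewkt_point("SRID=25832;POINT (11.061859 49.460983)")
# A projected SRID combined with degree-sized values is worth a second look.
suspicious = srid != 4326 and looks_like_degrees(x, y)
print(srid, x, y, suspicious)
```

Small projected coordinates close to the projection's origin would also trip this check, so treat a positive result as a prompt to inspect the data, not as proof.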
To your question:
Is SRID=4326 a good standard to use?
WGS84 is the most widely used SRS worldwide, so I'd tend to say yes, but it all depends on your use case. If you're uncertain which SRS to use, that may indicate that your use case does not impose any constraint on it. So stick to WGS84, but make sure you don't mix different SRS in your application. Btw: if you try to store geometries with multiple SRS in the same table, PostgreSQL will raise an exception ;)
Further reading: ST_AsEWKT, WGS84
First of all, I'm not a big expert in GIS (I have built just a few small things with Django and GIS), but...
See this GeoDjango documentation: https://docs.djangoproject.com/en/3.1/ref/contrib/gis/tutorial/#automatic-spatial-transformations . According to it:
When doing spatial queries, GeoDjango automatically transforms geometries if they’re in a different coordinate system. ...
Try in console (./manage.py shell):
from <yourapp>.models import MyModel
obj1 = MyModel.objects.all().first()
print(obj1)
print(obj1.point)
print(dir(obj1.point))
print(obj1.point.srid)
--edit--
You can manually test converting between SRIDs, similarly to this page: https://gis.stackexchange.com/questions/94640/geodjango-transform-not-working
obj1.point.transform(<new-srid>)
I am learning how to do data mining and I am using this data set from UCI's website.
http://archive.ics.uci.edu/ml/datasets/Forest+Fires
The problem I am encountering is how to deal with the area class. My understanding from the description is that I need to apply ln(x+1) to area using AddExpression.
Am I going in the correct direction with this? Or are there other filters I should investigate? Thank you.
I'll try to answer your question based on the little information you provide. I haven't worked with the forest-fires data set, but by inspection I see that the class attribute "area" often has the value 0. You probably can't simply filter out the rows with area = 0; your dataset might become too small, for instance.
I think you are being asked to perform a regression of some attribute(s) against "log(area)" in order to linearize it. However, when you try to calculate the log of the area, values such as log(0) are a problem. Values between 0 and 1 might also be problematic.
So a common fix is to add 1 to the value of "Area". This introduces a systematic error, but it is small, and it removes all 0-values, and you can still derive useful models from your log(x+1)-transformed dataset.
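Outside Weka, the same transform can be sketched in a few lines of Python. The area values below are made up for illustration; math.log1p computes ln(x+1) and math.expm1 undoes it.

```python
import math

# Hypothetical burned areas; zeros are fine because log1p(0) is 0.
areas = [0.0, 0.36, 6.43, 1090.84]

# ln(x + 1) transform of the target attribute.
log_areas = [math.log1p(a) for a in areas]

# Invert with expm1 to bring values back to the original scale.
restored = [math.expm1(v) for v in log_areas]

print(log_areas)
print(restored)
```

Predictions made on the transformed scale need the same expm1 step before they can be read as areas.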
And yes, in Weka you do this under "Preprocess" with the AddExpression filter, using an expression for log(x+1). This creates a new attribute. You can then remove the old area attribute.
Of course, in interpreting your model, you should be aware of the transformation. If you just want to find out what the significant independent attributes are in your linear regression model, I'd say the transformation does not matter. The data points are just shifted a little bit.
I'm saving data to a PostgreSQL backend through Django. Many of the fields in my models are DecimalFields set to arbitrarily high max_digits and decimal_places, corresponding to numeric columns in the database backend. The data in each column have a precision (or number of decimal places) that is not known a priori, and each datum in a given column need not have the same precision.
For example, arguments to a model may look like:
{'dist': Decimal("94.3"), 'dist_e': Decimal("1.2")}
{'dist': Decimal("117"), 'dist_e': Decimal("4")}
where the keys are database column names.
Upon output, I need to preserve and redisplay those data with the precision with which they were read in. In other words, after the database is queried, the displayed data need to look exactly like the data in that were read in, with no additional or missing trailing 0's in the decimals. When queried, however, either in a django shell or in the admin interface, all of the DecimalField data come back with many trailing 0's.
I have seen similar questions answered for money values, where the precision (2 decimal places) is both known and the same for all data in a given column. However, how might one best preserve the exact precision represented by Decimal values in Django and numeric values in PostgreSQL when the precision is not the same and not known beforehand?
EDIT:
Possibly an additional useful piece of information: When viewing the table to which the data are saved in a Django dbshell, the many trailing 0's are also present. The python Decimal value is apparently converted to the maximum precision value specified in the models.py file upon being saved to the PostgreSQL backend.
If you need perfect parity forwards and backwards, you'll need to use a CharField. Any number-based database field is going to alter your data in some way or another. Now, I know you mentioned not being able to know the digit length of the data points, and a CharField requires some length. You can either set it arbitrarily high (1000, 2000, etc.) or, I suppose, use a TextField instead.
However, with either approach, you're going to be wasting a lot of database resources in most scenarios. I would suggest modifying your approach so that extra zeros at the end don't matter (for display purposes you could always chop them off), or so that the precision is no longer arbitrary.
Since I asked this question a while ago and the answer remains the same, I'll share what I found in case it is helpful to anyone in a similar position. Django doesn't have the ability to take advantage of the PostgreSQL numeric column type with arbitrary precision. In order to preserve the display precision of data I upload to my database, and in order to be able to perform mathematical calculations on values obtained from database queries without first recasting strings into Python Decimal types, I opted to add an extra precision column for every numerical column in the database.
The precision value is an integer indicating how many digits after the decimal point are required. The datum 4.350 is assigned a value of 3 in its corresponding precision column. Normally displayed integers (e.g. 2531) have a precision entry of 0. However, large integers reported in scientific notation are assigned a negative integer to preserve their display precision. The value 4.320E+33, for example, gets the precision entry -3. The database recognizes that all objects with negative precision values should be re-displayed in scientific notation.
This solution adds some complexity to the structure and code surrounding the database, but it has proven effective. It also allows me to accurately preserve precision through calculations like converting to/from log and linear values.
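A minimal sketch of that precision-column idea, not the actual code behind it: display_precision and render are hypothetical helper names, and deciding on scientific notation purely from a positive Decimal exponent is an assumption of this sketch.

```python
from decimal import Decimal


def display_precision(value):
    """Return the precision entry for a finite Decimal as it was read in.

    Non-negative: digits after the decimal point (4.350 -> 3, 2531 -> 0).
    Negative: value was given in scientific notation (4.320E+33 -> -3).
    """
    _sign, digits, exponent = value.as_tuple()
    if exponent <= 0:
        return -exponent
    # Positive exponent: count the mantissa digits after the first one.
    return -(len(digits) - 1)


def render(value, precision):
    """Re-display a value using its stored precision entry."""
    if precision >= 0:
        return format(value, f".{precision}f")
    return format(value, f".{-precision}E")


original = Decimal("94.3")
stored = Decimal("94.3000000000")  # what comes back padded with zeros
print(render(stored, display_precision(original)))
```

One known gap in this sketch: a one-digit mantissa in scientific notation (for example 4E+33) would get the entry 0 and be indistinguishable from a plain integer, so a real schema would need an extra flag or a different encoding for that case.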
I store a PointField field named "coordinates" in a model.
Then, in the shell, I query for the instances nearest to a given one and print each name and the distance in km.
[(p.name, p.distance.m) for p in Model_Name.objects.filter(
    coordinates__distance_lte=(pnt, 1000000)).distance(pnt)]
The thing is, when the "coordinates" field is of kind geometry, it works well. But if I add the option "geography=True" to get better precision, it returns a much smaller value, even though I am asking for it to be printed in km as before.
How can I get correct geography calculations?
Thanks
Appending .distance() to the end of the query set would result in each object in the geoqueryset being annotated with a distance object. And you can use that distance object to get distances in different units.
Because the distance attribute is a Distance object, you can easily
express the value in the units of your choice. For example,
city.distance.mi is the distance value in miles and city.distance.km
is the distance value in kilometers
However, Django 1.9 moved the goal posts: while appending .distance() still works, the recommended way is now to annotate the queryset:
from django.contrib.gis.db.models.functions import Distance

[(p.name, p.distance.m) for p in Model_Name.objects.filter(
    coordinates__distance_lte=(pnt, 1000000)
).annotate(distance=Distance('coordinates', pnt))]
Finally, instead of using coordinates__distance_lte there is a much faster method available in postgis and mysql 5.7: dwithin
I am curious which one would be a better fit for a currency field. I will do simple operations such as taking the difference and the percentage change between old and new prices. I plan to keep two digits after the decimal point (i.e. 10.50), and most of the time, if these digits are zero, I will hide them and display the value as "10".
ps: Currency is NOT dollar based :)
Always use DecimalField for money. Even simple operations (addition, subtraction) are not immune to float rounding issues:
>>> 10.50 - 0.20
10.300000000000001
>>> Decimal('10.50') - Decimal('0.20')
Decimal('10.30')
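For the operations mentioned in the question, a small sketch with Decimal could look like this. The helper names percent_change and display_price are made up for illustration, and rounding half up to two places is an assumption about the desired behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def percent_change(old, new):
    """Percentage change from old to new, rounded to two decimal places."""
    return ((new - old) / old * 100).quantize(CENT, rounding=ROUND_HALF_UP)


def display_price(price):
    """Show two decimals, but hide them when they are both zero."""
    price = price.quantize(CENT, rounding=ROUND_HALF_UP)
    whole = price.to_integral_value()
    if price == whole:
        return str(whole)
    return str(price)


old_price = Decimal("10.50")
new_price = Decimal("12.00")
print(new_price - old_price)
print(percent_change(old_price, new_price))
print(display_price(new_price), display_price(old_price))
```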
The answer to the question is correct; however, some users will land on this question looking for the difference between DecimalField and FloatField. The float rounding issue Seth brings up is a problem for currency.
The Django docs state:
The FloatField class is sometimes mixed up with the DecimalField class. Although they both represent real numbers, they represent those numbers differently. FloatField uses Python’s float type internally, while DecimalField uses Python’s Decimal type.
Read more here.
Here are other differences between the two fields:
DecimalField:
DecimalFields must define a decimal_places and a max_digits attribute.
You get two free form validations from the required attributes above. For example, if you set max_digits to 4 and type in a decimal such as 4.00000 (6 digits), you will get this error: Ensure that there are no more than 4 digits in total.
You also get a similar form validation for decimal places (which most browsers also validate on the front end, using the step attribute on the input field). If you set decimal_places = 1 and type in 0.001 as the value, you will get an error that the minimum value has to be 0.1.
Returns a decimal.Decimal, type is <class 'decimal.Decimal'>
With a Decimal type, rounding is also handled for you due to the required attributes that need to be set as described above. So from the shell, if you
In the database (postgresql), the DecimalField is saved as a numeric(max_digits, decimal_places) Type, and Storage is set as "main", from above example the Type is numeric(4,1)
More on DecimalField from the Django Docs.
FloatField:
Returns the built in float type, <type 'float'>
No smart rounding, and it can actually result in rounding issues, as described in Seth's answer.
Does not have the extra form validation that you get from DecimalField
In the database (postgresql), the FloatField is saved as a "double precision" Type, and Storage is set as "plain"
More on FloatField from the Django Docs.
Applies to Both:
Both fields extend from the Field class and can accept blank, null, verbose_name, name, primary_key, max_length, unique, db_index, rel, default, editable, serialize, unique_for_date, unique_for_month, unique_for_year, choices, help_text, db_column, db_tablespace, auto_created, validators, error_messages attributes, as all Fields that extend from Field would have.
The default form widget for both fields is a TextInput.
I came across this question when looking for the difference between the two fields so I think this will help those in the same situation :)
UPDATE: To answer the question, I think you can get away with either to represent currency, although Decimal is a much better fit. There is a rounding issue when it comes to floats, so you have to use round(value, 2) to keep your float representation rounded to two decimal places. Here is a quick example:
>>> round(1.13 * 50 + .01, 2)
56.51
You can still get in trouble with float and round. Like here we see it rounds down on a value of 5:
>>> round(5.685, 2)
5.68
But in this case, it will round up:
>>> round(2.995, 2)
3.0
It has all to do with how the float is stored in memory. See here.
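To see what is actually stored, you can pass the float straight to Decimal, which prints the exact binary value instead of the shortened repr. The sketch below also shows one way to get predictable half-up rounding by starting from a string; round_half_up is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

# Decimal(float) exposes the exact value the float holds in memory,
# which is usually slightly above or below the literal you typed.
print(Decimal(5.685))
print(Decimal(2.995))


def round_half_up(text, places="0.01"):
    """Round a decimal string half up, without going through float."""
    return Decimal(text).quantize(Decimal(places), rounding=ROUND_HALF_UP)


print(round_half_up("5.685"))
print(round_half_up("2.995"))
```

Because the Decimal is built from a string, 5.685 really is 5.685 here, so the tie is broken the same way every time.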
I know this is super old, but I stumbled on it looking for something completely different, and I wanted to throw out there that in general it is inadvisable to use floating point numbers (float or decimal) for currency, as floating point math rounding will invariably lead to small errors in calculation that can add up to very large discrepancies over time.
Instead, use an integer field or a string at your preference. Multiply your currency to move the decimal place to the end and make a whole number when you store it, and then move that decimal place back where it belongs when you need to show it. This is basically how banks (and most currency libraries) handle storing data and will save you loads of trouble later on.
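A minimal sketch of that integer approach, assuming a currency with two decimal places; to_minor_units and from_minor_units are hypothetical helper names, and the conversion goes through Decimal so that no float is involved.

```python
from decimal import Decimal


def to_minor_units(amount, exponent=2):
    """Convert an amount such as "10.50" into whole minor units (1050)."""
    scaled = Decimal(amount).scaleb(exponent)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount!r} has more than {exponent} decimal places")
    return int(scaled)


def from_minor_units(units, exponent=2):
    """Convert whole minor units (1050) back into a Decimal (10.50)."""
    return Decimal(units).scaleb(-exponent)


stored = to_minor_units("10.50")   # value you would keep in an integer field
difference = stored - to_minor_units("0.20")
print(stored, difference, from_minor_units(difference))
```

All arithmetic on the stored values is plain integer arithmetic; the decimal point only comes back at display time.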
I learned this the hard way because it's not really a common topic; maybe this saves someone else from doing the same thing.
edit: The Satchmo Project is no longer active, so take a look at these alternatives for handling currency
Django Money
Oscar
The Django-based Satchmo Project has a CurrencyField and CurrencyWidget that are worth taking a look at.
Check out the satchmo_utils app directory for the source