Why is Django DecimalField attribute `max_digits` non-optional?

CommandError: System check identified some issues:
ERRORS:
(fields.E132) DecimalFields must define a 'max_digits' attribute.
Is there a technical reason why max_digits is a required attribute for Django's model field DecimalField?
The docs say it uses Python's decimal module, but the Python Decimal type doesn't seem to care at all about the total number of digits in a value.
Maybe there's an opaque ORM reason?
FloatFields (to which DecimalFields are usually compared) don't require you to fix the number of digits in advance, so why do DecimalFields?
I know this small additional attribute shouldn't bother me but for some reason it's seemed unnecessary to me ever since the first time I used this field type.

It's required by the database (not the ORM). SQL decimal columns are declared with both precision (total number of significant digits) and scale (number of digits to the right of the decimal point).
See, for instance, the formal grammar quoted in this answer: https://stackoverflow.com/a/759606/2337736
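To make precision and scale concrete, here is a minimal sketch using only Python's standard decimal module. The helper name numeric_type_for is hypothetical (it is not part of Django or any database driver), and it assumes finite values only. It works out the smallest NUMERIC(precision, scale) declaration that could hold a set of values:

```python
from decimal import Decimal


def numeric_type_for(values):
    """Return the smallest NUMERIC(precision, scale) that holds every value.

    Hypothetical helper for illustration; assumes finite Decimal values.
    """
    max_scale = 0  # most digits seen to the right of the decimal point
    max_whole = 0  # most digits seen to the left of the decimal point
    for value in values:
        _sign, digits, exponent = value.as_tuple()
        max_scale = max(max_scale, -exponent, 0)
        max_whole = max(max_whole, len(digits) + exponent, 0)
    # Precision is the total digit count; a column needs at least one digit.
    precision = max(max_whole + max_scale, 1)
    return f"NUMERIC({precision}, {max_scale})"


print(numeric_type_for([Decimal("94.3"), Decimal("117"), Decimal("0.005")]))
# NUMERIC(6, 3)
```

The column has to fit 117 and 0.005 at the same time, so it needs three digits on each side of the point. A single Python Decimal never has to make that kind of commitment, which is why the Python type doesn't ask for it.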

Related

ColdFusion 9 set decimal type for cfqueryparam?

I have a few form fields where the user can enter whole numbers or decimal numbers, and both types can be positive or negative. In other words, they can enter something like this:
1 or 0.9 or 5.6745 or -10 or -0.9 or -10.5435
I'm wondering which cfsqltype I should use in my cfqueryparam. I tried decimal, but it looks like that is not supported in ColdFusion 9. Is there any other option, or should I use varchar?
Use the option that matches the column in your database. When you select the correct value, ColdFusion will try to validate and format the value before it is sent off to the database driver.
With the fixed-precision types cf_sql_numeric and cf_sql_decimal you also need to specify the scale (number of decimal places) your column accepts. By default the scale is 0, so only an integer will be stored. Once again, check your database for the scale.
For types without fixed precision, such as cf_sql_float, no scale should be specified and the full value is sent off to the database.
The full list of data types can be found in the Adobe docs.

How can I use a DecimalField in GeoDjango?

According to https://docs.djangoproject.com/en/dev/ref/contrib/gis/tutorial/#defining-a-geographic-model, lat and lon information is stored in a FloatField in GeoDjango.
If lat and lon are stored in a FloatField, how many significant digits are used to represent a coordinate?
Is there any way to change the number of digits of a FloatField in GeoDjango?
Can I use a DecimalField in GeoDjango instead of a FloatField? If so, how?
I'm building a service that requires a lot of distance calculations based on coordinates, so I need to reduce the cost of those calculations.
Thanks for reading despite my poor English.
According to
https://docs.djangoproject.com/en/dev/ref/contrib/gis/tutorial/#defining-a-geographic-model,
lat and lon information is stored in a FloatField in GeoDjango.
Both the reference and the inference are incorrect. What you are looking at is a tutorial that shows how to import the World Borders shapefile into Django. Here lat and lon are really irrelevant fields; what matters is mpoly. That's probably why lat and lon are stored as floats here, but the tutorial could have avoided confusion by not using lat and lon as separate fields.
The real GeoDjango way of storing points is a PointField, which also lets you store additional information about the location, such as the SRID. The exact internal representation varies from RDBMS to RDBMS.
If you need distance calculations, the only way is to use a GeometryField or one of its subclasses (of which PointField is one), so sub-questions 1, 2 and 3 in your question aren't really relevant.
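For reference on the significant-digits sub-question only: assuming the FloatField column is a double precision column (as it is on PostgreSQL), it stores the same IEEE 754 double as a Python float. A quick sketch with the standard library shows what that means for a coordinate; the sample longitude is made up:

```python
import sys

# Number of significant decimal digits a double is guaranteed to preserve
digits = sys.float_info.dig
print(digits)  # 15 on IEEE 754 platforms

# A made-up longitude with 15 significant digits: 3 before the point, 12 after
lon = 127.123456789012
round_tripped = format(lon, ".15g")
print(round_tripped)  # 127.123456789012
```

Twelve decimal places of a degree is far finer than any positioning system delivers, so the float's precision is not the limiting factor here.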

RapidMiner: Can I use a wildcard as an attribute value for training a decision tree model?

I am working on a fairly simple process in RapidMiner 5.3.013, which reads a CSV file and uses it as a training set to train the decision tree classifier. The result of the process is the model. A second CSV is read and used as the unlabeled set. The model (calculated earlier) is applied to the unlabeled test set, in an effort to label it properly.
Each line of the CSVs contains a few attributes, for example:
15, 0, 1555, abc*15, label1
but some lines of the training set may be like this:
15, 0, *, abc*15, label2
This is done because the third value may take various values, so the creator of the training set used a star as a wildcard in the place of the value.
What I would like to do is let the decision tree know that the star there means "match anything", so that it does not literally only match a star.
Notes:
the star in the 4th field (abc*15) should be matched literally and not as a wildcard.
if the 3rd field always contained stars, I could just not include it in the attributes, but that's not the case. Sometimes the 3rd field contains integer values, which should be matched literally.
I tried leaving the field blank, but it doesn't work.
So, is there a way to use regular expressions, or at least a simple wildcard while training the classifier or using the model?
A different way to put it is: Can I instruct the classifier to not use some of the attributes in some of the entries (lines in the CSV)?
Thanks!
I would process the data so the missing value is valid in its own right and I would discretize the valid numbers to be in ranges.
In more detail, what I meant by missing is the situation where the value of an attribute is something like *. I would simply allow this to be one valid value that the attribute takes. For all the other values of this attribute, these are numerical so they need to be converted to a nominal value to be compatible with the now valid *.
It's fairly fiddly to do this and I haven't tried this but I would start with the operator Declare Missing Value to detect the * and make them missing. From there, I would use the operator Discretize by Binning to convert numbers into nominal values. Finally, I would use Replace Missing Values to change the missing values to a nominal value like Missing. You might ask why bother with the first Declare Missing step above? The reason is that it will allow the Discretizing operation to work because it will be working on numbers alone given that non-numbers are marked as missing.
The resulting example set can then be passed to a model in the normal way. Obviously, the model has to be able to cope with nominal attributes (decision trees can).
It occurred to me that some modelling operators are more tolerant of missing data. I think k-nearest-neighbours may be one. In this case, you could simply mark the missing ones as above and not bother with the discretizing step.
The whole area of missing data does need care because it's important to understand the source of missingness. If missing data is correlated with other attributes or with the label itself, handling it inappropriately can skew results.
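Outside RapidMiner, the three steps (declare the star as missing, bin the numbers, replace missing with a nominal value) can be sketched in plain Python. This is only an illustration of the idea, not what the RapidMiner operators do internally; the function name, the fixed bin width and the label format are all assumptions:

```python
def to_nominal(raw_values, bin_width=1000, wildcard="*"):
    """Convert a column of integers-or-wildcards into nominal labels.

    Hypothetical helper; fixed-width bins are an assumption for illustration.
    """
    nominal = []
    for raw in raw_values:
        text = raw.strip()
        # Step 1: treat the wildcard as a missing value
        number = None if text == wildcard else int(text)
        if number is None:
            # Step 3: give missing values their own nominal label
            nominal.append("Missing")
        else:
            # Step 2: discretize the number into a fixed-width bin
            low = (number // bin_width) * bin_width
            nominal.append(f"range_{low}_{low + bin_width - 1}")
    return nominal


print(to_nominal(["1555", "*", "20", "1999"]))
# ['range_1000_1999', 'Missing', 'range_0_999', 'range_1000_1999']
```

Only the third column would go through this; the fourth column (abc*15) stays untouched, so its star is still matched literally.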

Preserving output precision with Django DecimalField and PostgreSql Numeric field

I'm saving data to a PostgreSQL backend through Django. Many of the fields in my models are DecimalFields set to arbitrarily high max_digits and decimal_places, corresponding to numeric columns in the database backend. The data in each column have a precision (or number of decimal places) that is not known a priori, and each datum in a given column need not have the same precision.
For example, arguments to a model may look like:
{'dist': Decimal("94.3"), 'dist_e': Decimal("1.2")}
{'dist': Decimal("117"), 'dist_e': Decimal("4")}
where the keys are database column names.
Upon output, I need to preserve and redisplay those data with the precision with which they were read in. In other words, after the database is queried, the displayed data need to look exactly like the data that were read in, with no additional or missing trailing 0's in the decimals. When queried, however, either in a Django shell or in the admin interface, all of the DecimalField data come back with many trailing 0's.
I have seen similar questions answered for money values, where the precision (2 decimal places) is both known and the same for all data in a given column. However, how might one best preserve the exact precision represented by Decimal values in Django and numeric values in PostgreSQL when the precision is not the same and not known beforehand?
EDIT:
Possibly an additional useful piece of information: When viewing the table to which the data are saved in a Django dbshell, the many trailing 0's are also present. The python Decimal value is apparently converted to the maximum precision value specified in the models.py file upon being saved to the PostgreSQL backend.
If you need perfect parity forwards and backwards, you'll need to use a CharField. Any number-based database field is going to alter your data in some way or another. Now, I know you mentioned not being able to know the digit length of the data points, and a CharField requires some length. You can either set it arbitrarily high (1000, 2000, etc.) or, I suppose, use a TextField instead.
However, with either approach, you're going to be wasting a lot of database resources in most scenarios. I would suggest modifying your approach so that extra zeros at the end don't matter (for display purposes you could always chop them off), or so that the precision is no longer arbitrary.
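If chopping the zeros off for display is acceptable, a minimal sketch with the standard decimal module could look like this (strip_trailing_zeros is a hypothetical helper name):

```python
from decimal import Decimal


def strip_trailing_zeros(value):
    """Return the value as a string without trailing zeros (hypothetical helper)."""
    # normalize() drops trailing zeros; the "f" format then avoids exponent
    # notation, which normalize() would otherwise produce for values like 100
    return format(value.normalize(), "f")


print(strip_trailing_zeros(Decimal("94.3000000000")))   # 94.3
print(strip_trailing_zeros(Decimal("100.0000000000")))  # 100
print(strip_trailing_zeros(Decimal("4.350")))           # 4.35
```

Note the last line: a value that was entered as 4.350 comes back as 4.35, so this only helps when the original trailing zeros carry no meaning.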
Since I asked this question a while ago and the answer remains the same, I'll share what I found in case it is helpful to anyone in a similar position. Django doesn't have the ability to take advantage of the PostgreSQL numeric column type with arbitrary precision. In order to preserve the display precision of data I upload to my database, and in order to be able to perform mathematical calculations on values obtained from database queries without first recasting strings into Python Decimal types, I opted to add an extra precision column for every numerical column in the database.
The precision value is an integer indicating how many digits after the decimal point are required. The datum 4.350 is assigned a value of 3 in its corresponding precision column. Normally displayed integers (e.g. 2531) have a precision entry of 0. However, large integers reported in scientific notation are assigned a negative integer to preserve their display precision. The value 4.320E+33, for example, gets the precision entry -3. The database recognizes that all objects with negative precision values should be re-displayed in scientific notation.
This solution adds some complexity to the structure and code surrounding the database, but it has proven effective. It also allows me to accurately preserve precision through calculations like converting to/from log and linear values.
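For anyone who wants to try the same idea, here is a rough sketch of how the precision entry could be derived and used with Python's decimal module. It is not the code I run in production; both function names are made up for the example, and it assumes a value entered with a positive exponent (like 4.320E+33) is the only case that should come back in scientific notation:

```python
from decimal import Decimal


def display_precision(value):
    """Precision entry for a value as it was entered (hypothetical helper)."""
    _sign, digits, exponent = value.as_tuple()
    if exponent > 0:
        # Entered in scientific notation: store the number of digits after
        # the point of the mantissa as a negative integer
        return -(len(digits) - 1)
    # Otherwise store the number of digits after the decimal point
    return -exponent


def redisplay(stored, precision):
    """Format a value read back from the database (hypothetical helper)."""
    if precision < 0:
        return format(stored, f".{-precision}E")
    return format(stored, f".{precision}f")


print(display_precision(Decimal("4.350")), redisplay(Decimal("4.3500000000"), 3))
# 3 4.350
print(display_precision(Decimal("2531")), redisplay(Decimal("2531.0000000000"), 0))
# 0 2531
print(display_precision(Decimal("4.320E+33")), redisplay(Decimal("4.320E+33"), -3))
# -3 4.320E+33
```

One gap in this sketch: a value like 4E+33 has no digits after the point, so it gets a precision of 0 and is redisplayed as a plain integer. A real implementation would need a separate flag for that case.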

Django: FloatField or DecimalField for Currency?

I am curious which one would be a better fit for a currency field. I will do simple operations such as taking the difference and the percentage change between old and new prices. I plan to keep two digits after the decimal point (i.e. 10.50), and most of the time, if these digits are zero, I will hide them and display the value as "10".
ps: Currency is NOT dollar based :)
Always use DecimalField for money. Even simple operations (addition, subtraction) are not immune to float rounding issues:
>>> from decimal import Decimal
>>> 1.10 + 2.20
3.3000000000000003
>>> Decimal('1.10') + Decimal('2.20')
Decimal('3.30')
The answer to the question is correct, however some users will stumble on this question to find out the difference between the DecimalField and the FloatField. The float rounding issue Seth brings up is a problem for currency.
The Django docs state:
The FloatField class is sometimes mixed up with the DecimalField class. Although they both represent real numbers, they represent those numbers differently. FloatField uses Python’s float type internally, while DecimalField uses Python’s Decimal type.
Read more here.
Here are other differences between the two fields:
DecimalField:
DecimalFields must define a decimal_places and a max_digits attribute.
You get two form validations for free from the required attributes above. For example, if you set max_digits to 4 and type in a decimal such as 4.00000 (6 digits), you will get this error: Ensure that there are no more than 4 digits in total.
You also get a similar form validation for decimal places (which in most browsers is also validated on the front end using the step attribute on the input field). If you set decimal_places = 1 and type in 0.001 as the value, you will get an error that the minimum value has to be 0.1.
Returns a decimal.Decimal, type is <class 'decimal.Decimal'>
Has the extra form validation described above, which FloatField lacks
With a Decimal type, rounding is also handled for you due to the required attributes that need to be set as described above. So from the shell, if you
In the database (PostgreSQL), the DecimalField is saved as a numeric(max_digits, decimal_places) type with storage set to "main"; for the example above, the type is numeric(4,1)
More on DecimalField from the Django Docs.
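To see how the two limits are counted, here is a standalone approximation of that check using only the standard decimal module. It is a sketch, not Django's validator: check_decimal is a made-up name, the first message is the one quoted above, and the wording of the second message is my own.

```python
from decimal import Decimal


def check_decimal(value, max_digits, decimal_places):
    """Return a list of error messages for a finite Decimal (hypothetical helper)."""
    _sign, digit_tuple, exponent = value.as_tuple()
    if exponent >= 0:
        # Whole number, possibly written like 1E+2
        digits = len(digit_tuple) + exponent
        decimals = 0
    elif -exponent > len(digit_tuple):
        # Values like 0.001: the zeros after the point count as digits too
        digits = decimals = -exponent
    else:
        digits = len(digit_tuple)
        decimals = -exponent

    errors = []
    if digits > max_digits:
        errors.append(
            f"Ensure that there are no more than {max_digits} digits in total."
        )
    if decimals > decimal_places:
        # Wording of this message is an assumption for the example
        errors.append(f"Too many decimal places (limit: {decimal_places}).")
    return errors


print(check_decimal(Decimal("4.00000"), max_digits=4, decimal_places=1))
print(check_decimal(Decimal("0.001"), max_digits=4, decimal_places=1))
print(check_decimal(Decimal("12.3"), max_digits=4, decimal_places=1))
```

With these settings 4.00000 fails both checks, 0.001 fails only the decimal-places check, and 12.3 passes.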
FloatField:
Returns the built-in float type, <class 'float'> (<type 'float'> on Python 2)
No smart rounding, and can actually result in rounding issues as described in Seth's answer.
Does not have the extra form validation that you get from DecimalField
In the database (PostgreSQL), the FloatField is saved as a "double precision" type with storage set to "plain"
More on FloatField from the Django Docs.
Applies to Both:
Both fields extend from the Field class and can accept blank, null, verbose_name, name, primary_key, max_length, unique, db_index, rel, default, editable, serialize, unique_for_date, unique_for_month, unique_for_year, choices, help_text, db_column, db_tablespace, auto_created, validators, error_messages attributes, as all Fields that extend from Field would have.
The default form widget for both fields is a NumberInput when localize is False, otherwise a TextInput.
I came across this question when looking for the difference between the two fields so I think this will help those in the same situation :)
UPDATE: To answer the question, I think you can get away with either to represent currency, although Decimal is a much better fit. There is a rounding issue when it comes to floats, so you have to use round(value, 2) in order to keep your float representation rounded to two decimal places. Here is a quick example:
>>> round(1.13 * 50 + .01, 2)
56.51
You can still get into trouble with float and round. For example, here it rounds down on a trailing 5:
>>> round(5.685, 2)
5.68
But in this case, it will round up:
>>> round(2.995, 2)
3.0
It all has to do with how the float is stored in memory. See here.
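You can check this yourself with the standard decimal module: passing a float to Decimal reveals the exact binary value it holds. The snippet below is a small sketch; round_money is a made-up helper, and ROUND_HALF_UP is just one possible rounding rule, so pick whichever your currency rules require.

```python
from decimal import Decimal, ROUND_HALF_UP

# Decimal(float) exposes the exact value stored for the float literal
print(Decimal(5.685) < Decimal("5.685"))  # True: stored slightly below 5.685
print(Decimal(2.995) > Decimal("2.995"))  # True: stored slightly above 2.995


def round_money(value, places="0.01"):
    """Round a Decimal half up to the given quantum (hypothetical helper)."""
    return value.quantize(Decimal(places), rounding=ROUND_HALF_UP)


print(round_money(Decimal("5.685")))  # 5.69
print(round_money(Decimal("2.995")))  # 3.00
```

So round() is doing the right thing with the value it is given; the value just isn't the one you typed. With Decimal built from a string, the typed value and the stored value are the same.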
I know this is super old, but I stumbled on it looking for something completely different, and I wanted to throw out there that in general it is inadvisable to use floating point numbers (float or decimal) for currency, as floating point math rounding will invariably lead to small errors in calculation that can add up to very large discrepancies over time.
Instead, use an integer field or a string at your preference. Multiply your currency to move the decimal place to the end and make a whole number when you store it, and then move that decimal place back where it belongs when you need to show it. This is basically how banks (and most currency libraries) handle storing data and will save you loads of trouble later on.
I learned this the hard way because it's not really a common topic; maybe this saves someone else from doing the same thing.
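A bare-bones sketch of the integer approach, assuming a currency with two decimal places and input amounts that never have more than two; the names are invented for the example:

```python
from decimal import Decimal

MINOR_UNITS = 100  # assumption: two decimal places, e.g. cents


def to_minor_units(amount):
    """Convert a decimal string such as "10.50" to an integer (hypothetical helper)."""
    # Parsing through Decimal avoids binary float error; int() would silently
    # truncate any digits beyond the second decimal place
    return int(Decimal(amount) * MINOR_UNITS)


def to_display(minor):
    """Move the decimal point back for display (hypothetical helper)."""
    whole, frac = divmod(abs(minor), MINOR_UNITS)
    sign = "-" if minor < 0 else ""
    return f"{sign}{whole}.{frac:02d}"


old_price = to_minor_units("10.50")
discount = to_minor_units("0.20")
print(old_price - discount)              # 1030
print(to_display(old_price - discount))  # 10.30
```

All arithmetic happens on plain integers, so addition and subtraction are exact; only percentages and division need an explicit rounding decision.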
edit: The Satchmo Project is no longer active, so take a look at these alternatives for handling currency
Django Money
Oscar
The Django-based Satchmo Project has a CurrencyField and CurrencyWidget that are worth taking a look at.
Check out the satchmo_utils app directory for the source