Monthly Reports in Django - Enforce uniqueness for month? - django

This is a fairly quick question regarding schema design for a Django project.
Basically, we have a series of monthly reports from different departments that are aggregated into a single report with some pretty charts (we're probably going to use Google Visualization API for that, but I'm open to other suggestions, if you think something else integrates nicely with Django).
Each department is responsible for filing their own figures for their part of the report. We'll probably be using the Django admin for entering in those figures, since it doesn't have to be pretty, it's just to get some numbers in each month.
I'm assuming the better way here is to have an abstract Report model, inherit this for each department having a separate Model, with any specific fields overridden, and then have a DateField on each one as well.
Having a month as the parent object, and descending the reports from that - that's a silly approach, right?
Also, what's the best way of enforcing it so that they can only submit their figures once? I could have a separate month and year field, I guess and enforce a unique on that field, but I was hoping to use the inbuilt DateField, but what's the best way to enforce month and year uniqueness? Should I use the new model validation feature for that?
Cheers,
Victor

A Django model field can have an option unique_for_month and unique_for_year. The value of that option is a DateTime or Date field in the model.
E.g.
class Report(models.Model):
submit_date = models.DateField()
department_id = models.IntegerField(unqiue_for_month=date)

Related

How to censor specific fields based on condition using django QuerySet API

Using Django we have situation where we have a model Case which can be set to being a medical case or not (through a BooleanField).
Now, we also have a system to check if a certain employee (User subclass) is authorized to see sensitive data when a case is labeled as being medical (containing medical sensitive data).
I am able to annotate a new field to each instance, a BooleanField letting us know if the requesting employee is authorized to see medical data on the specific Case instance or not.
Ideally, I would like to have the database sensor out specific fields (field customer for example), when the requesting employee is not authorized to see medical data for that case. I imagine this can be done with an annotate method, and a combination of from django.db.models.Case and from django.db.models.When.
But, what we would also like is that the resulting QuerySet keeps the same field names on the different model instances. We don't want to change the field name of customer to another name.
We have actually come up with a solution, using .values first, and then the .annotate for each field we want to potentially censor out (see code below). This isn't ideal though, for multiple reasons. For one, we don't get back model instances, but dictionaries. Also, but this is another question, one of the fields that needs to be censored is a ManyToManyField, and using .values now returns a unique row for each instance referred to through the ManyToManyField (any solution for that?)
Also, ideally, this queryset would be the base queryset for all situations in which an employee tries to request Cases in our app. We want all our colleagues to use this base queryset so that we don't have to implement the same solution in multiple places, and prevent sensitive data from leaking.
So, I am wondering, can anyone recommend a solution for this situation?
Thanks in advance!
PS. We would like to have this done by the database since the amount of cases being fetched is potentially very high, and doing this in Python would probably require a lot of CPU power and thus kill performance.
from django.db.models import Case, When, BooleanField, IntegerField, F, Value, Q
OurModel.objects.annotate(
employee_medical_authorized=Case(
When(..., then=Value(True)),
default=Value(False),
output_field=BooleanField()
)).values(...).annotate(
customer=Case(
When(Q(employee_medical_authorized=Value(False)) & Q(medical=Value(True)),
then=Value(None)),
default=F('customer'),
output_field=IntegerField()
)
)

Derived variables in Django Models

I am building a Django app that has a central Projects model:
class Project(models.Model):
fundamental_attrib1=models.IntegerField()
fundamental_attrib2=models.DecimalField(max_digits=10, decimal_places=2)
derived_attrib1=models.DecimalField(null=True, max_digits=10, decimal_places=2)
derived_attrib1_start=models.DateField(null=True)
derived_attrib1_end=models.DateField(null=True)
derived_attrib2=models.DecimalField(null=True, max_digits=10, decimal_places=2)
derived_attrib2_start=models.DateField(null=True)
derived_attrib2_end=models.DateField(null=True)
derived_attrib3=models.IntegerField(null=True)
derived_attrib3_start=models.DateField(null=True)
derived_attrib3_end=models.DateField(null=True)
The goal is to allow users to instantiate new projects where they can only see (and only need to) the 'fundamental' variables in the form to create/update a Project. Once they have submitted the form, I want to be calculate all the optional parameters before saving the project to the database.
In addition, most of my derived variables come in groups of three as above (value, start date, end date). Is there a better way (that makes sense) to store them in the database? Naive string example:
{'derived_attrib1':[1000,date(2017,1,1),date(2017,2,2)]}
{'derived_attrib2':[2000,date(2017,2,1),date(2017,3,2)]}
{'derived_attrib3':[ 500,date(2017,3,1),date(2017,4,2)]}
My eventual 'end goal' is, for each group:
create numpy arrays (or bring into one DataFrame?) to interpolate days between the start and end dates
linearly distribute the value across the days
plot each group as a timeseries (probably with D3.js/Bokeh or similar)
If these come in groups and can be seen as an entity, consider using ArrayField. For other backends, one could use a json representation in a TextField (or one of the json fields that work in the same way). Field choice depends on what comes in from the calculation and what your processing is most comfy with.
The choice not to do this, would be if you frequently would filter querysets on attributes of these entities, as while that's possible, it's not as fast as querying straight fields. Relations will be impossible.
A totally different approach is to use OneToOne fields to models. This creates a ton of joins for your approach, so I'm not recommending it, but it has some advantages in terms of handling each derived entity: it's fields and calculation method are independent of the model that uses them.
I would say ForeignKey in a related model would be the best bet. Then you can query Project.objects.filter(derived__start__gt=start, derived__end__lt=something). prefetch_related would only require two queries to get any amount of data from these two tables. This allows the number of properties to be infinite and you can query them any way you want.
class Derived(models.Model):
project = models.ForeignKey(Project, related_name='derived')
value=models.DecimalField(null=True, max_digits=10, decimal_places=2)
start=models.DateField(null=True)
end=models.DateField(null=True)

Do Django models really need a single unique key field

Some of my models are only unique in a combination of keys. I don't want to use an auto-numbering id as the identifier as subsets of the data will be exported to other systems (such as spreadsheets), modified and then used to update the master database.
Here's an example:
class Statement(models.Model):
supplier = models.ForeignKey(Supplier)
total = models.DecimalField("statement total", max_digits=10, decimal_places=2)
statement_date = models.DateField("statement date")
....
class Invoice(models.Model):
supplier = models.ForeignKey(Supplier)
amount = models.DecimalField("invoice total", max_digits=10, decimal_places=2)
invoice_date = models.DateField("date of invoice")
statement = models.ForeignKey(Statement, blank=True, null=True)
....
Invoice records are only unique for a combination of supplier, amount and invoice_date
I'm wondering if I should create a slug for Invoice based on supplier, amount and invoice_date so that it is easy to identify the correct record.
An example of the problem of having multiple related fields to identify the right record is django-csvimport which assumes there is only one related field and will not discriminate on two when building the foreign key links.
Yet the slug seems a clumsy option and needs some kind of management to rebuild the slugs after adding records in bulk.
I'm thinking this must be a common problem and maybe there's a best practice design pattern out there somewhere.
I am using PostgreSQL in case anyone has a database solution. Although I'd prefer to avoid that if possible, I can see that it might be the way to build my slug if that's the way to go, perhaps with trigger functions. That just feels a bit like hidden functionality though, and may cause a headache for setting up on a different server.
UPDATE - after reading initial replies
My application requires that data may be exported, modified remotely, and merged back into the master database after review and approval. Hidden autonumber keys don't easily survive that consistently. The relation invoices[2417] is part of statements[265] is not persistent if the statement table was emptied and reloaded from a CSV.
If I use the numeric autonumber pk then any process that is updating the database would need to refresh the related key numbers or by using the multiple WITH clause.
If I create a slug that is based on my 3 keys but easy to reproduce then I can use it as the key - albeit clumsily. I'm thinking of a slug along the lines:
u'%s %s %s' % (self.supplier,
self.statement_date.strftime("%Y-%m-%d"),
self.total)
This seems quite clumsy and not very DRY as I expect I may have to recreate the slug elsewhere duplicating the algorithm (maybe in an Excel formula, or an Access query)
I thought there must be a better way I'm missing but it looks like yuvi's reply means there should be, and there will be, but not yet :-(
What you're talking about it a multi-column primary key, otherwise known as "composite" or "compound" keys. Support in django for composite keys today is still in the works, you can read about it here:
Currently Django models only support a single column in this set,
denying many designs where the natural primary key of a table is
multiple columns [...] Current state is that the issue is
accepted/assigned and being worked on [...]
The link also mentions a partial implementation which is django-compositekeys. It's only partial and will cause you trouble with navigating between relationships:
support for composite keys is missing in ForeignKey and
RelatedManager. As a consequence, it isn't possible to navigate
relationships from models that have a composite primary key.
So currently it isn't entirely supported, but will be in the future. Regarding your own project, you can make of that what you will, though my own suggestion is to stick with the fully supported default of a hidden auto-incremented field that you don't even need to think about (and use unique_together to enforce the uniqness of the described fields instead of making them your primary keys).
I hope this helps!
No.
Model needs to have one field that is primary_key = True. By default this is the (hidden) autofield which stores object Id. But you can set primary_key to True at any other field.
I've done this in cases, Where i'm creating django project upon tables which were previously created manually or through some other frameworks/systems.
In reality - you can use whatever means you can think of, for joining objects together in queries. As long as query returns bunch of data that can be associated with models you have - it does not really matter which field you are using for joins. Just keep in mind, that the solution you use should be as effective as possible.
Alan

How to retrieve values from Django ForeignKey -> ManyToMany fields?

I have a model (Realtor) with a ForeignKey field (BillingTier), which has a ManyToManyField (BillingPlan). For each logged in realtor, I want to check if they have a billing plan that offers automatic feedback on their listings. Here's what the models look like, briefly:
class Realtor(models.Model):
user = models.OneToOneField(User)
billing_tier = models.ForeignKey(BillingTier, blank=True, null=True, default=None)
class BillingTier(models.Model):
plans = models.ManyToManyField(BillingPlan)
class BillingPlan(models.Model):
automatic_feedback = models.BooleanField(default=False)
I have a permissions helper that checks the user permissions on each page load, and denies access to certain pages. I want to deny the feedback page if they don't have the automatic feedback feature in their billing plan. However, I'm not really sure the best way to get this information. Here's what I've researched and found so far, but it seems inefficient to be querying on each page load:
def isPermitted(user, url):
premium = [t[0] for t in user.realtor.billing_tier.plans.values_list('automatic_feedback') if t[0]]
I saw some solutions which involved using filter (ManyToMany field values from queryset), but I'm equally unsure of using the query for each page load. I would have to get the billing tier id from the realtor: bt_id = user.realtor.billing_tier.id and then query the model like so:
BillingTier.objects.filter(id = bt_id).filter(plans__automatic_feedback=True).distinct()
I think the second option reads nicer, but I think the first would perform better because I wouldn't have to import and query the BillingTier model.
Is there a better option, or are these two the best I can hope for? Also, which would be more efficient for every page load?
As per the OP's invitation, here's an answer.
The core question is how to define an efficient permission check based on a highly relational data model.
The first variant involves building a Python list from evaluating a Django query set. The suspicion must certainly be that it imposes unnecessary computations on the Python interpreter. Although it's not clear whether that's tolerable if at the same time it allows for a less complex database query (a tradeoff which is hard to assess), the underlying DB query is not exactly simple.
The second approach involves fetching additional 1:1 data through relational lookups and then checking if there is any record fulfilling access criteria in a different, 1:n relation.
Let's have a look at them.
bt_id = user.realtor.billing_tier.id: This is required to get the hook for the following 1:n query. It is indeed highly inefficient in itself. It can be optimized in two ways.
As per Django: Access Foreign Keys Directly, it can be written as bt_id = user.realtor.billing_tier_id because the id is of course present in billing_tier and needs not be found via a relational operation.
Assuming that the page in itself would only load a user object, Django can be told to fetch and cache relational data along with that via select_related. So if the page does not only fetch the user object but the required billing_tier_id as well, we have saved one additional DB hit.
BillingTier.objects.filter(id = bt_id).filter(plans__automatic_feedback=True).distinct() can be optimized using Django's exists because that will redurce efforts both in the database and regarding data traffic between the database and Python.
Maybe even Django's prefetch_related can be used to combine the 1:1 and 1:n queries into a single query, but it's much more difficult to judge whether that pays. Could be worth a try.
In any case, it's worth installing a gem called Django Debug Toolbar which will allow you to analyze how much time your implementation spends on database queries.

Seeking Good Approach to Persist Data Submitted Through Dynamic Django Forms

Summary:
Looking for a good way to save data to Django models for which the associated forms are generated dynamically.
Detail:
I've been puzzling over the best approach for creating dynamic Django forms backed by models. For example, I'd like to create an interface where a user can create an HTML form, customize the types of fields in that form dynamically (Number, String, Dropdown Box, Date, etc.), and then display that form to other users so those users can submit data which is saved to a database. I'm not sure how to make an efficient approach to persist the data.
www.formsite.com and www.mailchimp.com have some form-building tools that are nice examples of what I am trying to do. Jacob Kaplan-Moss has an excellent tutorial on how to create the forms dynamically, but the tutorial doesn't get into how to persist the data.
As a dummy example, one (perhaps bad?) approach might be to create some models like below, where there is a database table for the SurveyQuestions (storing the customizable names and datatypes of each field) and one for the SurveyQuestionResponses (each record storing an individual response for a SurveyQuestion on a particular Survey).
However, it seems like this approach might result in really complex and slow queries. For example, if a Survey has 10 questions and you would like to display 10 user responses to that survey, there would be queries to select all 10 SurveyQuestions and then for each survey responder, there would be a query to select each of the SurveyQuestionResponses. It seems like the number of queries needed could add up really fast!
class Survey(models.Model):
# some fields here.
pass
class SurveyQuestion(models.Model):
""" Defines the headings and field
types for a given Survey.
"""
survey = models.ForeignKey(Survey)
field_name = models.CharField(
max_length=255,
help_text='Enter the name for this field heading')
field_type = models.IntegerField(
choices=choices.FIELD_TYPES,
help_text='Enter the data type for this field')
display_order = models.IntegerField(default=0)
class SurveyQuestionResponse(models.Model):
survey_field = models.ForeignKey(SurveyQuestion)
response value = models.TextArea(blank=True, null=True)
Is there a better approach to persisting data based on dynamic forms? Should I be somehow converting a form respondent's response to some sort of pickled format and store it to a TextField (Instead of having 10 SurveyQuestionResponse records there would be one record with all the response values pickled together)? I'm not too familiar with NoSQL options, but would a NoSQL approach work best for this type of thing? Is there some sort of rendering or caching that would make sense to do?
I keep encountering situations where saving data from dynamic forms like this would be very useful. I am wondering what other people's approaches are. Any advice is much appreciated. Thanks for reading this admittedly long question.
Joe
For a relational database an Entity-attribute-value model(EAV) could be used to achieve a dynamic, or open schema. Relational databases are not really suited for this type of schema, and this generally results in very slow queries over time. NoSQL has its own set of issues but I think that it would be best suited to your requirements. If you decide to take this route you can take a look at MongoDB. I have not used it extensively, but it seems most similar to relational database than the other NoSQL database out there, and its python interface seems pretty similar to django's ORM. By the was I remember finding a nice EAV example for Django. Though I don't remember where at the moment.