Django ORM: dynamic columns from reference model in resultset


Creating an app to track time off accrual. Users have days, and days have types like "Vacation" or "Sick".
Models:
DayType
    Name
UserDay
    Date
    DayType (fk to DayType)
    Value (+ for accrual, - for day taken)
    Note
    Total
I'm trying to generate the following resultset expanding the daytypes across columns. Is this possible in the ORM, or do I have to build this in code?

I think you'd have an easier time by not putting the DayType in another model. Is there a specific reason you went that route?
If not, you should take a look at the choices attribute of Django's fields. Your code would look something like this:
from django.db import models

class UserDay(models.Model):
    DAY_TYPES = (
        ('vac', 'Vacation'),
        ('ill', 'Sick'),
    )
    day_type = models.CharField(max_length=3, choices=DAY_TYPES)
    # Other fields here...
It seems like a somewhat cleaner solution since the types of days they have aren't likely to change very often. Plus, you can avoid a DB table and FK lookup by storing the values this way.
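If the pivot does end up being built in code, a minimal sketch might look like the following. It assumes rows shaped like the dictionaries a `.values()` query could return, with a hypothetical `user` key (the question does not show how days are tied to users) alongside `day_type` and `value`. None of these names come from the original models.

```python
from collections import defaultdict


def pivot_day_totals(rows, day_types):
    """Sum `value` per user and day type, producing one row per user.

    `rows` is any iterable of dicts with the (assumed) keys
    'user', 'day_type' and 'value'.
    """
    # Every user starts with a zero in each known day-type column.
    totals = defaultdict(lambda: dict.fromkeys(day_types, 0))
    for row in rows:
        per_type = totals[row['user']]
        per_type[row['day_type']] = per_type.get(row['day_type'], 0) + row['value']
    # Emit one flat dict per user, with a column for each day type.
    return [
        {'user': user, **per_type, 'total': sum(per_type.values())}
        for user, per_type in sorted(totals.items())
    ]


sample_rows = [
    {'user': 'alice', 'day_type': 'vac', 'value': 10},
    {'user': 'alice', 'day_type': 'vac', 'value': -2},
    {'user': 'alice', 'day_type': 'ill', 'value': 5},
    {'user': 'bob', 'day_type': 'ill', 'value': 3},
]
result = pivot_day_totals(sample_rows, ['vac', 'ill'])
```

The same pivot can probably be pushed into the database with conditional aggregation (`Sum` with a `filter=Q(...)` argument, available since Django 2.0), at the cost of building one annotation per day type.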

Related

Converting SQL to something that Django can use

I am working on converting some relatively complex SQL into something that Django can play with. I am trying not to just use the raw SQL, since I think playing with the standard Django toolkit will help me learn more about Django.
I have already managed to break up parts of the SQL into chunks, and am tackling them piecemeal to make things a little easier.
Here is the SQL in question:
SELECT i.year, i.brand, i.desc, i.colour, i.size, i.mpn, i.url,
COALESCE(DATE_FORMAT(i_eta.eta, '%M %Y'),'Unknown')
as eta
FROM i
JOIN i_eta ON i_eta.mpn = i.mpn
WHERE category LIKE 'kids'
ORDER BY i.brand, i.desc, i.colour, FIELD(size, 'xxl','xl','l','ml','m','s','xs','xxs') DESC, size+0, size
Here is what I have (trying to convert line by line):
(grabbed automatically when performing filters)
(have to figure out django doc on coalesce for syntax)
db alias haven't been able to find yet - it is crucial since there is a db view that requires it
already included in the original q
.select_related?
.filter(category="kids")
.objects.order_by('brand','desc','colour') - don't know how to deal with SQL FIELDS
Any advice would be appreciated!

Here's how I would structure this.
First, I'm assuming your models for i and i_eta look something like this:
from django.db import models

class I(models.Model):
    mpn = models.CharField(max_length=30, primary_key=True)
    year = models.CharField(max_length=30)
    brand = models.CharField(max_length=30)
    desc = models.CharField(max_length=100)
    colour = models.CharField(max_length=30)
    size = models.CharField(max_length=3)

class IEta(models.Model):
    i = models.ForeignKey(I, on_delete=models.CASCADE)
    eta = models.DateField()
General thoughts:
To write the coalesce in Django: I would not replace nulls with "Unknown" in the ORM. This is a presentation-layer concern: it should be dealt with in a template.
Date formatting can likewise be done in Python rather than in the query.
Not sure what a DB alias is.
For using multiple tables together, you can use either select_related(), prefetch_related(), or do nothing.
select_related() will perform a join.
prefetch_related() will get the foreign key IDs from the first queryset, then generate a query like SELECT * FROM table WHERE id in (12, 13, 14).
Doing nothing will work automatically, but has the disadvantage of the SELECT N+1 problem.
I generally prefer prefetch_related().
For customizing the sort order of the size field, you have three options. My preference would be option 1, but any of the three will work.
1. Denormalize the sort criteria. Add a new field called size_numeric. Override the save() method to populate this field when saving new instances, giving xxl the value 1, xl the value 2, etc.
2. Sort in Python. Essentially, you use Python's built-in sorting methods to do the sort, rather than sorting it in the database. This doesn't work well if you have thousands of results.
3. Invoke the MySQL function. Essentially, using annotate(), you add the output of a function to the queryset. order_by() can sort by that function.
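As a rough illustration of the Python-side pieces mentioned above (formatting the ETA outside the query, and the "sort in Python" option), here is a sketch that works on plain dicts rather than model instances. The helper names are made up for this example, and `leading_number` is only an approximation of MySQL's `size+0`, since it ignores decimals and signs.

```python
import datetime

# Order taken from the FIELD(...) call in the question's SQL.
SIZE_LABELS = ['xxl', 'xl', 'l', 'ml', 'm', 's', 'xs', 'xxs']


def format_eta(eta):
    """Python stand-in for COALESCE(DATE_FORMAT(eta, '%M %Y'), 'Unknown')."""
    return eta.strftime('%B %Y') if eta else 'Unknown'


def field_rank(size):
    """Mimic MySQL FIELD(): 1-based position in SIZE_LABELS, 0 if absent."""
    return SIZE_LABELS.index(size) + 1 if size in SIZE_LABELS else 0


def leading_number(size):
    """Approximate MySQL's size+0: leading digits as an int, else 0."""
    digits = ''
    for ch in size:
        if not ch.isdecimal():
            break
        digits += ch
    return int(digits) if digits else 0


def size_sort_key(item):
    size = item['size']
    # brand, desc, colour, then FIELD(...) DESC, size+0, size.
    return (item['brand'], item['desc'], item['colour'],
            -field_rank(size), leading_number(size), size)


items = [
    {'brand': 'Acme', 'desc': 'Jacket', 'colour': 'red', 'size': s}
    for s in ['xl', '10', 's', '8', 'xxs']
]
ordered_sizes = [i['size'] for i in sorted(items, key=size_sort_key)]
```

With model instances the key function would read attributes instead of dict keys. As noted in option 2, this only makes sense for result sets small enough to sort in memory.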

Solving a slow query with a Foreignkey that has isnull=False and order_by in a Django ListView

I have a Django ListView that paginates through 'active' People.
The (simplified) models:
class Person(models.Model):
    name = models.CharField()
    # ...
    active_schedule = models.ForeignKey('Schedule', related_name='+', null=True, on_delete=models.SET_NULL)

class Schedule(models.Model):
    field = models.PositiveIntegerField(default=0)
    # ...
    person = models.ForeignKey(Person, related_name='schedules', on_delete=models.CASCADE)
The Person table contains almost 700.000 rows and the Schedule table contains just over 2.000.000 rows (on average every Person has 2-3 Schedule records, although many have none and a lot have more). For an 'active' Person, the active_schedule ForeignKey is set; there are about 5.000 such Persons at any time.
The ListView is supposed to show all active Persons, sorted by field on Schedule (and some other conditions that don't seem to matter for this case).
The query then becomes:
(
    Person.objects
    .filter(active_schedule__isnull=False)
    .select_related('active_schedule')
    .order_by('active_schedule__field')
)
Specifically the order_by on the related field makes this query terribly slow (that is: it takes about a second, which is too slow for a web app).
I was hoping the filter condition would select the 5000 records, which then become relatively easily sortable. But when I run explain on this query, it shows that the (Postgres) database is messing with many more rows:
Gather Merge (cost=224316.51..290280.48 rows=565366 width=227)
  Workers Planned: 2
  -> Sort (cost=223316.49..224023.19 rows=282683 width=227)
       Sort Key: exampledb_schedule.field
       -> Parallel Hash Join (cost=89795.12..135883.20 rows=282683 width=227)
            Hash Cond: (exampledb_person.active_schedule_id = exampledb_schedule.id)
            -> Parallel Seq Scan on exampledb_person (cost=0.00..21263.03 rows=282683 width=161)
                 Filter: (active_schedule_id IS NOT NULL)
            -> Parallel Hash (cost=67411.27..67411.27 rows=924228 width=66)
                 -> Parallel Seq Scan on exampledb_schedule (cost=0.00..67411.27 rows=924228 width=66)
I recently changed the models to be this way. In a previous version I had a model with just the ~5.000 active Person's in it. Doing the order_by on this small table was considerably faster! I am hoping to achieve the same speed with the current models.
I tried retrieving just the fields needed for the Listview (using values) which does help a little, but not much. I also tried setting the related_name on active_schedule and approaching the problem from Schedule, but that makes no difference. I tried putting a db_index on the Schedule.field, but that seems only to make things slower. Conditional queries also did not help (although I probably have not tried all possibilities). I'm at a loss.
The SQL statement generated by the ORM query:
SELECT
"exampledb_person"."id",
"exampledb_person"."name",
...
"exampledb_person"."active_schedule_id",
"exampledb_person"."created",
"exampledb_person"."updated",
"exampledb_schedule"."id",
"exampledb_schedule"."person_id",
"exampledb_schedule"."field",
...
"exampledb_schedule"."created",
"exampledb_schedule"."updated"
FROM
"exampledb_person"
INNER JOIN
"exampledb_schedule"
ON ("exampledb_person"."active_schedule_id" = "exampledb_schedule"."id")
WHERE
"exampledb_person"."active_schedule_id" IS NOT NULL
ORDER BY
"exampledb_schedule"."field" ASC
(Some fields were left out, for simplicity.)
Is it possible to speed up this query, or should I revert back to using a special Model for the active Person's?
EDIT: When I change the query, just for comparison/testing, to sort on an unindexed field on Person, the query is equally slow. However, if I then add an index to that field, the query is fast! I had to try this, as the SQL statement indeed shows that it's ordering on "exampledb_schedule"."field" - a field without an index. But as I said, adding an index on that field makes no difference.
EDIT: I suppose it's also worth noting that when trying a much simpler sort query directly on Schedule, either on an indexed field or not, it's MUCH faster. For instance, for this test I've added an index to Schedule.field, then the following query is blazing fast:
Schedule.objects.order_by('field')
Somewhere in here lies the solution...

The comments by @guarav and my edits pointed me in the direction of the solution, which had been staring me in the face for a while...
The filter clause in my question - filter(active_schedule__isnull=False) - seems to prevent the database from using the indexes. I wasn't aware of this, and had hoped a database expert would point me in this direction.
The solution is to filter on Schedule.field, which is 0 for inactive Person records and >0 for active ones:
(
    Person.objects
    .select_related('active_schedule')
    .filter(active_schedule__field__gte=1)
    .order_by('active_schedule__field')
)
This query properly uses the indexes and is fast (20ms as opposed to ~1000ms).
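The rewrite only returns the same rows as the original filter if every schedule referenced by active_schedule has field >= 1. A small sanity check of that assumption, using an in-memory SQLite database with made-up table names and data (so it says nothing about Postgres planning or speed), might look like this:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE schedule (id INTEGER PRIMARY KEY, field INTEGER NOT NULL DEFAULT 0);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    active_schedule_id INTEGER REFERENCES schedule(id)
);
-- Schedule 2 is not referenced as anyone's active schedule and has field 0.
INSERT INTO schedule (id, field) VALUES (1, 30), (2, 0), (3, 10);
INSERT INTO person (id, name, active_schedule_id) VALUES
    (1, 'Ann', 1), (2, 'Bob', NULL), (3, 'Cy', 3), (4, 'Di', NULL);
""")

# Shape of the original query: filter on the nullable foreign key.
ORIGINAL = """
SELECT p.name FROM person p
JOIN schedule s ON p.active_schedule_id = s.id
WHERE p.active_schedule_id IS NOT NULL
ORDER BY s.field
"""

# Shape of the rewritten query: filter on the sorted column instead.
REWRITTEN = """
SELECT p.name FROM person p
JOIN schedule s ON p.active_schedule_id = s.id
WHERE s.field >= 1
ORDER BY s.field
"""

original_rows = [row[0] for row in conn.execute(ORIGINAL)]
rewritten_rows = [row[0] for row in conn.execute(REWRITTEN)]
```

If an active schedule could ever have field 0, the two queries would diverge, so the invariant is worth enforcing wherever active_schedule is assigned.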

Join two records from same model in django queryset

Been searching the web for a couple hours now looking for a solution but nothing quite fits what I am looking for.
I have one model (simplified):
from django.db.models import Model, CharField, DateField, FloatField

class SimpleModel(Model):
    name = CharField('Name', unique=True)
    date = DateField()
    amount = FloatField()
I have two dates; date_one and date_two.
I would like a single queryset with a row for each name in the Model, with each row showing:
{'name': name, 'date_one': date_one, 'date_two': date_two, 'amount_one': amount_one, 'amount_two': amount_two, 'change': amount_two - amount_one}
Reason being I would like to be able to find the rank of amount_one, amount_two, and change, using sort or filters on that single queryset.
I know I could create a list of dictionaries from two separate querysets then sort on that and get the ranks from the index values ...
but, perhaps naively, I feel like there should be a DB solution using one queryset that would be faster.
union seemed promising, but you cannot perform some simple operations like filter after that.
I think I could perhaps split name into its own Model and generate a queryset with related fields, but I'd prefer not to change the schema at this stage. Also, I only have access to SQLite.
Appreciate any help!

Your current model forces you to have ONE name associated with ONE date and ONE amount. Because name is unique=True, you literally cannot have two dates associated with the same name.
So if you want to be able to have several dates/amounts associated with a name, there are several ways to proceed:
Idea 1: If there will only be 2 dates and 2 amounts, simply add a second date field and a second amount field.
Idea 2: If there can be any number of days and amounts, you'll have to change your model to reflect it, by having:
    A model for your names
    A model for your days and amounts, with a foreign key to your names
Idea 3: You could keep the same model and simply remove the unique constraint, but that's a recipe for mistakes.
Based on your choice, you'll then have several ways of querying what you need. It depends on your final model structure. The best way to go would be to create custom model methods that query the 2 dates/amount, format an array and return it
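As one possible shape for the query once a name can have more than one row (Idea 2 or Idea 3), a self-join can put both dates side by side. The sketch below runs against an in-memory SQLite database. The table name `simplemodel` and the sample data are assumptions for illustration, and a real Django table would normally carry an app prefix.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE simplemodel (name TEXT, date TEXT, amount REAL);
INSERT INTO simplemodel VALUES
    ('a', '2020-01-01', 10.0), ('a', '2020-02-01', 15.0),
    ('b', '2020-01-01', 20.0), ('b', '2020-02-01', 12.0);
""")

# Join the table to itself on name: `one` holds the row for date_one,
# `two` the row for date_two.
COMPARE_SQL = """
SELECT one.name,
       one.amount AS amount_one,
       two.amount AS amount_two,
       two.amount - one.amount AS change
FROM simplemodel AS one
JOIN simplemodel AS two ON two.name = one.name
WHERE one.date = ? AND two.date = ?
ORDER BY change DESC
"""

rows = conn.execute(COMPARE_SQL, ('2020-01-01', '2020-02-01')).fetchall()
```

Each row's position in `rows` gives its rank by change. Sorting by `amount_one` or `amount_two` instead only requires a different ORDER BY.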

Django: Joining on fields other than IDs (Using a date field in one model to pull data from a second model)

I'm attempting to use Django to build a simple website. I have a set of blog posts that have a date field attached to indicate the day they were published. I have a table that contains a list of dates and temperatures. On each post, I would like to display the temperature on the day it was published.
The two models are as follows:
class Post(models.Model):
    title = models.CharField(max_length=200)
    text = models.TextField()
    date = models.DateField()

class Temperature(models.Model):
    date = models.DateField()
    temperature = models.IntegerField()
I would like to be able to reference the temperature field from the second table using the date field from the first. Is this possible?
In SQL, this is a simple query. I would do the following:
Select temperature from Temperature t join Post p on t.date = p.date
I think I really have two questions:
Is it possible to brute force this, even if it's not best practice? I've googled a lot and tried using raw sql and objects.extra, but can't get them to do what I want. I'm also wary of relying on them for the long haul.
Since this seems to be a simple task, it seems likely that I'm overcomplicating it by having my models set up sub-optimally. Is there something I'm missing about how I should design my models? That is, what's the best practice for doing something like this? (I've successfully pulled the temperature into my blog post by using a foreign key in the Temperature model. But if I go that route, I don't see how I could easily make sure that my temperature dates get the correct foreign key assigned to them so that the temperature date maps to the correct post date.)

There will likely be better answers than this one, but I'll throw in my 2¢ anyway.
You could try a property inside the Post model that returns the temperature:
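@property
def temperature(self):
    try:
        return Temperature.objects.values_list('temperature', flat=True).get(date=self.date)
    except (Temperature.DoesNotExist, Temperature.MultipleObjectsReturned):
        # No reading, or more than one reading, for this date.
        return None
(code not tested)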

About your Models:
If you will be displaying the temperature in a Post list (a list of Posts with their temperatures), then maybe it will be simpler to code and a faster query to just add a temperature field to your Post model.
You can keep the Temperature model. Then:
Assuming you have the temperature data already present in your Temperature model at the time of Post instance creation, you can fill that new field in a custom save method.
If you get temperature data after Post creation, you can fill in that new temperature field through a background job (maybe triggered by crontab or similar).
Sometimes database orthogonality (not repeating info in many tables) is not the best strategy. Just something to think about, depending on how often you will be querying the Post models and how simple you want to keep that query code.

I think this might be a basic approach to solve the problem:
post_dates = Post.objects.all().values('date')
result_temperature = Temperature.objects.filter(date__in=post_dates).values('temperature')
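The approach above fetches the temperatures but does not say which post each one belongs to. One way to restore that link is a date-keyed dictionary. The sketch below uses plain tuples in the shape `values_list()` could return. The function name and the sample data are made up for illustration.

```python
import datetime


def attach_temperatures(posts, temperature_pairs):
    """Return (title, temperature) pairs, with None where no reading exists.

    `posts` is an iterable of (title, date) tuples and `temperature_pairs`
    an iterable of (date, temperature) tuples. Both shapes are assumptions
    modelled on what values_list() could return.
    """
    # If a date appears more than once, the last reading wins.
    temps_by_date = dict(temperature_pairs)
    return [(title, temps_by_date.get(date)) for title, date in posts]


posts = [
    ('First post', datetime.date(2020, 1, 1)),
    ('Second post', datetime.date(2020, 1, 3)),
]
readings = [
    (datetime.date(2020, 1, 1), 4),
    (datetime.date(2020, 1, 2), 7),
]
paired = attach_temperatures(posts, readings)
```

This costs two queries in total regardless of how many posts are listed.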

Subqueries could be your friend here. Something like the following should work:
from django.db.models import OuterRef, Subquery
temps = Temperature.objects.filter(date=OuterRef('date'))
posts = Post.objects.annotate(temperature=Subquery(temps.values('temperature')[:1]))
for post in posts:
    temperature = post.temperature
Then you can just iterate through posts and access the temperature off each post instance.

Django DB, finding Categories whose Items are all in a subset

I have two models:
class Category(models.Model):
    pass

class Item(models.Model):
    cat = models.ForeignKey(Category)
I am trying to return all Categories for which all of that category's items belong to a given subset of item ids (fixed thanks). For example, all categories for which all of the items associated with that category have ids in the set [1,3,5].
How could this be done using Django's query syntax (as of 1.1 beta)? Ideally, all the work should be done in the database.

Category.objects.filter(item__id__in=[1, 3, 5])
Django creates the reverse relationship on the model without the foreign key. You can filter on it by using its related name (usually just the lowercase model name, but it can be manually overridden), two underscores, and the field name you want to query on. Note that this matches any category with at least one item in the set, not only those whose items all fall inside it.

Let's say you require all items to be in the following set:
allowable_items = set([1,3,4])
One brute-force solution would be to check the item_set for every category, like so:
categories_with_allowable_items = [
    category for category in
    Category.objects.all() if
    set([item.id for item in category.item_set.all()]) <= allowable_items
]
But we don't really have to check all categories, as categories_with_allowable_items is always going to be a subset of the categories related to all items with ids in allowable_items... so that's all we have to check (and this should be faster):
categories_with_allowable_items = set([
    item.cat for item in
    Item.objects.select_related('cat').filter(pk__in=allowable_items) if
    set([siblingitem.id for siblingitem in item.cat.item_set.all()]) <= allowable_items
])
If performance isn't really an issue, then the latter of these two (if not the former) should be fine. If these are very large tables, you might have to come up with a more sophisticated solution. Also, if you're using a particularly old version of Python, remember that you'll have to import the sets module.

I've played around with this a bit. If QuerySet.extra() accepted a "having" parameter I think it would be possible to do it in the ORM with a bit of raw SQL in the HAVING clause. But it doesn't, so I think you'd have to write the whole query in raw SQL if you want the database doing the work.
EDIT:
This is the query that gets you part way there:
from django.db.models import Count
Category.objects.annotate(num_items=Count('item')).filter(num_items=...)
The problem is that for the query to work, "..." needs to be a correlated subquery that looks up, for each category, the number of its items in allowed_items. If .extra had a "having" argument, you'd do it like this:
Category.objects.annotate(num_items=Count('item')).extra(having="num_items=(SELECT COUNT(*) FROM app_item WHERE app_item.id IN %s AND app_item.cat_id = app_category.id)", having_params=[allowed_item_ids])
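For reference, the raw-SQL route could look roughly like the sketch below, which compares each category's item count with the count of its items that fall inside the allowed set. It runs against an in-memory SQLite database. The table names, sample data and the `categories_within` helper are assumptions for illustration, not what Django would generate.

```python
import sqlite3


def categories_within(conn, allowed_ids):
    """Return ids of categories whose items all have ids in `allowed_ids`."""
    allowed_ids = list(allowed_ids)
    if not allowed_ids:
        return []
    # One "?" placeholder per allowed id keeps the query parameterized.
    placeholders = ', '.join('?' for _ in allowed_ids)
    sql = f"""
        SELECT cat_id
        FROM item
        GROUP BY cat_id
        HAVING COUNT(*) = SUM(CASE WHEN id IN ({placeholders}) THEN 1 ELSE 0 END)
        ORDER BY cat_id
    """
    return [row[0] for row in conn.execute(sql, allowed_ids)]


conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY);
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    cat_id INTEGER REFERENCES category(id)
);
INSERT INTO category (id) VALUES (1), (2), (3);
INSERT INTO item (id, cat_id) VALUES (1, 1), (3, 1), (5, 2), (6, 2), (4, 3);
""")
matching = categories_within(conn, [1, 3, 5])
```

Because the query groups rows of the item table, categories with no items at all never appear in the result.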
