Django - how to filter a queryset on multiple reverse lookup matches

I have two models: Order and OrderStatus.
Don't worry about Order, but OrderStatus has the following fields:
order = models.ForeignKey(Order)
status = models.CharField (choice that can be either ORDERED, IN_TRANSIT, or RECEIVED)
OrderStatuses are created when the Order changes status, so initially there's just an ORDERED status, then later an ORDERED and an IN_TRANSIT status, and finally ORDERED, IN_TRANSIT, and RECEIVED statuses all exist as foreign keys to one Order. This is to keep track of timings, etc.
I want to find all Orders which have all three statuses. In other words, all orders that have been received and are valid because they have the other two statuses.
This is returning an empty set:
Order.objects.filter(Q(orderstatus__status=OrderStatus.ORDERED) &
Q(orderstatus__status=OrderStatus.IN_TRANSIT) &
Q(orderstatus__status=OrderStatus.RECEIVED))
... but this is working fine:
(Order.objects.filter(orderstatus__status=OrderStatus.ORDERED)
    .filter(orderstatus__status=OrderStatus.IN_TRANSIT)
    .filter(orderstatus__status=OrderStatus.RECEIVED))
What's the difference here? Is there any way to simplify? I thought this was what Q objects are for.

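A single filter() call applies all of its conditions to the same related OrderStatus row. No single row can have a status equal to ORDERED, IN_TRANSIT and RECEIVED at the same time, so this returns an empty set: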
Order.objects.filter(Q(orderstatus__status=OrderStatus.ORDERED) &
Q(orderstatus__status=OrderStatus.IN_TRANSIT) &
Q(orderstatus__status=OrderStatus.RECEIVED))
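Here each chained filter() call on the multi-valued relationship adds its own join, so every condition can be matched by a different OrderStatus row of the same Order. The second filter is applied to the result of the first, and the third to the result of the second: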
(Order.objects.filter(orderstatus__status=OrderStatus.ORDERED)
    .filter(orderstatus__status=OrderStatus.IN_TRANSIT)
    .filter(orderstatus__status=OrderStatus.RECEIVED))
If you instead want the Order objects that have any of the statuses ORDERED, IN_TRANSIT or RECEIVED, you can do something like this (add .distinct() if an Order with several matching statuses should appear only once):
Order.objects.filter(orderstatus__status__in=[OrderStatus.ORDERED, OrderStatus.IN_TRANSIT, OrderStatus.RECEIVED])
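The difference is easier to see in SQL. Below is a rough sketch using the standard-library sqlite3 module. The table and column names (orders, order_status) and the sample rows are assumptions made for illustration, not the exact schema or SQL that Django generates. The third query is one possible way to simplify, added here as a suggestion rather than something taken from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_status (
        id INTEGER PRIMARY KEY,
        order_id INTEGER REFERENCES orders(id),
        status TEXT
    );
    INSERT INTO orders (id) VALUES (1), (2);
    -- Order 1 has all three statuses, order 2 only the first two.
    INSERT INTO order_status (order_id, status) VALUES
        (1, 'ORDERED'), (1, 'IN_TRANSIT'), (1, 'RECEIVED'),
        (2, 'ORDERED'), (2, 'IN_TRANSIT');
""")

# Roughly the shape of the single filter(Q & Q & Q) call: one join,
# so one status row would have to hold all three values at once.
single_join = conn.execute("""
    SELECT o.id FROM orders o
    JOIN order_status s ON s.order_id = o.id
    WHERE s.status = 'ORDERED' AND s.status = 'IN_TRANSIT' AND s.status = 'RECEIVED'
""").fetchall()

# Roughly the shape of the chained filter() calls: one join per call,
# so each condition can be satisfied by a different status row.
chained = conn.execute("""
    SELECT o.id FROM orders o
    JOIN order_status s1 ON s1.order_id = o.id
    JOIN order_status s2 ON s2.order_id = o.id
    JOIN order_status s3 ON s3.order_id = o.id
    WHERE s1.status = 'ORDERED' AND s2.status = 'IN_TRANSIT' AND s3.status = 'RECEIVED'
""").fetchall()

# Possible simplification (an assumption, not from the answer): one join,
# then count how many distinct statuses each order matched.
counted = conn.execute("""
    SELECT o.id FROM orders o
    JOIN order_status s ON s.order_id = o.id
    WHERE s.status IN ('ORDERED', 'IN_TRANSIT', 'RECEIVED')
    GROUP BY o.id
    HAVING COUNT(DISTINCT s.status) = 3
""").fetchall()

print(single_join, chained, counted)
```

In ORM terms the last query would roughly correspond to filtering with orderstatus__status__in, annotating a Count of orderstatus__status with distinct=True, and filtering on that count being 3. Treat that as an idea to verify against your own models.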

Related

RavenDB map reduce, duplicate entries in reduce

I have created a map function in Raven that looks like this
from order in docs.WebOrderModels
from orderLine in order.OrderLines
where order.OrderStatus.OrderStatusId == 3
select new{
orderLine.Sku,
orderLine.Quantity
}
together with the following reduce
from result in results
group result by new {result.Sku, result.Quantity} into g
select new{
Sku = g.Key.Sku,
Quantity = g.Sum(x => x.Quantity)
}
Running this mostly works, except that I get duplicate entries for the Sku: the same Sku number appears two times.
When I look through the data there does not seem to be any difference other than the quantities per order object.
I have tried making two new order objects to see if it happens when two order objects contain order lines for the same Sku number. But they are added together as I would expect.
I can't find any reason why the two entries are not reduced to one entry.
You are grouping the result with:
group result by new {result.Sku, result.Quantity} into g
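which produces one result entry for each distinct (Sku, Quantity) pair, so the same Sku shows up once per distinct quantity value.
Group by the Sku alone instead (the group key is then the Sku itself, so select Sku = g.Key):
group result by result.Sku into g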
See:
https://demo.ravendb.net/demos/csharp/static-indexes/map-reduce-index#step-4
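The effect of the grouping key can be reproduced outside RavenDB. The following is a small Python sketch, not RavenDB code; the sample (sku, quantity) pairs and the reduce_by helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical map output: one (sku, quantity) pair per order line.
mapped = [("A-1", 2), ("A-1", 2), ("A-1", 5), ("B-7", 1)]


def reduce_by(rows, key_func):
    """Sum the quantities of all rows that share the same grouping key."""
    totals = defaultdict(int)
    for sku, quantity in rows:
        totals[key_func(sku, quantity)] += quantity
    return dict(totals)


# Grouping by (Sku, Quantity): "A-1" ends up in two separate groups.
by_sku_and_quantity = reduce_by(mapped, lambda sku, qty: (sku, qty))

# Grouping by Sku only: one entry per Sku.
by_sku = reduce_by(mapped, lambda sku, qty: sku)

print(by_sku_and_quantity)
print(by_sku)
```

This also matches the observation in the question: two order lines with the same Sku and the same quantity are added together, while lines with different quantities land in different groups.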

Return object when aggregating grouped fields in Django

Assuming the following example model:
# models.py
class event(models.Model):
    location = models.CharField(max_length=10)
    type = models.CharField(max_length=10)
    date = models.DateTimeField()
    attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using Django ORM. According to the Django Aggregation documentation, we can achieve something close to this, using values preceding the annotation.
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but does not return the attendance field, which is what I actually need.
Another approach I tried was to use distinct i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.
Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
When you create your model objects, they should all be unique: it seems highly unlikely that you would have two events on the same date, in the same location, and of the same type. So with that assumption, let's begin. (As a formatting note, class names tend to start with a capital letter to differentiate classes from variables or instances.)
from django.db.models import Max

# First, get the latest date for each (location, type) combination.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))
# Make an empty list to store the objects you want.
results_list = []
# Then iterate through 'results', looking up the objects
# you want and populating the list.
for r in results:
    # .get() relies on the uniqueness assumption above.
    result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
    results_list.append(result)
# Now you have a list of objects that you can do whatever you want with.
You might have to check the exact output of Max('date'), but this should get you on the right path.
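The two steps above (group to find the latest date, then look up the matching row) can also be expressed as one portable SQL query that joins the grouped result back to the table. This is a sketch with the standard-library sqlite3 module; the table layout mirrors the example model, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        location TEXT, type TEXT, date TEXT, attendance INTEGER
    );
    INSERT INTO event (location, type, date, attendance) VALUES
        ('Berlin', 'talk',  '2021-01-01', 10),
        ('Berlin', 'talk',  '2021-03-01', 30),
        ('Berlin', 'party', '2021-02-01', 50),
        ('Oslo',   'talk',  '2021-01-15', 20);
""")

# Inner query: latest date per (location, type).
# Outer join: fetch the full row (including attendance) for that date.
rows = conn.execute("""
    SELECT e.location, e.type, e.date, e.attendance
    FROM event e
    JOIN (
        SELECT location, type, MAX(date) AS latest_date
        FROM event
        GROUP BY location, type
    ) latest
      ON latest.location = e.location
     AND latest.type = e.type
     AND latest.latest_date = e.date
    ORDER BY e.location, e.type
""").fetchall()

for row in rows:
    print(row)
```

Like the loop in the answer, this relies on the (location, type, date) combination being unique; if two events tie on the latest date, both rows come back.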

django - queryset.last() not returning right order / last record

Writing code that is generating JSON. The sections of the JSON have to be separated by a ",", with no comma after the last one, so in the code I have:
-- Define a queryset to retrieve distinct values of the database field:
databases_in_workload = DatabaseObjectsWorkload.objects.filter(workload=migration.workload_id).values_list('database_object__database', flat=True).distinct()
-- Then I cycle over it:
for database_wk in databases_in_workload:
    # ... do something
    if not (database_wk == databases_in_workload.last()):
        job_json_string = job_json_string + '} ],'
    else:
        job_json_string = job_json_string + '} ]'
I want the last record to be terminated by a square bracket, the preceding by a comma. But instead, the opposite is happening.
I also looked at the database table content. The values I have for "database_wk" are user02 (for the records with a lower value of primary key) and user01 (for the records with the higher value of pk in the DB). The order (if user01 is first or last) really doesn't matter, as long as the last record is correctly identified by last() - so if I have user02, user01 in the query set iterations, I expect last() to return user01. However - this is not working correctly.
What is strange is that if in the database (Postgres) order is changed (first have user01, then user02 ordered by primary key values) then the "if" code above works, but in my situation last() seems to be returning the first record, not the last. It's as if there is one order in the database, another in the query set, and last() is taking the database order... Anybody encountered/solved this issue before? Alternatively - any other method for identifying the last record in a query set (other than last()) which I could try would also help. Many thanks in advance!
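It behaves this way because no ordering is specified: the loop iterates in whatever order the database returns the rows, while last() applies its own ordering by primary key. Try using order_by() so that both use the same explicit ordering.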
From: queryset.first()
If the QuerySet has no ordering defined, then the queryset is automatically ordered by the primary key
From: queryset.last()
Works like first(), but returns the last object in the queryset.
If you don't want to use order_by, then try queryset.latest(), which needs a field name to order by (or get_latest_by set in the model's Meta).
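As for another way to identify the last record: one option is to avoid asking the database a second time and detect the last element during iteration. The sketch below uses a plain list as a stand-in for the queryset, and the "database" key in the generated string is made up; with a real queryset, list(...) would evaluate it once so the order cannot change between the loop and the check.

```python
# A plain list stands in for the evaluated queryset (hypothetical values).
databases_in_workload = ["user02", "user01"]

# Evaluate once; the iteration order is now fixed for the whole loop.
sections = list(databases_in_workload)

job_json_string = ""
for index, database_wk in enumerate(sections):
    job_json_string += '{ "database": "%s" } ]' % database_wk
    # Every section except the last one is followed by a comma.
    if index < len(sections) - 1:
        job_json_string += ","

print(job_json_string)
```

If the whole structure is JSON anyway, building Python lists and dicts and serializing them with json.dumps would remove the need for manual comma handling altogether.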

How to create a conversation inbox in Django

I have a Message class which has fromUser, toUser, text and createdAt fields.
I want to imitate a whatsapp or iMessage or any SMS inbox, meaning I want to fetch the last message for each conversation.
I tried:
messages = Message.objects.order_by('createdAt').distinct('fromUser', 'toUser')
But this doesn't work; it fails with the error "SELECT DISTINCT ON expressions must match initial ORDER BY expressions".
I don't really understand what it means, I also tried:
messages = Message.objects.order_by('fromUser','toUser','createdAt').distinct('fromUser', 'toUser')
and similar variations, but let me not blur the real topic here with apparently meaningless pieces of code. How can I achieve this basic, or rather generally well-known, result?
Your second method is correct. From the Django docs:
When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.
For example, SELECT DISTINCT ON (a) gives you the first row for each value in column a. If you don’t specify an order, you’ll get some arbitrary row.
This means that you must include the same columns in your order_by() method that you want to use in the distinct() method. Indeed, your second query correctly includes the columns in the order_by() method:
messages = Message.objects.order_by('fromUser','toUser','createdAt').distinct('fromUser', 'toUser')
To fetch the latest record, you need to order the createdAt column in descending order. You specify this by putting a minus sign in front of the column name in the order_by() method (the order_by() documentation has an example of this). Here's the final form that you should use to get the latest message for each (fromUser, toUser) pair:
messages = Message.objects.order_by('fromUser','toUser','-createdAt').distinct('fromUser', 'toUser')
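To see why the ordering decides which row DISTINCT ON keeps, here is a small pure-Python emulation: sort the rows, then keep the first row of each (fromUser, toUser) group. The sample messages and the distinct_on_pair helper are invented for illustration; this is not how the database implements it.

```python
from itertools import groupby

# Hypothetical messages: (fromUser, toUser, createdAt, text).
messages = [
    ("alice", "bob", 1, "hi"),
    ("alice", "bob", 3, "are you there?"),
    ("bob", "alice", 2, "hello"),
    ("alice", "carol", 5, "lunch?"),
]


def distinct_on_pair(rows, newest_first):
    """Emulate ORDER BY fromUser, toUser, [-]createdAt + DISTINCT ON (fromUser, toUser)."""
    ordered = sorted(
        rows,
        key=lambda m: (m[0], m[1], -m[2] if newest_first else m[2]),
    )
    # Keep only the first row of each (fromUser, toUser) group.
    return [next(group) for _, group in groupby(ordered, key=lambda m: (m[0], m[1]))]


oldest = distinct_on_pair(messages, newest_first=False)
latest = distinct_on_pair(messages, newest_first=True)

print(oldest)
print(latest)
```

Note that ("alice", "bob") and ("bob", "alice") are different pairs here, just as they are for distinct('fromUser', 'toUser'), so a two-way conversation appears once per direction.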

Remove duplicates in Django ORM -- multiple rows

I have a model that has four fields. How do I remove duplicate objects from my database?
Daniel Roseman's answer to this question seems appropriate, but I'm not sure how to extend this to situation where there are four fields to compare per object.
Thanks,
W.
from django.db import models


def remove_duplicated_records(model, fields):
    """
    Removes records from `model` duplicated on `fields`
    while leaving the most recent one (biggest `id`).
    """
    duplicates = model.objects.values(*fields)
    # override any model specific ordering (for `.annotate()`)
    duplicates = duplicates.order_by()
    # group by same values of `fields`; count how many rows are the same
    duplicates = duplicates.annotate(
        max_id=models.Max("id"), count_id=models.Count("id")
    )
    # keep only the groups that are actually duplicated
    duplicates = duplicates.filter(count_id__gt=1)
    for duplicate in duplicates:
        to_delete = model.objects.filter(**{x: duplicate[x] for x in fields})
        # spare the latest duplicated record from deletion;
        # use `Min` above if you wish to spare the first record instead
        to_delete = to_delete.exclude(id=duplicate["max_id"])
        to_delete.delete()
You shouldn't need to do this often. Add a unique_together constraint on the model instead, so the database rejects duplicates in the first place.
This leaves the record with the biggest id in the DB. If you want to keep the original record (the first one), modify the code a bit to use models.Min. You can also use a completely different field, such as a creation date.
Underlying SQL
When annotating, the Django ORM issues a GROUP BY on all model fields used in the query, hence the use of the .values() method. GROUP BY groups all records that have identical values in those fields. The groups that are actually duplicated (more than one id for the given fields) are then selected by the HAVING clause that .filter() generates on the annotated QuerySet.
SELECT
    field_1,
    …
    field_n,
    MAX(id) AS max_id,
    COUNT(id) AS count_id
FROM
    app_mymodel
GROUP BY
    field_1,
    …
    field_n
HAVING
    COUNT(id) > 1
The duplicated records are then deleted in the for loop, except for the most recent one (the one with the biggest id) in each group.
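The same GROUP BY / HAVING / delete sequence can be tried directly against a throwaway database. This is a sketch with the standard-library sqlite3 module; the two-column app_mymodel table and its rows are assumptions for illustration, and the deletes run inside a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, field_1 TEXT, field_2 TEXT);
    INSERT INTO app_mymodel (field_1, field_2) VALUES
        ('a', 'x'), ('a', 'x'), ('b', 'y'), ('a', 'x'), ('b', 'z');
""")

# One row per duplicated group, with the id that should survive.
duplicates = conn.execute("""
    SELECT field_1, field_2, MAX(id) AS max_id, COUNT(id) AS count_id
    FROM app_mymodel
    GROUP BY field_1, field_2
    HAVING COUNT(id) > 1
""").fetchall()

# "with conn" commits if the block succeeds and rolls back on an exception.
with conn:
    for field_1, field_2, max_id, _count in duplicates:
        conn.execute(
            "DELETE FROM app_mymodel WHERE field_1 = ? AND field_2 = ? AND id != ?",
            (field_1, field_2, max_id),
        )

remaining = conn.execute(
    "SELECT id, field_1, field_2 FROM app_mymodel ORDER BY id"
).fetchall()
print(remaining)
```

Only the ('a', 'x') group has more than one row, so only its older rows are removed and the row with the biggest id stays.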
Empty .order_by()
Just to be sure, it's always wise to add an empty .order_by() call before aggregating a QuerySet.
The fields used for ordering the QuerySet are also included in GROUP BY statement. Empty .order_by() overrides columns declared in model's Meta and in result they're not included in the SQL query (e.g. default sorting by date can ruin the results).
You might not need to override it at the current moment, but someone might add default ordering later and therefore ruin your precious delete-duplicates code not even knowing that. Yes, I'm sure you have 100% test coverage…
Just add empty .order_by() to be safe. ;-)
https://docs.djangoproject.com/en/3.2/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Transaction
Of course you should consider doing it all in a single transaction.
https://docs.djangoproject.com/en/3.2/topics/db/transactions/#django.db.transaction.atomic
If you want to delete duplicates on single or multiple columns, you don't need to iterate over millions of records.
Fetch all unique columns (don't forget to include the primary key column)
fetch = Model.objects.all().values("id", "skuid", "review", "date_time")
Load the result into pandas (I used pandas here instead of an ORM query)
import pandas as pd
df = pd.DataFrame(list(fetch))
Drop duplicates on unique columns
uniq_df = df.drop_duplicates(subset=["skuid", "review", "date_time"])
# Don't include the primary key in subset, or no two rows will ever count as duplicates
Now, you'll get the unique records from where you can pick the primary key
primary_keys = uniq_df["id"].tolist()
Finally, it's show time: exclude those ids and delete the rest of the records
records = Model.objects.all().exclude(pk__in=primary_keys).delete()
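For readers without pandas at hand, the selection step can be sketched in plain Python. This mirrors what drop_duplicates(subset=...) does with its default keep="first": the first row for each combination of the subset columns survives. The sample rows are invented and shaped like the output of .values("id", "skuid", "review", "date_time").

```python
# Hypothetical rows, shaped like the dictionaries returned by .values(...).
rows = [
    {"id": 1, "skuid": "A", "review": "good", "date_time": "2021-01-01"},
    {"id": 2, "skuid": "A", "review": "good", "date_time": "2021-01-01"},
    {"id": 3, "skuid": "B", "review": "bad", "date_time": "2021-01-02"},
]
subset = ("skuid", "review", "date_time")

# Remember the id of the first row seen for each combination of subset values.
first_id_per_key = {}
for row in rows:
    key = tuple(row[name] for name in subset)
    first_id_per_key.setdefault(key, row["id"])

primary_keys = list(first_id_per_key.values())
keep = set(primary_keys)
ids_to_delete = [row["id"] for row in rows if row["id"] not in keep]

print(primary_keys, ids_to_delete)
```

Here primary_keys plays the same role as in the answer: everything whose id is not in that list is a duplicate and can be deleted.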