For example, I have a model like this:
class Doggy(models.Model):
    name = models.CharField(u'Name', max_length=40)
    color = models.CharField(u'Color', max_length=20)
How can I select doggies with the same color? Or with the same name? :)
UPD. Of course, I don't know the name or the color in advance. I want to, kind of, group the rows by their values.
UPD2. I'm trying to do something like that, but using Django:
SELECT *
FROM table
WHERE tablefield IN (
    SELECT tablefield
    FROM table
    GROUP BY tablefield
    HAVING COUNT(tablefield) > 1
)
UPD3. I'd like to do it via Django ORM, without having to iterate over the objects. I just want to get rows with duplicate values for one particular field.
I'm late to the party, but here you go:
from django.db.models import Count
Doggy.objects.values('color', 'name').annotate(Count('pk'))
This will give you results that have a count of how many of each Doggy you have grouped by color and name.
If you're looking for Doggys of a certain colour, you'd do something like:
Doggy.objects.filter(color='blue')
If you want to find Doggys based on the colour of the current Doggy:
def GetSimilarColoredDoggys(self):
    return Doggy.objects.filter(color=self.color)
The same would go for names:
def GetDoggysWithSameName(self):
    return Doggy.objects.filter(name=self.name)
You can use itertools.groupby() for this:
import operator
import itertools
from django.db import models
def group_model_by_attr(model_class, attr_name):
    assert issubclass(model_class, models.Model), \
        "%s is not a Django model." % (model_class,)
    assert attr_name in [field.name for field in model_class._meta.fields], \
        "The %s field doesn't exist on model %s" % (attr_name, model_class)
    # groupby() only merges adjacent items, so order by the grouping field first
    all_instances = model_class.objects.all().order_by(attr_name)
    keyfunc = operator.attrgetter(attr_name)
    return [{k: list(g)} for k, g in itertools.groupby(all_instances, keyfunc)]
grouped_by_color = group_model_by_attr(Doggy, 'color')
grouped_by_name = group_model_by_attr(Doggy, 'name')
grouped_by_color (for example) will be a list of dicts like [{'purple': [doggy1, doggy2]}, {'pink': [doggy3]}], where doggy1, doggy2, etc. are Doggy instances.
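The grouping step can be tried without a database. The following is only a sketch: Dog is a hypothetical namedtuple standing in for Doggy rows, and group_by_attr is an illustrative helper, not part of the answer above. It also shows how to keep only the values shared by more than one row, which is what the question asks for.

```python
import itertools
import operator
from collections import namedtuple

# Hypothetical stand-in for Doggy rows; this is not a Django model.
Dog = namedtuple("Dog", ["name", "color"])

dogs = [
    Dog("Rex", "brown"),
    Dog("Fido", "black"),
    Dog("Spot", "brown"),
    Dog("Rex", "white"),
]


def group_by_attr(items, attr_name):
    """Return {value: [items...]} for every distinct value of attr_name."""
    keyfunc = operator.attrgetter(attr_name)
    # groupby() only merges adjacent items, so sort by the same key first.
    ordered = sorted(items, key=keyfunc)
    return {k: list(g) for k, g in itertools.groupby(ordered, keyfunc)}


by_color = group_by_attr(dogs, "color")
# Keep only the colors that more than one dog shares.
shared_colors = {c: ds for c, ds in by_color.items() if len(ds) > 1}
```

Note that this still loads every row into memory, so it only suits tables of moderate size.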
UPDATE:
From your update it looks like you just want a list of ids for each event type. I tested this with 250k records in PostgreSQL on my Ubuntu laptop with a Core 2 Duo and 3 GB of RAM, and it took 0.35 seconds to generate the dict (the itertools.groupby version took 0.72 seconds, by the way). You mention that you have 900K records, so this should be fast enough. If it's not, it should be easy to cache the result and update it as the records change.
from collections import defaultdict
doggies = Doggy.objects.values_list('color', 'id').order_by('color').iterator()
grouped_doggies_by_color = defaultdict(list)
for color, id in doggies:
    grouped_doggies_by_color[color].append(id)
I would change your data model so that the color and name are a one-to-many relationship with Doggy as follows:
class Doggy(models.Model):
    name = models.ForeignKey('DoggyName')
    color = models.ForeignKey('DoggyColor')
class DoggyName(models.Model):
    name = models.CharField(max_length=40, unique=True)
class DoggyColor(models.Model):
    color = models.CharField(max_length=20, unique=True)
Now DoggyName and DoggyColor do not contain duplicate names or colors, and you can use them to select dogs with the same name or color.
Okay, apparently there's no way to do this with the ORM alone.
If you have to do it, use .extra() to execute the SQL statement you need (if you are using an SQL database, of course).
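To check what the SQL from the question returns, it can be run against an in-memory SQLite database. This is a sketch only: the doggy table, its sample rows and the rows_with_duplicate helper are assumptions made for illustration, and the real table name depends on the app label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the Doggy model's table.
conn.execute("CREATE TABLE doggy (id INTEGER PRIMARY KEY, name TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO doggy (name, color) VALUES (?, ?)",
    [("Rex", "brown"), ("Fido", "black"), ("Spot", "brown"), ("Rex", "white")],
)


def rows_with_duplicate(conn, column):
    """Return (id, name, color) rows whose value in `column` occurs more than once."""
    # Column names cannot be bound as parameters, so check against a whitelist.
    if column not in ("name", "color"):
        raise ValueError("unexpected column: %r" % (column,))
    sql = (
        "SELECT id, name, color FROM doggy "
        "WHERE {col} IN ("
        "  SELECT {col} FROM doggy GROUP BY {col} HAVING COUNT({col}) > 1"
        ") ORDER BY id"
    ).format(col=column)
    return conn.execute(sql).fetchall()


same_color = rows_with_duplicate(conn, "color")
same_name = rows_with_duplicate(conn, "name")
```

With the sample rows, same_color holds the two brown dogs and same_name holds the two dogs called Rex.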
Related
I'm learning Django and looking for a best practice:
Imagine I have a model for a mobile phone device:
class Device(models.Model):
    vendor = models.CharField(max_length=100)
    line = models.CharField(max_length=100, blank=True)
    model = models.CharField(max_length=100)
Let's say I create an object like this:
Device.objects.create(vendor = "Apple",
                      line = "iPhone",
                      model = "SE"
                      )
or without "line":
Device.objects.create(vendor = "Xiaomi",
                      model = "Mi 6"
                      )
Then I'd like to track sales in my shop for every device, so I create a model for a "Deal" (I track only the deal date and the device sold, device as a ForeignKey):
class Deal(models.Model):
    device = models.ForeignKey(Device, on_delete=models.CASCADE)
    deal_date = models.DateTimeField(default=None)
Question:
What is the best way to create a "Deal" object, if I want to query "Device" by its full, concatenated name, e.g. "Apple iPhone SE" or "Xiaomi Mi 6"?
I've found something similar in "Django database entry created by concatenation of two fields"; however, I'm not sure it's the right path in my case.
My best guess is something like this (where "name" is a concatenated field):
de = Device.objects.get(name = "Apple iPhone SE")
Deal.objects.create(device = de,
                    deal_date = datetime(2018, 4, 26, 15, 28)
                    )
What is the correct way to do this task? Many thanks for your help!
Thanks for your advice, guys. After searching a little more I found an answer that fits my case:
I tweaked the save() method, which now populates a field automatically as a concatenation of the 3 other fields.
@property was useful in this case too.
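The concatenation itself might look like the sketch below. DeviceSketch is a hypothetical plain-Python stand-in for the Device model and full_device_name is an illustrative helper, so the actual save() override in the project may differ. The point is that a blank "line" should not leave a double space in the stored name.

```python
def full_device_name(vendor, line, model):
    """Join the non-empty parts with single spaces."""
    parts = (vendor, line, model)
    return " ".join(part.strip() for part in parts if part and part.strip())


class DeviceSketch:
    """Hypothetical plain-Python stand-in for the Device model."""

    def __init__(self, vendor, model, line=""):
        self.vendor = vendor
        self.line = line
        self.model = model
        self.name = ""

    def save(self):
        # In a real model this would run just before calling super().save().
        self.name = full_device_name(self.vendor, self.line, self.model)
        return self


iphone = DeviceSketch("Apple", "SE", line="iPhone").save()
mi6 = DeviceSketch("Xiaomi", "Mi 6").save()
```

With the name stored as a real column, a lookup such as Device.objects.get(name="Apple iPhone SE") becomes an ordinary field filter.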
Supposing that your variable name contains the search text, and using your data models, you could use an annotation to add a field to each object returned by the queryset, and then filter on that field.
You could try something like the following (it is not tested):
import datetime
from django.db.models import Value
from django.db.models.functions import Concat
from your.app.models import Deal, Device
# assumed to live in a view, although it sounds more like a model method
def my_view(request, *args, **kwargs):
    name = request.POST.get('name')
    # Concat builds the full name in SQL; a blank "line" leaves a double space
    device_qs = Device.objects.annotate(
        text_concatenated=Concat('vendor', Value(' '), 'line', Value(' '), 'model')
    ).filter(text_concatenated=name)
    try:
        device = device_qs.get()
    except Device.DoesNotExist:
        # no device matches; handle that situation here
        device = None
    except Device.MultipleObjectsReturned:
        # several devices share the same concatenated text; maybe the data model should be improved
        device = device_qs.first()
    if device is not None:
        deal = Deal.objects.create(device=device, deal_date=datetime.datetime.now())
    # build your response and return it
I've got a search function in my app that receives "cities" and "duration" inputs (both lists) and returns the top 30 matching "package" results sorted by package "rating".
It would be easy to implement if all the parameters were columns, but "duration" and "rating" are calculated properties. This means that I can't use a standard Django query to filter the packages. It seems that Django's "extra" method is what I need to use here, but my SQL isn't great and this seems like a pretty complex query.
Is the extra method what I should be using here? If so, what would that statement look like?
Applicable code copied below.
#models.py
class City(models.Model):
    ...
    city = models.CharField(max_length = 100)
class Package(models.Model):
    ....
    city = models.ManyToManyField(City, through = 'PackageCity')
    @property
    def duration(self):
        duration = len(Itinerary.objects.filter(package = self))
        return duration
    @property
    def rating(self):
        #do something to get the rating
        return unicode(rating)
class PackageCity(models.Model):
    package = models.ForeignKey(Package)
    city = models.ForeignKey(City)
class Itinerary(models.Model):
    # An Itinerary object is a day in a package, so len(Itinerary) works for the duration
    ...
    package = models.ForeignKey(Package)
#functions.py
def get_packages(city, duration):
    cities = City.objects.filter(city = city) # works fine
    duration_list = range(int(duration_array[0], 10), int(duration_array[1], 10) + 1) # works fine
    #What I want to do, but can't because duration & rating are calculated properties
    packages = Package.objects.filter(city__in = cities, duration__in = duration_array).order_by('rating')[:30]
First off, don't use len() on Querysets, use count().
https://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated
Second, assuming you're doing something like calculating an average rating with your rating property you could use annotate:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#annotate
Then you can do something like the following:
queryset = Package.objects.annotate(duration=Count('itinerary', distinct=True), rating=Avg('packagereview__rating'))
Where "itinerary" is the default reverse name for the ForeignKey from Itinerary to Package (use your related_name if you set one), and "PackageReview" is a fake model I just made up that has a ForeignKey to Package and a "rating" field.
Then you can filter the annotated queryset as described here:
https://docs.djangoproject.com/en/dev/topics/db/aggregation/#filtering-on-annotations
(Take note of the clause order differences between annotate -> filter and filter -> annotate.)
Properties are calculated at run time, so you really can't use them for filtering or anything like that.
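If the values really have to stay as properties, the filtering and sorting can only happen in Python after the objects are loaded. The sketch below assumes a hypothetical PackageSketch class in place of the real Package model and an illustrative top_packages helper. It works for small result sets, but it reads every candidate object into memory first.

```python
from operator import attrgetter


class PackageSketch:
    """Hypothetical stand-in for Package; duration and rating are computed."""

    def __init__(self, title, days, scores):
        self.title = title
        self._days = days
        self._scores = scores

    @property
    def duration(self):
        return len(self._days)

    @property
    def rating(self):
        return sum(self._scores) / len(self._scores)


def top_packages(packages, min_days, max_days, limit=30):
    """Filter by computed duration, then sort by computed rating, best first."""
    matching = [p for p in packages if min_days <= p.duration <= max_days]
    return sorted(matching, key=attrgetter("rating"), reverse=True)[:limit]


packages = [
    PackageSketch("Alps", ["d1", "d2", "d3"], [4, 5]),
    PackageSketch("Coast", ["d1", "d2"], [5, 5]),
    PackageSketch("City", ["d1"], [3]),
]
best = top_packages(packages, min_days=2, max_days=3)
```

Doing the same work in the database, as the annotate() approach does, avoids loading rows that will be discarded.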
I have an application where users select their own display columns. Each display column has a specified formula. To compute that formula, I need to join few related columns (one-to-one relationship) and compute the value.
The models are like (this is just an example model, actual has more than 100 fields):
class CompanyCode(models.Model):
    """Various Company Codes"""
    nse_code = models.CharField(max_length=20)
    bse_code = models.CharField(max_length=20)
    isin_code = models.CharField(max_length=20)
class Quarter(models.Model):
    """Company Quarterly Result Figures"""
    company_code = models.OneToOneField(CompanyCode)
    sales_now = models.IntegerField()
    sales_previous = models.IntegerField()
I tried doing:
ratios = {'growth':'quarter__sales_now / quarter__sales_previous'}
CompanyCode.objects.extra(select=ratios)
# raises "Unknown column 'quarter__sales_now' in 'field list'"
I also tried using raw query:
query = ','.join(['round((%s),2) AS %s' % (formula, ratio_name)
                  for ratio_name, formula in ratios.iteritems()])
companies = CompanyCode.objects.raw("""
    SELECT `backend_companycode`.`id`, %s
    FROM `backend_companycode`
    INNER JOIN `backend_quarter` ON ( `backend_companycode`.`id` = `backend_companyquarter`.`company_code_id` )
    """, [query])
#This just gives empty result
So please give me a little clue as to how I can use related columns preferably using 'extra' command. Thanks.
By now the Django documentation says that one should use extra as a last resort.
So here is a query without extra():
from django.db.models import F
CompanyCode.objects.annotate(
    growth=F('quarter__sales_now') / F('quarter__sales_previous'),
)
Since the calculation is being done on a single Quarter instance, where's the need to do it in the SELECT? You could just define a ratio method/property on the Quarter model:
@property
def growth(self):
    return self.sales_now / self.sales_previous
and call it where necessary.
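A slightly more defensive version of such a property is sketched below. QuarterSketch is a hypothetical plain-Python stand-in for the Quarter model. The zero check and the rounding to two decimals (mirroring the round(..., 2) in the question's raw query) are assumptions about what the display column should show.

```python
class QuarterSketch:
    """Hypothetical plain-Python stand-in for the Quarter model."""

    def __init__(self, sales_now, sales_previous):
        self.sales_now = sales_now
        self.sales_previous = sales_previous

    @property
    def growth(self):
        # Avoid ZeroDivisionError when there were no previous sales.
        if not self.sales_previous:
            return None
        # float() keeps the ratio fractional on Python 2 as well.
        return round(float(self.sales_now) / self.sales_previous, 2)


q = QuarterSketch(sales_now=150, sales_previous=120)
empty = QuarterSketch(sales_now=10, sales_previous=0)
```

A property like this cannot be used in filter() or order_by(); for that, the ratio has to be computed in the query itself.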
OK, I figured it out. Using:
CompanyCode.objects.select_related('quarter').extra(select=ratios)
solved the problem.
Basically, to access any related model data through 'extra', we just need to ensure that that model is joined in our query. Using select_related, the query automatically joins the mentioned models.
Thanks :).
I have a third party Django App (Satchmo) which has a model called Product which I make extensive use of in my Django site.
I want to add the ability to search for products via color. So I have created a new model called ProductColor. This model looks roughly like this...
class ProductColor(models.Model):
    products = models.ManyToManyField(Product)
    r = models.IntegerField()
    g = models.IntegerField()
    b = models.IntegerField()
    name = models.CharField(max_length=32)
When a store product's data is loaded into the site, the product's color data is used to create a ProductColor object which will point to that Product object. The plan is to allow a user to search for a product by searching a color range.
I can't seem to figure out how to put this query into a QuerySet. I can make this...
# If the color ranges look something like this...
r_range, g_range, b_range = ((3,130),(0,255),(0,255))
# Then my query looks like
colors_in_range = ProductColor.objects.select_related('products')
if r_range:
    colors_in_range = colors_in_range.filter(
        Q(r__gte=r_range[0])
        & Q(r__lte=r_range[1])
    )
if g_range:
    colors_in_range = colors_in_range.filter(
        Q(g__gte=g_range[0])
        & Q(g__lte=g_range[1])
    )
if b_range:
    colors_in_range = colors_in_range.filter(
        Q(b__gte=b_range[0])
        & Q(b__lte=b_range[1])
    )
So I end up with a QuerySet which contains all of the ProductColor objects in that color range. I could then build a list of Products by accessing the products ManyToMany attribute of each ProductColor object.
What I really need is a valid QuerySet of Products. This is because there is going to be other logic which is performed on these results and it needs to operate on a QuerySet object.
So my question is how can I build the QuerySet that I really want? And failing that, is there an efficient way to re-build the QuerySet (preferably without hitting the database again)?
If you want to get a Product queryset you have to filter the Product objects and filter via the reverse relation for product color:
products = Product.objects.filter(productcolor__r__gte=x).distinct()
You can use the range field lookup. From the documentation:
"You can use range anywhere you can use BETWEEN in SQL -- for dates, numbers and even characters."
Your query:
r_range, g_range, b_range = ((3,130),(0,255),(0,255))
products = Product.objects.filter(productcolor__r__range=r_range,
                                  productcolor__g__range=g_range,
                                  productcolor__b__range=b_range).distinct()
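The SQL behind such a range lookup is a join with BETWEEN conditions, and the DISTINCT is what stops a product from appearing once per matching color. The sketch below runs against an in-memory SQLite database. The table and column names are simplified assumptions, not the tables Django or Satchmo would actually create.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical tables for Product, ProductColor and their M2M link.
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE productcolor (id INTEGER PRIMARY KEY, r INTEGER, g INTEGER, b INTEGER);
CREATE TABLE productcolor_products (productcolor_id INTEGER, product_id INTEGER);
INSERT INTO product VALUES (1, 'mug'), (2, 'shirt'), (3, 'lamp');
INSERT INTO productcolor VALUES (1, 100, 20, 30), (2, 120, 40, 50), (3, 200, 10, 10);
INSERT INTO productcolor_products VALUES (1, 1), (2, 1), (2, 2), (3, 3);
""")


def products_in_range(conn, r_range, g_range, b_range):
    """Return distinct (id, name) rows for products with a color inside all three ranges."""
    sql = """
        SELECT DISTINCT p.id, p.name
        FROM product p
        JOIN productcolor_products pcp ON pcp.product_id = p.id
        JOIN productcolor pc ON pc.id = pcp.productcolor_id
        WHERE pc.r BETWEEN ? AND ?
          AND pc.g BETWEEN ? AND ?
          AND pc.b BETWEEN ? AND ?
        ORDER BY p.id
    """
    params = tuple(r_range) + tuple(g_range) + tuple(b_range)
    return conn.execute(sql, params).fetchall()


matches = products_in_range(conn, (3, 130), (0, 255), (0, 255))
```

Here the mug has two colors inside the range but is returned only once, and the lamp is excluded because its red value is 200.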
I have a model that has an id that isn't unique. Each model also has a date. I would like to return all results but only the most recent of each row that shares ids. The model looks something like this:
class MyModel(models.Model):
    my_id = models.PositiveIntegerField()
    date = models.DateTimeField()
    title = models.CharField(max_length=36)
## Add some entries
m1 = MyModel(my_id=1, date=yesterday, title='stop')
m1.save()
m2 = MyModel(my_id=1, date=today, title='go')
m2.save()
m3 = MyModel(my_id=2, date=today, title='hello')
m3.save()
Now try to retrieve these results:
MyModel.objects.all()... # then limit duplicate my_id's by most recent
Results should be only m2 and m3
You won't be able to do this with just the ORM; you'll need to get all the records and then discard the duplicates in Python.
For example:
objs = MyModel.objects.all().order_by("-date")
seen = set()
keep = []
for o in objs:
    # my_id is the shared, non-unique id; the automatic primary key "id" is always unique
    if o.my_id not in seen:
        keep.append(o)
        seen.add(o.my_id)
Here's some custom SQL that can get what you want from the database:
select * from mymodel where (my_id, date) in (select my_id, max(date) from mymodel group by my_id)
You should be able to adapt this to use in the ORM.
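A variant of that query can be tried against an in-memory SQLite database. This is a sketch under assumptions: the mymodel table name and the sample dates are made up for illustration, and a join on the grouped subquery is used instead of the row-value IN form, because not every database version accepts (a, b) IN (...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for MyModel; the dates are invented sample data.
conn.execute(
    "CREATE TABLE mymodel (id INTEGER PRIMARY KEY, my_id INTEGER, date TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO mymodel (my_id, date, title) VALUES (?, ?, ?)",
    [
        (1, "2010-06-01 09:00:00", "stop"),
        (1, "2010-06-02 09:00:00", "go"),
        (2, "2010-06-02 09:00:00", "hello"),
    ],
)

# Join each row to the newest date recorded for its my_id.
LATEST_PER_MY_ID = """
    SELECT m.id, m.my_id, m.date, m.title
    FROM mymodel AS m
    JOIN (
        SELECT my_id, MAX(date) AS latest
        FROM mymodel
        GROUP BY my_id
    ) AS newest
      ON newest.my_id = m.my_id AND newest.latest = m.date
    ORDER BY m.my_id
"""
latest_rows = conn.execute(LATEST_PER_MY_ID).fetchall()
titles = [row[3] for row in latest_rows]
```

If two rows share both my_id and the newest date, both are returned, so a tie-breaker may be needed in real data.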
You should also look into abstracting the logic above into a manager:
http://docs.djangoproject.com/en/dev/topics/db/managers/
That way you can call something like MyModel.objects.no_dupes() where you would define no_dupes() in a manager and do the logic Ned laid out in there.
Your models.py would now look like this:
class MyModelManager(models.Manager):
    def no_dupes(self):
        objs = self.all().order_by("-date")
        seen = set()
        keep = []
        for o in objs:
            if o.my_id not in seen:
                keep.append(o)
                seen.add(o.my_id)
        return keep
class MyModel(models.Model):
    my_id = models.PositiveIntegerField()
    date = models.DateTimeField()
    title = models.CharField(max_length=36)
    objects = MyModelManager()
With the above code in place, you can call MyModel.objects.no_dupes(), which should give your desired result (as a list, not a QuerySet). It looks like you can even override the all() function as well, if you would want that instead:
http://docs.djangoproject.com/en/1.2/topics/db/managers/#modifying-initial-manager-querysets
I find the manager to be a better solution in case you will need to use this in more than one view across the project, this way you don't have to rewrite the code X number of times.
As Ned says, I don't know of a way to do this with the ORM. But you might be able to use the db to restrict the amount of work you have to do in the for loop in python.
The idea is to use Django's annotate (which is basically running group_by) to find all the instances that have more than one row with the same my_id and process them as Ned suggests. Then for the remainder (which have no duplicates), you can just grab the individual rows.
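from django.db.models import Count, Q
# values('my_id') makes annotate() group by my_id instead of by primary key
counts = MyModel.objects.values('my_id').annotate(num_my_ids=Count('my_id'))
dupe_my_ids = [row['my_id'] for row in counts.filter(num_my_ids__gt=1)]
dupes = MyModel.objects.filter(my_id__in=dupe_my_ids).order_by('-date')
seen = set()
keep_ids = []
for dupe in dupes:
    # just keep the most recent, as Ned describes
    if dupe.my_id not in seen:
        seen.add(dupe.my_id)
        keep_ids.append(dupe.id)
# rows kept from the dupes, plus every row whose my_id is not duplicated
latests = MyModel.objects.filter(Q(id__in=keep_ids) | ~Q(my_id__in=dupe_my_ids))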
If you only have a small number of dupes, this will mean that your for loop is much shorter, at the expense of an extra query (to get the dupes).