I have an application where users select their own display columns. Each display column has a specified formula. To compute that formula, I need to join a few related columns (one-to-one relationship) and compute the value.
The models look like this (just an example; the actual model has more than 100 fields):
class CompanyCode(models.Model):
    """Various Company Codes"""
    nse_code = models.CharField(max_length=20)
    bse_code = models.CharField(max_length=20)
    isin_code = models.CharField(max_length=20)

class Quarter(models.Model):
    """Company Quarterly Result Figures"""
    company_code = models.OneToOneField(CompanyCode)
    sales_now = models.IntegerField()
    sales_previous = models.IntegerField()
I tried doing:
ratios = {'growth':'quarter__sales_now / quarter__sales_previous'}
CompanyCode.objects.extra(select=ratios)
# raises "Unknown column 'quarter__sales_now' in 'field list'"
I also tried using raw query:
query = ','.join(['round((%s),2) AS %s' % (formula, ratio_name)
                  for ratio_name, formula in ratios.iteritems()])
companies = CompanyCode.objects.raw("""
    SELECT `backend_companycode`.`id`, %s
    FROM `backend_companycode`
    INNER JOIN `backend_quarter` ON ( `backend_companycode`.`id` = `backend_companyquarter`.`company_code_id` )
    """, [query])
# This just gives an empty result
So please give me a little clue as to how I can use related columns preferably using 'extra' command. Thanks.
By now the Django documentation says that one should use extra as a last resort.
So here is a query without extra():
from django.db.models import F
CompanyCode.objects.annotate(
    growth=F('quarter__sales_now') / F('quarter__sales_previous'),
)
Since the calculation is being done on a single Quarter instance, where's the need to do it in the SELECT? You could just define a ratio method/property on the Quarter model:
@property
def quarter(self):
    return self.sales_now / self.sales_previous
and call it where necessary
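A minimal plain-Python sketch of that idea, without Django. `QuarterFigures` is a hypothetical stand-in for the Quarter model, the property name `growth` is chosen for illustration, and the zero check is an added assumption rather than something from the question.

```python
class QuarterFigures:
    """Hypothetical stand-in for the Quarter model (no Django required)."""

    def __init__(self, sales_now, sales_previous):
        self.sales_now = sales_now
        self.sales_previous = sales_previous

    @property
    def growth(self):
        # Assumption: treat a missing or zero previous figure as "no ratio".
        if not self.sales_previous:
            return None
        # Python 3 "/" is true division, so 150 / 100 gives 1.5, not 1.
        return self.sales_now / self.sales_previous


q = QuarterFigures(sales_now=150, sales_previous=100)
print(q.growth)
```

A property like this is computed per instance in Python, so it cannot be used for ordering or filtering inside the database query.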
OK, I figured it out. Using the following:
CompanyCode.objects.select_related('quarter').extra(select=ratios)
solved the problem.
Basically, to access any related model data through 'extra', we just need to ensure that that model is joined in our query. Using select_related, the query automatically joins the mentioned models.
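The point about the join can be illustrated at the SQL level. This is a sketch using sqlite3 from the standard library instead of Django and MySQL; the table names are simplified, hypothetical versions of the ones in the question, and the `1.0 *` factor is added only to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companycode (id INTEGER PRIMARY KEY, nse_code TEXT);
    CREATE TABLE quarter (
        id INTEGER PRIMARY KEY,
        company_code_id INTEGER UNIQUE REFERENCES companycode(id),
        sales_now INTEGER,
        sales_previous INTEGER
    );
    INSERT INTO companycode VALUES (1, 'ABC');
    INSERT INTO quarter VALUES (1, 1, 150, 100);
""")

# Without a JOIN, the quarter columns are unknown to the query.
try:
    conn.execute("SELECT c.id, q.sales_now FROM companycode c")
    join_error = None
except sqlite3.OperationalError as exc:
    join_error = str(exc)

# With the JOIN in place, the same columns can be used in an expression.
rows = conn.execute("""
    SELECT c.id, ROUND(1.0 * q.sales_now / q.sales_previous, 2) AS growth
    FROM companycode c
    INNER JOIN quarter q ON q.company_code_id = c.id
""").fetchall()

print(join_error)
print(rows)
```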
Thanks :).
I am facing an issue while filtering data.
I have three models:
class Product(models.Model):
    size = models.CharField(max_length=200)

class Make(models.Model):
    name = models.ForeignKey(Product, related_name='product_set')

class MakeContent(models.Model):
    make = models.ForeignKey(Make, related_name='make_set')
    published = models.BooleanField()
I can generate a queryset that contains all Makes and each one's related MakeContents where published = True.
Make.objects.filter(make_set__published=True)
I'd like to know if it's possible (without writing SQL directly) for me to generate a queryset that contains all Product and each one's related MakeContents where published = True.
I have tried this
Product.objects.filter(product_set__make_set__published=True)
But it's not working
A subquery can solve the problem.
from django.db.models import Subquery
sub_query = MakeContent.objects.filter(published=True)
Product.objects.filter(
    pk__in=Subquery(sub_query.values('make__name__pk'))
)
Use this
Product.objects.filter(make__makecontent__published=True).distinct()
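Filtering through a ForeignKey works both forwards and backwards in Django. For backward filtering you can use the lowercase model name, as long as the ForeignKey sets no related_name or related_query_name; when one is set (as in the question's models), that name is used in lookups instead.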
https://docs.djangoproject.com/en/3.1/topics/db/queries/#lookups-that-span-relationships
Finally, .distinct() is required to remove the duplicate rows produced by the SQL joins.
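Why the joins produce duplicates can be seen in a small SQL sketch. It uses sqlite3 from the standard library, not Django; the table and column names are hypothetical simplifications of the three models, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, size TEXT);
    CREATE TABLE make (id INTEGER PRIMARY KEY, name_id INTEGER REFERENCES product(id));
    CREATE TABLE makecontent (
        id INTEGER PRIMARY KEY,
        make_id INTEGER REFERENCES make(id),
        published INTEGER
    );
    INSERT INTO product VALUES (1, 'large'), (2, 'small');
    INSERT INTO make VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO makecontent VALUES (100, 10, 1), (101, 11, 1), (102, 12, 0);
""")

JOINED = """
    SELECT {distinct} p.id
    FROM product p
    INNER JOIN make m ON m.name_id = p.id
    INNER JOIN makecontent mc ON mc.make_id = m.id
    WHERE mc.published = 1
    ORDER BY p.id
"""

# Product 1 has two makes with published content, so it matches twice.
with_duplicates = [r[0] for r in conn.execute(JOINED.format(distinct=""))]
# DISTINCT collapses the repeated product rows.
deduplicated = [r[0] for r in conn.execute(JOINED.format(distinct="DISTINCT"))]

print(with_duplicates, deduplicated)
```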
I have an Article model like this
from django.contrib.contenttypes.fields import GenericRelation
from django.db import models
from hitcount.models import HitCountMixin, HitCount
class Article(models.Model):
    title = models.CharField(max_length=250)
    hit_count_generic = GenericRelation(
        HitCount, object_id_field='object_pk',
        related_query_name='hit_count_generic_relation')
When I do Article.objects.order_by('hit_count_generic__hits'), I get results. But when I do
articles_by_id = Article.objects.filter(id__in=ids).annotate(qs_order=models.Value(0, models.IntegerField()))
articles_by_name = Article.objects.filter(title__icontains='sports').annotate(qs_order=models.Value(1, models.IntegerField()))
articles = articles_by_id.union(articles_by_name).order_by('qs_order', 'hit_count_generic__hits')
I get this error:
ORDER BY term does not match any column in the result set
How can I achieve a union like this? I had to use union instead of AND and OR because I need to preserve order, i.e. articles_by_id should come first and articles_by_name second.
I am using django-hitcount for hit counts: https://github.com/thornomad/django-hitcount. The HitCount model is given below.
class HitCount(models.Model):
    """
    Model that stores the hit totals for any content object.
    """
    hits = models.PositiveIntegerField(default=0)
    modified = models.DateTimeField(auto_now=True)
    content_type = models.ForeignKey(
        ContentType, related_name="content_type_set_for_%(class)s", on_delete=models.CASCADE)
    object_pk = models.TextField('object ID')
    content_object = GenericForeignKey('content_type', 'object_pk')
    objects = HitCountManager()
As suggested by @Angela, I tried prefetch_related.
articles_by_id = Article.objects.prefetch_related('hit_count_generic').filter(id__in=[1, 2, 3]).annotate(qs_order=models.Value(0, models.IntegerField()))
articles_by_name = Article.objects.prefetch_related('hit_count_generic').filter(title__icontains='date').annotate(qs_order=models.Value(1, models.IntegerField()))
When checked, the query generated with prefetch_related does not select the hitcount table at all:
SELECT "articles_article"."id", "articles_article"."created", "articles_article"."last_changed_date", "articles_article"."title", "articles_article"."title_en", "articles_article"."slug", "articles_article"."status", "articles_article"."number_of_comments", "articles_article"."number_of_likes", "articles_article"."publish_date", "articles_article"."short_description", "articles_article"."description", "articles_article"."cover_image", "articles_article"."page_title", "articles_article"."category_id", "articles_article"."author_id", "articles_article"."creator_id", "articles_article"."article_type", 0 AS "qs_order" FROM "articles_article" WHERE "articles_article"."id" IN (1, 2, 3)
From Django's official documentation:
Further, databases place restrictions on what operations are allowed in the combined queries. For example, most databases don’t allow LIMIT or OFFSET in the combined queries.
So, make sure that your database allows combining queries like this.
ORDER BY term does not match any column in the result set
You are getting this error because that is exactly what is happening: the final result set for articles does not contain the hits column from the hitcount table, so the result set cannot be ordered by that column.
Before delving into the answer, let's look at what's happening with your django querysets under the hood.
Retrieve a particular set of articles and include an extra ordering field qs_order set to 0.
articles_by_id = Article.objects.filter(id__in=ids).annotate(qs_order=models.Value(0, models.IntegerField()))
SQL Query for the above
Select id, title,....., 0 as qs_order from article where article.id in (Select ....) # whatever you did to get your ids or just a flat list
Retrieve another set of articles and include an extra ordering field qs_order set to 1
articles_by_name = Article.objects.filter(title__icontains='sports').annotate(qs_order=models.Value(1, models.IntegerField()))
SQL Query for the above
Select id, title, ...1 as qs_order from article where title ilike '%sports%'
Original queryset and order_by hit_count_generic__hits
Article.objects.order_by('hit_count_generic__hits')
This will actually perform an inner join and fetch the hitcount table to order by the hits column.
Query
Select id, title,... from article inner join hitcount on ... order by hits ASC
Union
So when you do your union, the result-set of the above 2 queries is combined and then ordered using your qs_order and then hits ...where it fails.
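The failure described above can be reproduced at the SQL level. This is a sketch with sqlite3 from the standard library, not Django; the tables are hypothetical, stripped-down versions of article and hitcount, and the sample rows are made up. It shows that the combined ORDER BY only accepts columns that appear in the result set of the union, and that selecting hits in both halves makes the ordering valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE hitcount (id INTEGER PRIMARY KEY, object_pk INTEGER, hits INTEGER);
    INSERT INTO article VALUES (1, 'politics'), (2, 'sports today'), (3, 'sports weekly');
    INSERT INTO hitcount VALUES (1, 1, 5), (2, 2, 9), (3, 3, 2);
""")

# hits is not among the selected columns, so the combined ORDER BY fails.
failing = """
    SELECT id, title, 0 AS qs_order FROM article WHERE id IN (1)
    UNION
    SELECT id, title, 1 AS qs_order FROM article WHERE title LIKE '%sports%'
    ORDER BY qs_order, hits
"""
try:
    conn.execute(failing)
    union_error = None
except sqlite3.OperationalError as exc:
    union_error = str(exc)

# Selecting hits in both halves makes it a valid ORDER BY term.
working = """
    SELECT a.id, a.title, 0 AS qs_order, h.hits AS hits
    FROM article a INNER JOIN hitcount h ON h.object_pk = a.id
    WHERE a.id IN (1)
    UNION
    SELECT a.id, a.title, 1 AS qs_order, h.hits AS hits
    FROM article a INNER JOIN hitcount h ON h.object_pk = a.id
    WHERE a.title LIKE '%sports%'
    ORDER BY qs_order, hits
"""
ordered_ids = [row[0] for row in conn.execute(working)]

print(union_error)
print(ordered_ids)
```

In ORM terms this would presumably mean getting the hit count into the selected columns of both querysets before the union (for example through an annotation) and ordering by that name; that mapping is an assumption and has not been checked against the hitcount models here.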
Solution
Use prefetch_related to get your hitcount table in the initial queryset filtering, so you can then use the hits column in the union to order.
articles_by_id = Article.objects.prefetch_related('hit_count_generic').filter(id__in=ids).annotate(qs_order=models.Value(0, models.IntegerField()))
articles_by_name = Article.objects.prefetch_related('hit_count_generic').filter(title__icontains='sports').annotate(qs_order=models.Value(1, models.IntegerField()))
Now as you have the desired table and its columns in both your SELECT queries, your union should work the way you have defined.
articles = articles_by_id.union(articles_by_name).order_by('qs_order', 'hit_count_generic__hits')
Just replacing prefetch_related with select_related works for me.
https://docs.djangoproject.com/en/3.2/ref/models/querysets/#select-related
I am trying to add an additional custom field to a django model. I have been having quite a hard time figuring out how to do the following, and I will be awarding a 150pt bounty for the first fully correct answer when it becomes available (after it is available -- see as a reference Improving Python/django view code).
I have the following model, with a custom def that returns a video count for each user --
class UserProfile(models.Model):
    user = models.ForeignKey(User, unique=True)
    positions = models.ManyToManyField('Position', through='PositionTimestamp', blank=True)

    def count(self):
        from django.db import connection
        cursor = connection.cursor()
        cursor.execute(
            """SELECT (
                SELECT COUNT(*)
                FROM videos_video v
                WHERE v.uploaded_by_id = p.id
                OR EXISTS (
                    SELECT NULL
                    FROM videos_videocredit c
                    WHERE c.video_id = v.id
                    AND c.profile_id = p.id
                )
            ) AS Total_credits
            FROM userprofile_userprofile p
            WHERE p.id = %d"""%(int(self.pk))
        )
        return int(cursor.fetchone()[0])
I want to be able to order by the count, i.e., UserProfile.objects.order_by('count'). Of course, I can't do that, which is why I'm asking this question.
Previously, I tried adding a custom model Manager, but the problem with that was I also need to be able to filter by various criteria of the UserProfile model: Specifically, I need to be able to do: UserProfile.objects.filter(positions=x).order_by('count'). In addition, I need to stay in the ORM (cannot have a raw sql output) and I do not want to put the filtering logic into the SQL, because there are various filters, and would require several statements.
How exactly would I do this? Thank you.
My reaction is that you're trying to take a bigger bite than you can chew. Break it into bite size pieces by giving yourself more primitives to work with.
You want to create these two pieces separately so you can call on them:
Does this user get credit for this video? return boolean
For how many videos does this user get credit? return int
Then use a combination of @property, model managers, querysets, and methods that make it easiest to express what you need.
For example you might attach the "credit" to the video model taking a user parameter, or the user model taking a video parameter, or a "credit" manager on users which adds a count of videos for which they have credit.
It's not trivial, but shouldn't be too tricky if you work for it.
"couldn't you use something like the "extra" queryset modifier?"
see the docs
I didn't put this in an answer at first because I wasn't sure it would actually work or if it was what you needed - it was more like a nudge in the (hopefully) right direction.
in the docs on that page there is an example
query
Blog.objects.extra(
    select={
        'entry_count': 'SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id'
    },
)
resulting sql
SELECT blog_blog.*, (SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id) AS entry_count
FROM blog_blog;
Perhaps you could do something like that, referring to the user id (currently p.id in your query) as appname_userprofile.id.
Note:
I'm just winging it, so try to play around a bit.
Perhaps use the shell to output the query as SQL and see what you are getting.
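To check that a per-row subquery of this shape can drive the ordering, here is a sketch with sqlite3 from the standard library, not Django. The tables are hypothetical, cut-down versions of the ones in the question's SQL, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userprofile (id INTEGER PRIMARY KEY);
    CREATE TABLE video (id INTEGER PRIMARY KEY, uploaded_by_id INTEGER);
    CREATE TABLE videocredit (id INTEGER PRIMARY KEY, video_id INTEGER, profile_id INTEGER);
    INSERT INTO userprofile VALUES (1), (2), (3);
    INSERT INTO video VALUES (10, 1), (11, 1), (12, 2);
    -- profile 2 is also credited on videos 10 and 11
    INSERT INTO videocredit VALUES (100, 10, 2), (101, 11, 2);
""")

# The correlated subquery is evaluated once per profile row and
# its alias can then be used in ORDER BY.
rows = conn.execute("""
    SELECT p.id,
           (SELECT COUNT(*)
            FROM video v
            WHERE v.uploaded_by_id = p.id
               OR EXISTS (SELECT NULL
                          FROM videocredit c
                          WHERE c.video_id = v.id
                            AND c.profile_id = p.id)
           ) AS total_credits
    FROM userprofile p
    ORDER BY total_credits DESC, p.id
""").fetchall()

print(rows)
```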
models:
class Positions(models.Model):
    x = models.IntegerField()

    class Meta:
        db_table = 'xtest_positions'

class UserProfile(models.Model):
    user = models.ForeignKey(User, unique=True)
    positions = models.ManyToManyField(Positions)

    class Meta:
        db_table = 'xtest_users'

class Video(models.Model):
    usr = models.ForeignKey(UserProfile)
    views = models.IntegerField()

    class Meta:
        db_table = 'xtest_video'
result:
from django.db.models import Sum
test = UserProfile.objects.annotate(video_views=Sum('video__views')).order_by('video_views')
for t in test:
    print(t.video_views)
doc: https://docs.djangoproject.com/en/dev/topics/db/aggregation/
This is either what you want, or I've completely misunderstood!.. Anywhoo... Hope it helps!
I am using Django. I am having a few issues with caching of QuerySets for news/category models:
class Category(models.Model):
    title = models.CharField(max_length=60)
    slug = models.SlugField(unique=True)

class PublishedArticlesManager(models.Manager):
    def get_query_set(self):
        return super(PublishedArticlesManager, self).get_query_set() \
            .filter(published__lte=datetime.datetime.now())

class Article(models.Model):
    category = models.ForeignKey(Category)
    title = models.CharField(max_length=60)
    slug = models.SlugField(unique = True)
    story = models.TextField()
    author = models.CharField(max_length=60, blank=True)
    published = models.DateTimeField(
        help_text=_('Set to a date in the future to publish later.'))
    created = models.DateTimeField(auto_now_add=True, editable=False)
    updated = models.DateTimeField(auto_now=True, editable=False)
    live = PublishedArticlesManager()
    objects = models.Manager()
Note - I have removed some fields to save on complexity...
There are a few (related) issues with the above.
Firstly, when I query for LIVE objects in my view via Article.live.all() and refresh the page repeatedly, I can see (in the MySQL logs) the same database query being made with exactly the same date in the WHERE clause, i.e. datetime.datetime.now() is being evaluated at compile time rather than at runtime. I need the date to be evaluated at runtime.
Secondly, when I use the article_set accessor on the Category object, this appears to work correctly: the datetime used in the query changes each time the query is run, as I can again see in the logs. However, I am not quite sure why this works, since I don't have anything in my code to say that the article_set query should return LIVE entries only!?
Finally, why is none of this being cached?
Any ideas how to make the correct time be used consistently? Can someone please explain why the latter setup appears to work?
Thanks
Jay
P.S - database queries below, note the date variations.
SELECT LIVE ARTICLES, query #1:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE `news_article`.`published` <= '2011-05-17 21:55:41' ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
SELECT LIVE ARTICLES, query #2:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE `news_article`.`published` <= '2011-05-17 21:55:41' ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
CATEGORY SELECT ARTICLES, query #1:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE (`news_article`.`published` <= '2011-05-18 21:21:33' AND `news_article`.`category_id` = 1 ) ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
CATEGORY SELECT ARTICLES, query #2:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE (`news_article`.`published` <= '2011-05-18 21:26:06' AND `news_article`.`category_id` = 1 ) ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
You should check out conditional view processing.
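from django.views.decorators.http import condition

def latest_entry(request, article_id):
    return Article.objects.latest("updated").updated

@condition(last_modified_func=latest_entry)
def view_article(request, article_id):
    pass  # your view code here
This lets Django reply with 304 Not Modified when nothing has changed since the client's copy, so the client reuses its cached page instead of reloading a new version every time.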
I suspect that if you want now() to be processed at runtime, you should use raw SQL. I think this will solve the compile/runtime issue.
class PublishedArticlesManager(models.Manager):
    def get_query_set(self):
        return super(PublishedArticlesManager, self).get_query_set() \
            .raw("SELECT * FROM news_article WHERE published <= CURRENT_TIMESTAMP")
Note that this returns a RawQuerySet, which may differ a bit from a normal QuerySet.
I have now fixed this issue. It appears the problem was that the queryset returned by Article.live.all() was being cached in my urls.py! I was using function-based generic-views:
url(r'^all/$', object_list, {
    'queryset' : Article.live.all(),
}, 'news_all'),
I have now changed this to use the class-based approach, as advised in the latest Django documentation:
url(r'^all/$', ListView.as_view(
    model=Article,
), name="news_all"),
This now works as expected: by specifying the model attribute rather than the queryset attribute, the QuerySet is created at runtime (on each request) instead of once at compile time, when urls.py is loaded.
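The difference between building the queryset once in urls.py and building it per request can be sketched in plain Python. Everything here is a hypothetical stand-in: a counter replaces datetime.datetime.now() so the effect is deterministic, `build_filter` plays the role of Article.live.all(), and the two view functions are illustrative only.

```python
import itertools

# A counter stands in for datetime.datetime.now() so the effect is deterministic.
_clock = itertools.count(1)

def now():
    return next(_clock)

def build_filter():
    """Stand-in for Article.live.all(): captures the current 'time'."""
    return {"published__lte": now()}

# Evaluated once, when the module is loaded (like the dict in urls.py).
frozen_config = {"queryset": build_filter()}

def view_frozen():
    # Reuses the value captured at load time on every call.
    return frozen_config["queryset"]["published__lte"]

def view_fresh():
    # Evaluated on every call (like letting the view build the queryset).
    return build_filter()["published__lte"]

frozen_calls = [view_frozen(), view_frozen()]
fresh_calls = [view_fresh(), view_fresh()]
print(frozen_calls, fresh_calls)
```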
For example, I have a model like this:
class Doggy(models.Model):
    name = models.CharField(u'Name', max_length = 40)
    color = models.CharField(u'Color', max_length = 20)
How can I select doggies with the same color? Or with the same name :)
UPD. Of course, I don't know the name or the color. I want to.. kind of, group by their values.
UPD2. I'm trying to do something like that, but using Django:
SELECT *
FROM table
WHERE tablefield IN (
    SELECT tablefield
    FROM table
    GROUP BY tablefield
    HAVING (COUNT(tablefield) > 1)
)
UPD3. I'd like to do it via Django ORM, without having to iterate over the objects. I just want to get rows with duplicate values for one particular field.
I'm late to the party, but here you go:
from django.db.models import Count
Doggy.objects.values('color', 'name').annotate(Count('pk'))
This will give you results that have a count of how many of each Doggy you have grouped by color and name.
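For reference, the grouping idea can be tried at the SQL level. This sketch uses sqlite3 from the standard library, not Django; the doggy table and its rows are hypothetical, and it groups by color only (the ORM call above groups by color and name together). The second query follows the pattern from UPD2 in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doggy (id INTEGER PRIMARY KEY, name TEXT, color TEXT);
    INSERT INTO doggy VALUES
        (1, 'Rex', 'brown'),
        (2, 'Fido', 'brown'),
        (3, 'Spot', 'white'),
        (4, 'Rex', 'black');
""")

# Count of rows per color.
counts = conn.execute(
    "SELECT color, COUNT(id) FROM doggy GROUP BY color ORDER BY color"
).fetchall()

# Rows whose color occurs more than once (the pattern from UPD2).
duplicates = conn.execute("""
    SELECT id, name, color FROM doggy
    WHERE color IN (
        SELECT color FROM doggy GROUP BY color HAVING COUNT(color) > 1
    )
    ORDER BY id
""").fetchall()

print(counts)
print(duplicates)
```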
If you're looking for Doggys of a certain colour, you'd do something like:
Doggy.objects.filter(color='blue')
If you want to find Doggys based on the colour of the current Doggy
def GetSimilarColoredDoggys(self):
    return Doggy.objects.filter(color=self.color)
The same would go for names:
def GetDoggysWithSameName(self):
    return Doggy.objects.filter(name=self.name)
You can use itertools.groupby() for this:
import operator
import itertools
from django.db import models

def group_model_by_attr(model_class, attr_name):
    assert issubclass(model_class, models.Model), \
        "%s is not a Django model." % (model_class,)
    assert attr_name in [field.name for field in model_class._meta.fields], \
        "The %s field doesn't exist on model %s" % (attr_name, model_class)
    # groupby only merges adjacent items, so order by the grouping attribute first
    all_instances = model_class.objects.all().order_by(attr_name)
    keyfunc = operator.attrgetter(attr_name)
    return [{k: list(g)} for k, g in itertools.groupby(all_instances, keyfunc)]

grouped_by_color = group_model_by_attr(Doggy, 'color')
grouped_by_name = group_model_by_attr(Doggy, 'name')
grouped_by_color (for example) will be a list of dicts like [{'purple': [doggy1, doggy2]}, {'pink': [doggy3]}] where doggy1, doggy2, etc. are Doggy instances.
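The ordering step matters because itertools.groupby() only merges adjacent items. A small sketch without Django, where the `Dog` namedtuple is a hypothetical stand-in for Doggy instances:

```python
import itertools
import operator
from collections import namedtuple

# Hypothetical stand-in for Doggy instances.
Dog = namedtuple("Dog", ["name", "color"])
dogs = [Dog("Rex", "brown"), Dog("Spot", "white"), Dog("Fido", "brown")]

by_color = operator.attrgetter("color")

# groupby only merges adjacent items, so unsorted input splits "brown" in two.
unsorted_keys = [k for k, _ in itertools.groupby(dogs, by_color)]

# Sorting by the same key first gives one group per color.
grouped = {
    color: [d.name for d in group]
    for color, group in itertools.groupby(sorted(dogs, key=by_color), by_color)
}

print(unsorted_keys)
print(grouped)
```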
UPDATE:
From your update it looks like you just want a list of ids for each event type. I tested this with 250k records in PostgreSQL on my Ubuntu laptop with a Core 2 Duo and 3 GB of RAM, and it took 0.35 seconds to generate the dict (the itertools.groupby version took 0.72 seconds, by the way). You mention that you have 900K records, so this should be fast enough. If it's not, it should be easy to cache and update as the records change.
from collections import defaultdict
doggies = Doggy.objects.values_list('color', 'id').order_by('color').iterator()
grouped_doggies_by_color = defaultdict(list)
for color, id in doggies:
    grouped_doggies_by_color[color].append(id)
I would change your data model so that the color and name are a one-to-many relationship with Doggy as follows:
class Doggy(models.Model):
    name = models.ForeignKey('DoggyName')
    color = models.ForeignKey('DoggyColor')

class DoggyName(models.Model):
    name = models.CharField(max_length=40, unique=True)

class DoggyColor(models.Model):
    color = models.CharField(max_length=20, unique=True)
Now DoggyName and DoggyColor do not contain duplicate names or colors, and you can use them to select dogs with the same name or color.
Okay, apparently there's no way to do this with the ORM alone.
If you have to do it, you have to use .extra() to execute the needed SQL statement (if you are using an SQL database, of course).