Retrieving untagged objects with django-tagging

Retrieving untagged objects with django-tagging - django

What I'm looking for is a QuerySet containing any objects not tagged.
The solution I've come up with so far looks overly complicated to me:
# Get all tags for model
tags = Location.tags.all().order_by('name')
# Get a list of tagged location id's
tag_list = tags.values_list('name', flat=True)
tag_names = ', '.join(tag_list)
tagged_locations = Location.tagged.with_any(tag_names) \
.values_list('id', flat=True)
untagged_locations = []
for location in Location.objects.all():
if location.id not in tagged_locations:
untagged_locations.append(location)
Any ideas for improvement? Thanks!

There is some good information in this post, so I don't feel that it should be deleted, but there is a much, much simpler solution
I took a quick peek at the source code for django-tagging. It looks like they use the ContentType framework and generic relations to pull it off.
Because of this, you should be able to create a generic reverse relation on your Location class to get easy access to the TaggedItem objects for a given location, if you haven't already done so:
from django.contrib.contenttypes import generic
from tagging.models import TaggedItem
class Location(models.Model):
...
tagged_items = generic.GenericRelation(TaggedItem,
object_id_field="object_id",
content_type_field="content_type")
...
Clarification
My original answer suggested to do this:
untagged_locs = Location.objects.filter(tagged_items__isnull=True)
Although this would work for a 'normal join', this actually doesn't work here because the content type framework throws an additional check on content_type_id into the SQL for isnull:
SELECT [snip] FROM `sotest_location`
LEFT OUTER JOIN `tagging_taggeditem`
ON (`sotest_location`.`id` = `tagging_taggeditem`.`object_id`)
WHERE (`tagging_taggeditem`.`id` IS NULL
AND `tagging_taggeditem`.`content_type_id` = 4 )
You can hack-around it by reversing it like this:
untagged_locs = Location.objects.exclude(tagged_items__isnull=False)
But that doesn't quite feel right.
I also proposed this, but it was pointed out that annotations don't work as expected with the content types framework.
from django.db.models import Count
untagged_locs = Location.objects.annotate(
num_tags=Count('tagged_items')).filter(num_tags=0)
The above code works for me in my limited test case, but it could be buggy if you have other 'taggable' objects in your model. The reason being that it doesn't check the content_type_id as outlined in the ticket. It generated the following SQL:
SELECT [snip], COUNT(`tagging_taggeditem`.`id`) AS `num_tags`
FROM `sotest_location`
LEFT OUTER JOIN `tagging_taggeditem`
ON (`sotest_location`.`id` = `tagging_taggeditem`.`object_id`)
GROUP BY `sotest_location`.`id` HAVING COUNT(`tagging_taggeditem`.`id`) = 0
ORDER BY NULL
If Location is your only taggable object, then the above would work.
Proposed Workaround
Short of getting the annotation mechanism to work, here's what I would do in the meantime:
untagged_locs_e = Location.objects.extra(
where=["""NOT EXISTS(SELECT 1 FROM tagging_taggeditem ti
INNER JOIN django_content_type ct ON ti.content_type_id = ct.id
WHERE ct.model = 'location'
AND ti.object_id = myapp_location.id)"""]
)
This adds an additional WHERE clause to the SQL:
SELECT [snip] FROM `myapp_location`
WHERE NOT EXISTS(SELECT 1 FROM tagging_taggeditem ti
INNER JOIN django_content_type ct ON ti.content_type_id = ct.id
WHERE ct.model = 'location'
AND ti.object_id = myapp_location.id)
It joins to the django_content_type table to ensure that you're looking at the appropriate
content type for your model in the case where you have more than one taggable model type.
Change myapp_location.id to match your table name. There's probably a way to avoid hard-coding the table names, but you can figure that out if it's important to you.
Adjust accordingly if you're not using MySQL.

Try this:
[location for location in Location.objects.all() if location.tags.count() == 0]

Assuming your Location class uses the tagging.fields.TagField utility.
from tagging.fields import TagField
class Location(models.Model):
tags = TagField()
You can just do this:
Location.objects.filter(tags='')

Related

Get information from a model using unrelated field

I have these two models:
class A(models.Model):
name=models.CharField(max_length=10)
class D(models.Model):
code=models.IntegerField()
the code field can have a number that exists in model A but it cant be related due to other factors. But what I want know is to list items from A whose value is the same with code
items=D.objects.values('code__name')
would work but since they are not related nor can be related, how can I handle that?

You can use Subquery() expressions in Django 1.11 or newer.
from django.db.models import OuterRef, Subquery
code_subquery = A.objects.filter(id=OuterRef('code'))
qs = D.objects.annotate(code_name=Subquery(code_subquery.values('name')))
The output of qs is a queryset of objects D with an added field code_name.
Footnotes:
It is compiled to a very similar SQL (like the Bear Brown's solution with "extra" method, but without disadvantages of his solution, see there):
SELECT app_d.id, app_d.code,
(SELECT U0.name FROM app_a U0 WHERE U0.id = (app_d.code)) AS code_name
FROM app_d
If a dictionary output is required it can be converted by .values() finally. It can work like a left join i.e. if the pseudo related field allows null (code = models.IntegerField(none=True)) then the objects D are not restricted and the output code_name value could be None. A feature of Subquery is that it returns only one field expression must be eventually repeated for another fields. (That is similar to extra(select={...: "SELECT ..."}), but thanks to object syntax it can be more readable customized than an explicit SQL.)

you can use django extra, replace YOUAPP on your real app name
D.objects.extra(select={'a_name': 'select name from YOUAPP_a where id=code'}).values('a_name')
# Replace YOUAPP^^^^^

Django: ManyToMany filter matching on ALL items in a list

I have such a Book model:
class Book(models.Model):
authors = models.ManyToManyField(Author, ...)
...
In short:
I'd like to retrieve the books whose authors are strictly equal to a given set of authors. I'm not sure if there is a single query that does it, but any suggestions will be helpful.
In long:
Here is what I tried, (that failed to run getting an AttributeError)
# A sample set of authors
target_authors = set((author_1, author_2))
# To reduce the search space,
# first retrieve those books with just 2 authors.
candidate_books = Book.objects.annotate(c=Count('authors')).filter(c=len(target_authors))
final_books = QuerySet()
for author in target_authors:
temp_books = candidate_books.filter(authors__in=[author])
final_books = final_books and temp_books
... and here is what I got:
AttributeError: 'NoneType' object has no attribute '_meta'
In general, how should I query a model with the constraint that its ManyToMany field contains a set of given objects as in my case?
ps: I found some relevant SO questions but couldn't get a clear answer. Any good pointer will be helpful as well. Thanks.

Similar to #goliney's approach, I found a solution. However, I think the efficiency could be improved.
# A sample set of authors
target_authors = set((author_1, author_2))
# To reduce the search space, first retrieve those books with just 2 authors.
candidate_books = Book.objects.annotate(c=Count('authors')).filter(c=len(target_authors))
# In each iteration, we filter out those books which don't contain one of the
# required authors - the instance on the iteration.
for author in target_authors:
candidate_books = candidate_books.filter(authors=author)
final_books = candidate_books

You can use complex lookups with Q objects
from django.db.models import Q
...
target_authors = set((author_1, author_2))
q = Q()
for author in target_authors:
q &= Q(authors=author)
Books.objects.annotate(c=Count('authors')).filter(c=len(target_authors)).filter(q)

Q() & Q() is not equal to .filter().filter(). Their raw SQLs are different where by using Q with &, its SQL just add a condition like WHERE "book"."author" = "author_1" and "book"."author" = "author_2". it should return empty result.
The only solution is just by chaining filter to form a SQL with inner join on same table: ... ON ("author"."id" = "author_book"."author_id") INNER JOIN "author_book" T4 ON ("author"."id" = T4."author_id") WHERE ("author_book"."author_id" = "author_1" AND T4."author_id" = "author_1")

I came across the same problem and came to the same conclusion as iuysal,
untill i had to do a medium sized search (with 1000 records with 150 filters my request would time out).
In my particular case the search would result in no records since the chance that a single record will align with ALL 150 filters is very rare, you can get around the performance issues by verifying that there are records in the QuerySet before applying more filters to save time.
# In each iteration, we filter out those books which don't contain one of the
# required authors - the instance on the iteration.
for author in target_authors:
if candidate_books.count() > 0:
candidate_books = candidate_books.filter(authors=author)
For some reason Django applies filters to empty QuerySets.
But if optimization is to be applied correctly however, using a prepared QuerySet and correctly applied indexes are necessary.

Annotating a Django queryset with a left outer join?

Say I have a model:
class Foo(models.Model):
...
and another model that basically gives per-user information about Foo:
class UserFoo(models.Model):
user = models.ForeignKey(User)
foo = models.ForeignKey(Foo)
...
class Meta:
unique_together = ("user", "foo")
I'd like to generate a queryset of Foos but annotated with the (optional) related UserFoo based on user=request.user.
So it's effectively a LEFT OUTER JOIN on (foo.id = userfoo.foo_id AND userfoo.user_id = ...)

A solution with raw might look like
foos = Foo.objects.raw("SELECT foo.* FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND foo.user_id = %s)", [request.user.id])
You'll need to modify the SELECT to include extra fields from userfoo which will be annotated to the resulting Foo instances in the queryset.

This answer might not be exactly what you are looking for but since its the first result in google when searching for "django annotate outer join" so I will post it here.
Note: tested on Djang 1.7
Suppose you have the following models
class User(models.Model):
name = models.CharField()
class EarnedPoints(models.Model):
points = models.PositiveIntegerField()
user = models.ForeignKey(User)
To get total user points you might do something like that
User.objects.annotate(points=Sum("earned_points__points"))
this will work but it will not return users who have no points, here we need outer join without any direct hacks or raw sql
You can achieve that by doing this
users_with_points = User.objects.annotate(points=Sum("earned_points__points"))
result = users_with_points | User.objects.exclude(pk__in=users_with_points)
This will be translated into OUTER LEFT JOIN and all users will be returned. users who has no points will have None value in their points attribute.
Hope that helps

Notice: This method does not work in Django 1.6+. As explained in tcarobruce's comment below, the promote argument was removed as part of ticket #19849: ORM Cleanup.
Django doesn't provide an entirely built-in way to do this, but it's not neccessary to construct an entirely raw query. (This method doesn't work for selecting * from UserFoo, so I'm using .comment as an example field to include from UserFoo.)
The QuerySet.extra() method allows us to add terms to the SELECT and WHERE clauses of our query. We use this to include the fields from UserFoo table in our results, and limit our UserFoo matches to the current user.
results = Foo.objects.extra(
select={"user_comment": "UserFoo.comment"},
where=["(UserFoo.user_id IS NULL OR UserFoo.user_id = %s)"],
params=[request.user.id]
)
This query still needs the UserFoo table. It would be possible to use .extras(tables=...) to get an implicit INNER JOIN, but for an OUTER JOIN we need to modify the internal query object ourself.
connection = (
UserFoo._meta.db_table, User._meta.db_table, # JOIN these tables
"user_id", "id", # on these fields
)
results.query.join( # modify the query
connection, # with this table connection
promote=True, # as LEFT OUTER JOIN
)
We can now evaluate the results. Each instance will have a .user_comment property containing the value from UserFoo, or None if it doesn't exist.
print results[0].user_comment
(Credit to this blog post by Colin Copeland for showing me how to do OUTER JOINs.)

I stumbled upon this problem I was unable to solve without resorting to raw SQL, but I did not want to rewrite the entire query.
Following is a description on how you can augment a queryset with an external raw sql, without having to care about the actual query that generates the queryset.
Here's a typical scenario: You have a reddit like site with a LinkPost model and a UserPostVote mode, like this:
class LinkPost(models.Model):
some fields....
class UserPostVote(models.Model):
user = models.ForeignKey(User,related_name="post_votes")
post = models.ForeignKey(LinkPost,related_name="user_votes")
value = models.IntegerField(null=False, default=0)
where the userpostvote table collect's the votes of users on posts.
Now you're trying to display the front page for a user with a pagination app, but you want the arrows to be red for posts the user has voted on.
First you get the posts for the page:
post_list = LinkPost.objects.all()
paginator = Paginator(post_list,25)
posts_page = paginator.page(request.GET.get('page'))
so now you have a QuerySet posts_page generated by the django paginator that selects the posts to display. How do we now add the annotation of the user's vote on each post before rendering it in a template?
Here's where it get's tricky and I was unable to find a clean ORM solution. select_related won't allow you to only get votes corresponding to the logged in user and looping over the posts would do bunch queries instead of one and doing it all raw mean's we can't use the queryset from the pagination app.
So here's how I do it:
q1 = posts_page.object_list.query # The query object of the queryset
q1_alias = q1.get_initial_alias() # This forces the query object to generate it's sql
(q1str, q1param) = q1.sql_with_params() #This gets the sql for the query along with
#parameters, which are none in this example
we now have the query for the queryset, and just wrap it, alias and left outer join to it:
q2_augment = "SELECT B.value as uservote, A.*
from ("+q1str+") A LEFT OUTER JOIN reddit_userpostvote B
ON A.id = B.post_id AND B.user_id = %s"
q2param = (request.user.id,)
posts_augmented = LinkPost.objects.raw(q2_augment,q1param+q2param)
voila! Now we can access post.uservote for a post in the augmented queryset.
And we just hit the database with a single query.

The two queries you suggest are as good as you're going to get (without using raw()), this type of query isn't representable in the ORM at present time.

You could do this using simonw's django-queryset-transform to avoid hard-coding a raw SQL query - the code would look something like this:
def userfoo_retriever(qs):
userfoos = dict((i.pk, i) for i in UserFoo.objects.filter(foo__in=qs))
for i in qs:
i.userfoo = userfoos.get(i.pk, None)
for foo in Foo.objects.filter(…).tranform(userfoo_retriever):
print foo.userfoo
This approach has been quite successful for this need and to efficiently retrieve M2M values; your query count won't be quite as low but on certain databases (cough MySQL cough) doing two simpler queries can often be faster than one with complex JOINs and many of the cases where I've most needed it had additional complexity which would have been even harder to hack into an ORM expression.

As for outerjoins:
Once you have a queryset qs from foo that includes a reference to columns from userfoo, you can promote the inner join to an outer join with
qs.query.promote_joins(["userfoo"])

You shouldn't have to resort to extra or raw for this.
The following should work.
Foo.objects.filter(
Q(userfoo_set__user=request.user) |
Q(userfoo_set=None) # This forces the use of LOUTER JOIN.
).annotate(
comment=F('userfoo_set__comment'),
# ... annotate all the fields you'd like to see added here.
)

The only way I see to do this without using raw etc. is something like this:
Foo.objects.filter(
Q(userfoo_set__isnull=True)|Q(userfoo_set__isnull=False)
).annotate(bar=Case(
When(userfoo_set__user_id=request.user, then='userfoo_set__bar')
))
The double Q trick ensures that you get your left outer join.
Unfortunately you can't set your request.user condition in the filter() since it may filter out successful joins on UserFoo instances with the wrong user, hence filtering out rows of Foo that you wanted to keep (which is why you ideally want the condition in the ON join clause instead of in the WHERE clause).
Because you can't filter out the rows that have an unwanted user value, you have to select rows from UserFoo with a CASE.
Note also that one Foo may join to many UserFoo records, so you may want to consider some way to retrieve distinct Foos from the output.

maparent's comment put me on the right way:
from django.db.models.sql.datastructures import Join
for alias in qs.query.alias_map.values():
if isinstance(alias, Join):
alias.nullable = True
qs.query.promote_joins(qs.query.tables)

Django equivalent of SQL REPLACE

Is there a Django ORM best practice for this SQL:
REPLACE app_model SET field_1 = 'some val', field_2 = 'some val';
Assumption: field_1 or field_2 would have a unique key on them (or in my case on both), otherwise this would always evaluate to an INSERT.
Edit:
My best personal answer right now is this, but it's 2-3 queries where 1 should be possible:
from django.core.exceptions import ValidationError
try:
Model(field_1='some val',field_2='some val').validate_unique()
Model(field_1='some val',field_2='some val',extra_field='some val').save()
except ValidationError:
Model.objects.filter(field_1='some val',field_2='some val').update(extra_field='some val')

You say you want REPLACE, which I believe is supposed to delete any existing rows before inserting, but your example indicates you want something more like UPSERT.
AFAIK, django doesn't support REPLACE (or sqlite's INSERT OR REPLACE, or UPSERT). But your code could be consolidated:
obj, created = Model.objects.get_or_create(field_1='some val', field_2='some_val')
obj.extra_field = 'some_val'
obj.save()
This of course assumes that either field_1, field_2, or both are unique (as you've said).
It's still two queries (a SELECT for get_or_create, and an INSERT or UPDATE for save), but until an UPSERT-like solution comes along (and it may not for a long, long time), it may be the best you can do.

Since Django 1.7 you can use update_or_create method:
obj, created = Person.objects.update_or_create(
first_name='John',
last_name='Lennon',
defaults={'profession': 'musician'},
)

I think the following is more efficient.
(obj, created) = Model.objects.get_or_create(
field_1 = 'some val',
field_2 = 'some_val',
defaults = {
'extra_field': 'some_val'
},
)
if not created and obj.extra_field != 'some_val':
obj.extra_field = 'some_val'
obj.save(
update_fields = [
'extra_field',
],
)
This will only update the extra_field if the row was not created and needs to be updated.

django: select_related with entry_set

Should entry_set be cached with select_related? My DB is still getting calls even after I use select_related. The pertinent sections
class Alias(models.Model):
achievements = models.ManyToManyField('Achievement', through='Achiever')
def points(self) :
points = 0
for a in self.achiever_set.all() :
points += a.achievement.points * a.count
return points
class Achievement(models.Model):
name = models.CharField(max_length=100)
points = models.IntegerField(default=1)
class Achiever(models.Model):
achievement = models.ForeignKey(Achievement)
alias = models.ForeignKey(Alias)
count = models.IntegerField(default=1)
aliases = Alias.objects.all().select_related()
for alias in aliases :
print "points : %s" % alias.points()
for a in alias.achiever_set.all()[:5] :
print "%s x %d" % (a.achievement.name, a.count)
And I'm seeing a big join query at the start, and then individual calls for each achievement. Both for the points and for the name lookup.
Is this a bug, or am I doing something wrong?

With Django 1.4 you can use prefetch_related which will work for ManyToMany relations:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related

Select_related() doesn't work with manytomanyfields. At the moment, this is something that is not planned, but might be a future feature. See http://code.djangoproject.com/ticket/6432
In this case, if you want to make a single query you got two options
1) Make your own SQL, probably won't be pretty or fast.
2) You could also query on the model with the foreignkey. You would be able to use select_related in that case. You stil won't be able to access the modelname_set but with some formatting you would be able to vet the data you need in a single query. None of the options are ideal, but you could get it working at a deacent speed aswell.

In Django 1.3 You can use Queryset.values() and do something like:
Alias.objects[.filter().exclude() etc.].values('achievements__name', 'achievement__points')
Only drwaback is that You get QuerySetList instead of QuerySet. But this can be simply overcome by passing all necessary fields into values() - You have to change Your perception ;)
This can save you few dosen of queries...
Details can be found here in django docs:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.values

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Retrieving untagged objects with django-tagging - django

Try this: [location for location in Location.objects.all() if location.tags.count() == 0]

Assuming your Location class uses the tagging.fields.TagField utility. from tagging.fields import TagField class Location(models.Model): tags = TagField() You can just do this: Location.objects.filter(tags='')

Related

Get information from a model using unrelated field

Django: ManyToMany filter matching on ALL items in a list

Annotating a Django queryset with a left outer join?

Django equivalent of SQL REPLACE

django: select_related with entry_set

Categories

Resources