How to filter django-taggit top tags - django

Suppose you have a database with User objects running behind a Djano app
and you want to use django-taggit to tag User objects so you can retrieve subgroups using some convenient filtering.
Additionally you have a Dashboard where you want to display interesting statistics about used tags to glean some information about the subgroups that exists within your Users.
How would you access and display information about the top X tags
used within the Django app?
How would you access only the top X tags of an already filtered
subgroup of the User object?

While there are already a number of posts on SO that describe similar problems most of these describe workarounds or contain scattered information.
In order to make this information easier to find I will post a simple rundown of how to achieve some basic stuff using features of django-taggit that are officially supported but are not present in the official documentation.
How would you access and display information about the top X tags
used within the Django app?
In order to access and display information about the top tags used within the Django app you can use the built in function most_common like so:
top_tags = User.tag.most_common()
This returns a queryset containing all of the tags placed on a User instance ordered from most used descending order.
So say we have 3 tags: ["vegetables", "fruits", "candy"] and 10 users have a fruits tag, 4 users have a vegetables tag and only 1 user has the candy tag the returned order would be: ["fruits", "vegetables", "candy"]
Accessing more information about the tags returned can be done like so:
for tag in top_tags:
print(tag.name) #the name of the tag
print(tag.num_times) # the number of User objects tagged
Additionally if you are only interested in the top 3 tags then you can
access them like this:
top_tags = User.tag.most_common()[:3]
Where you can replace 3 with X where X is the number of items you want returned.
How would you access only the top X tags of an already filtered
subgroup of the User object?
Since Jul 12, 2016 the most_common() function actually has some additional arguments that you can specify. First of all you can specify a min_count which filters out the top tags that fall below a certain threshold. As an illustration using the tags from the previous example:
top_tags = User.tag.most_common()[:3]
returns all three tags as specified earlier but using
top_tags = User.tag.most_common(min_count=2)[:3]
only returns ["fruits", "vegetables"] this is because only 1 User object was tagged with candy meaning that it falls below the min_count of 2
An additional argument that you can provide to most_common is extra_filters this enables you to provide an object containing additional filter values that you want to filter the tags by.
One usage example would be:
filtered_users = User.objects.filter(age=20, is_delete=False)
top_tags = User.tag.most_common(
min_count=1, extra_filters={
'user__in': filtered_users
}
)
Here we create a filtered queryset of User objects that we then provide to the extra_filters argument to limit the tag search to a specific subgroup

Related

How to populate table based on the condition of another table's column value in Django

I am kinda new in django and I am stuck.
I am building and app to store equipment registry. I have done a model for equipment list and i have a status values as "available", "booked", "maintenance"
I also have a model for all equipment that are not available. not in my html in the "not available registry" i want to show only details of equipment in the list that are marked as "booked" and "maintenance"
There are three ways of doing this.
To filter out objects of the model that are marked as "booked" or "maintenance" you can use complex lookups with Q objects. It allows you to filter objects with OR statement. Here you need to find objects that have status set to "booked" OR to "maintenance". The query should look like this:
from django.db.models import Q
Equipment.objects.filter(Q(status='booked') | Q(status='maintetnance'))
Second way of doing this is by using __in statement to filter objects that you need:
not_available_status = ['booked', 'maintenance']
Equipment.objects.filter(status__in=not_available_status)
And final way is to exclude objects that you don't need:
Equipment.objects.exclude(status='available')

Sharepoint2013 list item filter

My requirement is I have to submit records into a list/lib with attachments and that record can be tag to category field they can be multiple for a single item.
i.e. An item A can be tag to tag to category X or can be cat X,Y(multiple category can be)
My requirement it user can also filter these record in list/lib on the basis of category tagged.
i.e. if an item A is tagged with cat X,Y it should be show in both cat when we filter.
What approach i should i use in Sharepoint 2013?
you can use the taxonomy to achieve this behaviour. Generate Term sets and terms in it and use them to tag your List Item.
You didnt provide the details about how you are searching the data? by your code or the OOTB sharepoint search ?
Thanks

Searching a many to many database using Google Cloud Datastore

I am quite new to google app engine. I know google datastore is not sql, but I am trying to get many to many relationship behaviour in it. As you can see below, I have Gif entities and Tag entities. I want my application to search Gif entities by related tag. Here is what I have done;
class Gif(ndb.Model):
author = ndb.UserProperty()
link = ndb.StringProperty(indexed=False)
class Tag(ndb.Model):
name = ndb.StringProperty()
class TagGifPair(ndb.Model):
tag_id = ndb.IntegerProperty()
gif_id = ndb.IntegerProperty()
#classmethod
def search_gif_by_tag(cls, tag_name)
query = cls.query(name=tag_name)
# I am stuck here ...
Is this a correct start to do this? If so, how can I finish it. If not, how to do it?
You can use repeated properties https://developers.google.com/appengine/docs/python/ndb/properties#repeated the sample in the link uses tags with entity as sample but for your exact use case will be like:
class Gif(ndb.Model):
author = ndb.UserProperty()
link = ndb.StringProperty(indexed=False)
# you store array of tag keys here you can also just make this
# StringProperty(repeated=True)
tag = ndb.KeyProperty(repeated=True)
#classmethod
def get_by_tag(cls, tag_name):
# a query to a repeated property works the same as if it was a single value
return cls.query(cls.tag == ndb.Key(Tag, tag_name)).fetch()
# we will put the tag_name as its key.id()
# you only really need this if you wanna keep records of your tags
# you can simply keep the tags as string too
class Tag(ndb.Model):
gif_count = ndb.IntegerProperty(indexed=False)
Maybe you want to use list? I would do something like this if you only need to search gif by tags. I'm using db since I'm not familiar with ndb.
class Gif(db.Model):
author = db.UserProperty()
link = db.StringProperty(indexed=False)
tags = db.StringListProperty(indexed=True)
Query like this
Gif.all().filter('tags =', tag).fetch(1000)
There's different ways of doing many-to-many relationships. Using ListProperties is one way. The limitation to keep in mind if using ListProperties is that there's a limit to the number of indexes per entity, and a limit to the total entity size. This means that there's a limit to the number of entities in the list (depending on whether you hit the index count or entity size first). See the bottom of this page: https://developers.google.com/appengine/docs/python/datastore/overview
If you believe the number of references will work within this limit, this is a good way to go. Considering that you're not going to have thousands of admins for a Page, this is probably the right way.
The other way is to have an intermediate entity that has reference properties to both sides of your many-to-many. This method will let you scale much higher, but because of all the extra entity writes and reads, this is much more expensive.

How do I use django's Q with django taggit?

I have a Result object that is tagged with "one" and "two". When I try to query for objects tagged "one" and "two", I get nothing back:
q = Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
print len(q)
# prints zero, was expecting 1
Why does it not work with Q? How can I make it work?
The way django-taggit implements tagging is essentially through a ManytoMany relationship. In such cases there is a separate table in the database that holds these relations. It is usually called a "through" or intermediate model as it connects the two models. In the case of django-taggit this is called TaggedItem. So you have the Result model which is your model and you have two models Tag and TaggedItem provided by django-taggit.
When you make a query such as Result.objects.filter(Q(tags__name="one")) it translates to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has the name="one".
Trying to match for two tag names would translate to looking up up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has both name="one" AND name="two". You obviously never have that as you only have one value in a row, it's either "one" or "two".
These details are hidden away from you in the django-taggit implementation, but this is what happens whenever you have a ManytoMany relationship between objects.
To resolve this you can:
Option 1
Query tag after tag evaluating the results each time, as it is suggested in the answers from others. This might be okay for two tags, but will not be good when you need to look for objects that have 10 tags set on them. Here would be one way to do this that would result in two queries and get you the result:
# get the IDs of the Result objects tagged with "one"
query_1 = Result.objects.filter(tags__name="one").values('id')
# use this in a second query to filter the ID and look for the second tag.
results = Result.objects.filter(pk__in=query_1, tags__name="two")
You could achieve this with a single query so you only have one trip from the app to the database, which would look like this:
# create django subquery - this is not evaluated, but used to construct the final query
subquery = Result.objects.filter(pk=OuterRef('pk'), tags__name="one").values('id')
# perform a combined query using a subquery against the database
results = Result.objects.filter(Exists(subquery), tags__name="two")
This would only make one trip to the database. (Note: filtering on sub-queries requires django 3.0).
But you are still limited to two tags. If you need to check for 10 tags or more, the above is not really workable...
Option 2
Query the relationship table instead directly and aggregate the results in a way that give you the object IDs.
# django-taggit uses Content Types so we need to pick up the content type from cache
result_content_type = ContentType.objects.get_for_model(Result)
tag_names = ["one", "two"]
tagged_results = (
TaggedItem.objects.filter(tag__name__in=tag_names, content_type=result_content_type)
.values('object_id')
.annotate(occurence=Count('object_id'))
.filter(occurence=len(tag_names))
.values_list('object_id', flat=True)
)
TaggedItem is the hidden table in the django-taggit implementation that contains the relationships. The above will query that table and aggregate all the rows that refer either to the "one" or "two" tags, group the results by the ID of the objects and then pick those where the object ID had the number of tags you are looking for.
This is a single query and at the end gets you the IDs of all the objects that have been tagged with both tags. It is also the exact same query regardless if you need 2 tags or 200.
Please review this and let me know if anything needs clarification.
first of all, this three are same:
Result.objects.filter(tags__name="one", tags__name="two")
Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
Result.objects.filter(tags__name_in=["one"]).filter(tags__name_in=["two"])
i think the name field is CharField and no record could be equal to "one" and "two" at same time.
in python code the query looks like this(always false, and why you are geting no result):
from random import choice
name = choice(["abtin", "shino"])
if name == "abtin" and name == "shino":
we use Q object for implement OR or complex queries
Into the example that works you do an end on two python objects (query sets). That gets applied to any record not necessarily to the same record that has one AND two as tag.
ps: Why do you use the in filter ?
q = Result.objects.filter(tags_name_in=["one"]).filter(tags_name_in=["two"])
add .distinct() to remove duplicates if expecting more than one unique object

Generating a single queryset with filtered summary data across a foreign key?

I have a small Django project to learn with (it's a web UI for the RANCID backup software) and I've run into a problem.
The model for the app defines Devices, and DeviceGroups. Each Device is a member of a group and has a couple of state flags - Enabled, Successful - to indicate if they are operating correctly. Here's the relevant bits.
class DeviceGroup(models.Model):
group_name = models.CharField(max_length=60,unique=True)
class Device(models.Model):
hostname = models.CharField(max_length=60,unique=True)
enabled = models.BooleanField(default=True)
device_group = models.ForeignKey(DeviceGroup)
last_was_success = models.BooleanField(default=False,editable=False)
I have a summary table on the front 'dashboard' page, that shows a list of all the groups, and for each group, how many devices are in it. I'd like to also show the number of Active devices, and the number of failing (i.e. Not last_was_success) devices per-group. The plain device count is already available through the ForeignKey field.
This seems like the kind of thing that annotate is for, but not quite. And actually, I'm not sure how I'd do it with raw SQL either. Most likely as three queries and some lookup afterwards, or subqueries.
So - is it possible 'nicely' in Django? Or alternatively, how do you do the joining up again in the Template or View? The object passed into the template is simply:
device_groups = DeviceGroup.objects.order_by('group_name')
currently, and I don't think I can just add extra fields onto the queryset results "manually", can I? i.e. it's not a dict or similar.
i think you must use
device_groups = DeviceGroup.objects.all().order_by('group_name')
or
device_groups = DeviceGroup.objects.filter(condition).order_by('group_name')