django distinct query using custom equivalence


Say that my model looks like this:
from django.db import models
class Alert(models.Model):
    datetime_alert = models.DateTimeField()
    alert_type = models.ForeignKey(Alert_Type, on_delete=models.CASCADE)
    dismissed = models.BooleanField(default=False)
    datetime_dismissed = models.DateTimeField(null=True)
    auid = models.CharField(max_length=64, unique=True)
    entities = models.ManyToManyField(to='Entity', through='Entity_To_Alert_Map')
    objects = Alert_Manager()
    def __eq__(self, other):
        # Compare the entity sets; two separate querysets never compare equal with ==.
        return (isinstance(other, self.__class__)
                and self.alert_type == other.alert_type
                and set(self.entities.all()) == set(other.entities.all())
                and self.dismissed == other.dismissed)
    def __ne__(self, other):
        return not self.__eq__(other)
What I'm trying to say is this: two alert objects are equivalent if the dismissed status, the alert type, and the associated entities are the same. Using this idea, is it possible to write a query that asks for all the distinct alerts based on those criteria? Selecting all of them and then filtering them in Python doesn't seem appealing.

You mention one method to do it, and I don't think it is very bad. I'm not aware of anything in Django that can do this.
However, I want you to think about why this problem arises. If two alerts are equal when their entities, status and type are the same, then maybe that combination should be its own class. I would consider creating another class, DistinctAlert (or some better name), and having a foreign key to it from Alert. Or even better, have one class called Alert and one called AlertEvent (your current Alert class).
Would this solve your problem?
Edit:
Actually, there is a way to do this. You can combine values() and distinct(). This way, your query will be
Alert.objects.all().values("alert_type", "dismissed", "entities").distinct()
This will return a queryset of dictionaries rather than Alert instances.
See more in the documentation of values()
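One caveat worth checking: when a many-to-many field such as entities is passed to values(), Django returns one row per related object, so the distinct rows describe (alert_type, dismissed, entity) combinations rather than whole entity sets. If the comparison has to be on the full set, a small grouping step in Python may still be needed. The sketch below shows that step on plain tuples; AlertRow and the sample data are hypothetical stand-ins for rows already fetched from the database, not part of the original model.

```python
from collections import namedtuple

# Hypothetical stand-in for rows fetched from the database.
AlertRow = namedtuple("AlertRow", ["auid", "alert_type", "dismissed", "entities"])


def equivalence_key(alert):
    """Key that is equal for alerts with the same type, status and entity set."""
    return (alert.alert_type, alert.dismissed, frozenset(alert.entities))


def distinct_alerts(alerts):
    """Keep the first alert seen for each equivalence key, preserving order."""
    seen = {}
    for alert in alerts:
        seen.setdefault(equivalence_key(alert), alert)
    return list(seen.values())


alerts = [
    AlertRow("a1", "disk", False, ["host1", "host2"]),
    AlertRow("a2", "disk", False, ["host2", "host1"]),  # same set, different order
    AlertRow("a3", "disk", True, ["host1", "host2"]),   # differs in dismissed
    AlertRow("a4", "cpu", False, ["host1"]),
]
unique = distinct_alerts(alerts)
print([a.auid for a in unique])
```

Using a frozenset for the entities makes the key independent of the order in which the related rows come back.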

Related

Customizing the entry uniqueness in Django

I have a database containing a list of ingredients. I'd like to avoid duplicate entries in this table. I don't want to use the unique keyword for 2 reasons :
My uniqueness constraints are a bit more sophisticated than a mere =
I don't want to raise an exception when a pre-existing ingredient model is created, instead I just want to return that model, so that I can write Ingredient(ingredient_name='tomato') and just go on with my day rather than encapsulating all of that in a try clause. This will allow me to easily add ingredients to my recipe table on the fly.
One solution is simply to have a wrapper function like create_ingredient, but I don't find that to be particularly elegant and more specifically it's not robust to some other developer down the line simply forgetting to use the wrapper. So instead, I'm playing around with the pre_init and post_init signals.
Here's what I have so far :
from django.db import models
from django.db.models.signals import pre_init
class Ingredient(models.Model):
    ingredient_name = models.CharField(max_length=200)
    recipes = models.ManyToManyField(Recipe, related_name='ingredients')
    def __str__(self):
        return self.ingredient_name
class Name(models.Model):
    main_name = models.CharField(max_length=200, default=None)
    equivalent_name = models.CharField(max_length=200, primary_key=True, default=None)
def _add_ingredient(sender, args, **kwargs):
    if 'ingredient_name' not in kwargs['kwargs']:
        return
    kwargs['kwargs']['ingredient_name'] = kwargs['kwargs']['ingredient_name'].lower()
    # check if equivalent name exists, make this one the main one otherwise
    try:
        kwargs['kwargs']['ingredient_name'] = Name.objects.filter(
            equivalent_name=kwargs['kwargs']['ingredient_name']
        )[0].main_name
    except IndexError:
        name = Name(main_name=kwargs['kwargs']['ingredient_name'],
                    equivalent_name=kwargs['kwargs']['ingredient_name'])
        name.save()
pre_init.connect(_add_ingredient, Ingredient)
So far so good. This actually works and will replace ingredient_name when needed before the model is initialized. Now what I'd like is to check if the ingredient in question already exists and have the initializer return it if it does. I think I need to play around with post_init to do this but I don't know how to modify the particular instance that's being created. Here's what I mean by that :
def _finalize_ingredient(sender, instance, **kwargs):
    try:
        # doesn't work: assigning to `instance` only rebinds the local name
        instance = Ingredient.objects.filter(ingredient_name=instance.ingredient_name)[0]
    except IndexError:
        pass
post_init.connect(_finalize_ingredient, Ingredient)
As I've commented, I don't expect this to work because instance = ... doesn't actually modify instance, it just reassigns the variable name (incidentally if you try to run this all sorts of terrible things happen which I don't care to understand because I know this is flat out wrong). So how do I actually do this ? I really hope wrapper functions aren't the cleanest option here. I'm a big fan of OOP and gosh darn it I want an OOP solution to this (which, as I've said, I think in the long run would be much more robust and safer than wrappers).
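The rebinding point can be seen without Django at all. The following is a minimal sketch; Box, rebind and mutate are made-up names used only for illustration. Assigning to the parameter leaves the caller's object alone, while changing an attribute on it is visible to the caller.

```python
class Box:
    """Tiny illustrative class; stands in for a model instance."""

    def __init__(self, name):
        self.name = name


def rebind(instance):
    # Rebinds the local name only; the caller's object is untouched.
    instance = Box("replacement")


def mutate(instance):
    # Changes the object that the caller also refers to.
    instance.name = "replacement"


box = Box("original")
rebind(box)
after_rebind = box.name
mutate(box)
after_mutate = box.name
print(after_rebind, after_mutate)
```

This is why a signal handler can only change the instance it receives in place; it has no way to hand back a different object.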
I realize of course that I can add an add_ingredient method to Recipe which will do all of this for me, but I really like the idea of containing all of this in my Ingredient class as it will guarantee the proper database behavior under any circumstance. I'm also curious as to know if/how the post_init method can be used to completely override the created object for a given circumstance.
By the way, some of you may be wondering why I don't have a ForeignKey entry in my Name class that would connect the Name table to the Ingredient table. After all, isn't this what my check is essentially accomplishing in my _add_ingredient method ? One of the reasons is that if I do this then I end up with the same problem I'm trying to solve here : If I want to create an ingredient on the fly to add it to my recipe, I could simply create a Name object when creating an Ingredient object, but that would raise an exception if it corresponds to a main_name that is already in use (rather than simply returning the object I need).

I believe you are looking for get_or_create(), which is already a built-in in Django.
You mention:
One solution is simply to have a wrapper function like create_ingredient, but I don't find that to be particularly elegant and more specifically it's not robust to some other developer down the line simply forgetting to use the wrapper.
Well, look at it the other way around. What if you actually need to create a "duplicate" ingredient? Then it is nice to have the possibility.
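For readers who have not used it, get_or_create() returns a pair: the object, and a flag saying whether it was just created. The sketch below imitates that contract in plain Python so the behaviour is easy to see; IngredientRegistry, its alias table and the sample names are hypothetical and are not Django's implementation.

```python
class IngredientRegistry:
    """Hypothetical in-memory stand-in for a table with get-or-create semantics."""

    def __init__(self, aliases=None):
        self._aliases = dict(aliases or {})  # equivalent_name -> main_name
        self._rows = {}                      # main_name -> stored record

    def canonical(self, name):
        """Lower-case the name and map a known alias to its main name."""
        name = name.lower()
        return self._aliases.get(name, name)

    def get_or_create(self, ingredient_name, **defaults):
        """Return (record, created), the same shape Django's method returns."""
        key = self.canonical(ingredient_name)
        if key in self._rows:
            return self._rows[key], False
        record = {"ingredient_name": key, **defaults}
        self._rows[key] = record
        return record, True


registry = IngredientRegistry(aliases={"tomatoes": "tomato"})
first, created_first = registry.get_or_create("Tomato", vegetarian=True)
second, created_second = registry.get_or_create("tomatoes")
print(created_first, created_second, first is second)
```

With the real ORM the equivalent call would be along the lines of Ingredient.objects.get_or_create(ingredient_name=...), with the name normalised beforehand.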

I've come up with something that is as elegant and robust as I think it's possible to be given what I'm after. I've still had to define an add_ingredient method, but I still have the robustness that I need. I've made it so that it can be generalized to any class with a primary key, and the Name table will contain the info that will define the name uniqueness of any table :
class Name(models.Model):
    main_name = models.CharField(max_length=200, default=None)
    equivalent_name = models.CharField(max_length=200, primary_key=True, default=None)
def _pre_init_unique_fetcher(sender, args, **kwargs):
    pk_name = sender._meta.pk.name
    if pk_name not in kwargs['kwargs']:
        return
    kwargs['kwargs'][pk_name] = kwargs['kwargs'][pk_name].lower()
    # check if equivalent name exists, make this one the main one otherwise
    try:
        kwargs['kwargs'][pk_name] = Name.objects.filter(
            equivalent_name=kwargs['kwargs'][pk_name]
        )[0].main_name
    except IndexError:
        name = Name(main_name=kwargs['kwargs'][pk_name],
                    equivalent_name=kwargs['kwargs'][pk_name])
        name.save()
    sender._input_dict = kwargs['kwargs']
def _post_init_unique_fetcher(sender, instance, **kwargs):
    pk_name = sender._meta.pk.name
    pk_instance = instance.__dict__[pk_name]
    filter_dict = {}
    filter_dict[pk_name] = pk_instance
    try:
        post_init.disconnect(_post_init_unique_fetcher, sender)
        instance.__dict__ = sender.objects.filter(**filter_dict)[0].__dict__
        post_init.connect(_post_init_unique_fetcher, sender)
        for key in sender._input_dict:
            instance.__dict__[key] = sender._input_dict[key]
        del sender._input_dict
    except IndexError:
        post_init.connect(_post_init_unique_fetcher, sender)
    except:
        post_init.connect(_post_init_unique_fetcher, sender)
        raise
unique_fetch_models = [Ingredient, Recipe, WeekPlan]
for unique_fetch_model in unique_fetch_models:
    pre_init.connect(_pre_init_unique_fetcher, unique_fetch_model)
    post_init.connect(_post_init_unique_fetcher, unique_fetch_model)
Now what this will do is load up any new model with the pre-existing data of the previous model (rather than the default values) if one with the same name exists. The reason I still need an add_ingredient method in my Recipe class is because I can't call Ingredient.objects.create() for a pre-existing ingredient without raising an exception despite the fact that I can create the model and immediately save it. This has to do with how Django handles the primary_key designation : if you create the model then save it, it assumes you're just updating the entry if it already exists with that key, and yet if you create it, it tries to add another entry and that conflicts with the primary_key designation. So now I can do things like recipe.add_ingredient(Ingredient(ingredient_name='tomato', vegetarian=True)).

Django QuerySets - with a class method

Below is a stripped down model and associated method. I am looking for a simple way upon executing a query to get all of the needed information in a single answer without having to re-query everything. The challenge here is the value is dependent upon the signedness of value_id.
class Property(models.Model):
    property_definition = models.ForeignKey(PropertyDefinition)
    owner = models.IntegerField()
    value_id = models.IntegerField()
    def get_value(self):
        if self.value_id < 0:
            return PropertyLong.objects.get(id=-self.value_id)
        else:
            return PropertyShort.objects.get(id=self.value_id)
Right now to get the "value" I need to do this:
object = Property.objects.get(property_definition__name="foo")
print object.get_value()
Can someone provide a cleaner way to solve this or is it "good" enough? Ideally I would like to simply just do this.
object = Property.objects.get(property_definition__name="foo")
object.value
Thanks

Granted, this is a bad design, but you can use the built-in property decorator to make your method behave like an attribute.
class Property(models.Model):
    property_definition = models.ForeignKey(PropertyDefinition)
    owner = models.IntegerField()
    value_id = models.IntegerField()
    @property
    def value(self):
        if self.value_id < 0:
            return PropertyLong.objects.get(id=-self.value_id)
        else:
            return PropertyShort.objects.get(id=self.value_id)
This would enable you to do what you'd ideally like to do: Property.objects.get(pk=1).value
But I wouldn't go as far as to call this "cleaner". ;-)
You could go further and write your own custom model field by extending django.db.models.Field to hide the nastiness in your schema behind an API. This would at least give you the API you want now, so you can migrate the nastiness out later.
That or the Generic Keys mentioned by others. Choose your poison...
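The sign-based dispatch behind the property can be tried out without a database. This is only a sketch: LONG_VALUES and SHORT_VALUES are hypothetical dictionaries standing in for the PropertyLong and PropertyShort tables.

```python
# Hypothetical in-memory stand-ins for the PropertyLong and PropertyShort tables.
LONG_VALUES = {7: "a long value"}
SHORT_VALUES = {7: "short"}


class Property:
    def __init__(self, value_id):
        self.value_id = value_id

    @property
    def value(self):
        # Negative ids point at the "long" store, non-negative ids at the "short" one.
        if self.value_id < 0:
            return LONG_VALUES[-self.value_id]
        return SHORT_VALUES[self.value_id]


long_result = Property(-7).value
short_result = Property(7).value
print(long_result, short_result)
```

Note that a plain property reruns the lookup on every access. On the real model that means one query per access, so caching the result (for example with Django's cached_property) may be worth considering.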

This is a bad design. As Daniel Roseman said, take a look at generic foreign keys if you must reference two different models from the same field.
https://docs.djangoproject.com/en/1.3/ref/contrib/contenttypes/#generic-relations

Model inheritance could be used since value is not a Field instance.

Django Query (aggregates and counts)

Hey guys, I've got a model that looks like this:
class Interaction(DateAwareModel, UserAwareModel):
    page = models.ForeignKey(Page)
    container = models.ForeignKey(Container, blank=True, null=True)
    content = models.ForeignKey(Content)
    interaction_node = models.ForeignKey(InteractionNode)
    kind = models.CharField(max_length=3, choices=INTERACTION_TYPES)
I want to be able to do one query to get the count of the interactions grouped by container then by kind. The idea being that the output JSON data structure (serialization taken care of by piston) would look like this:
"data": {
    "container 1": {
        "tag_count": 3,
        "com_count": 1
    },
    "container 2": {
        "tag_count": 7,
        "com_count": 12
    },
    ...
}
The SQL would look like this:
SELECT container_id, kind, count(*) FROM rb_interaction GROUP BY container_id, kind;
Any ideas on how to group by multiple fields using the ORM? (I don't want to write raw queries for this project if I can avoid it.) This seems like a simple and common query.
Before you ask: I have seen the django aggregates documentation and the raw queries documentation.
Update
As per advice below I've created a custom manager to handle this:
class ContainerManager(models.Manager):
    def get_query_set(self, *args, **kwargs):
        qs = super(ContainerManager, self).get_query_set(*args, **kwargs)
        qs.filter(Q(interaction__kind='tag') | Q(interaction__kind='com')).distinct()
        annotations = {
            'tag_count': models.Count('interaction__kind'),
            'com_count': models.Count('interaction__kind')
        }
        return qs.annotate(**annotations)
This only counts the interactions that are of kind tag or com, instead of retrieving separate counts for the tags and the coms via GROUP BY. It is obvious from the code that it works that way, but I'm wondering how to fix it...

Create a custom manager:
class ContainerManager(models.Manager):
    def get_query_set(self, *args, **kwargs):
        qs = super(ContainerManager, self).get_query_set(*args, **kwargs)
        annotations = {'tag_count': models.Count('tag'), 'com_count': models.Count('com')}
        return qs.annotate(**annotations)
class Container(models.Model):
    ...
    objects = ContainerManager()
Then, Container queries will always include tag_count and com_count attributes. You'll probably need to modify the annotations, since I don't have a copy of your model to refer to; I just guessed on the field names.
UPDATE:
So after gaining a better understanding of your models, annotations won't work for what you're looking for. Really the only way to get counts for how many Containers have kinds of 'tag' or 'com' is:
tag_count = Container.objects.filter(kind='tag').count()
com_count = Container.objects.filter(kind='com').count()
Annotations won't give you that information. I think it's possible to write your own aggregates and annotations, so that might be a possible solution. However, I've never done that myself, so I can't really give you any guidance there. You're probably stuck with using straight SQL.
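For reference, the GROUP BY from the question and the reshaping into the nested structure can be tried with the standard library alone. The sketch below runs the question's SQL against an in-memory SQLite table with made-up rows; nest_counts is a hypothetical helper. In the ORM, calling values() before annotate() groups by the listed fields, so something like Interaction.objects.values('container', 'kind').annotate(count=Count('id')) should yield the same rows, though that has not been run against the asker's models.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rb_interaction (container_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO rb_interaction (container_id, kind) VALUES (?, ?)",
    [(1, "tag"), (1, "tag"), (1, "tag"), (1, "com"),
     (2, "tag"), (2, "com"), (2, "com")],  # made-up sample data
)

# The query from the question, with an ORDER BY for a stable row order.
rows = conn.execute(
    "SELECT container_id, kind, COUNT(*) FROM rb_interaction "
    "GROUP BY container_id, kind ORDER BY container_id, kind"
).fetchall()


def nest_counts(rows):
    """Turn (container_id, kind, count) rows into {container: {kind_count: n}}."""
    data = defaultdict(dict)
    for container_id, kind, count in rows:
        data["container %d" % container_id]["%s_count" % kind] = count
    return dict(data)


data = nest_counts(rows)
print(data)
```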

Custom properties in a query

Given the simplified example below, how would I access my custom "current_status" property within a queryset? Is it even possible?
At the moment, I want to list all the current Events and show the current status. I can get the property to display in a template OK, but I can't order the queryset by it. Alternatively, would I need to create a custom manager with some kind of nested "if" statement in the SELECT?
class Event(models.Model):
    ....
    date_registered = models.DateField(null=True, blank=True)
    date_accepted = models.DateField(null=True, blank=True)
    date_reported = models.DateField(null=True, blank=True)
    ...
    def _get_current_status(self):
        ...
        if self.date_reported:
            return "Reported"
        if self.date_accepted:
            return "Accepted"
        if self.date_registered:
            return "Registered"
        if self.date_drafted:
            return "Drafted"
    current_status = property(_get_current_status)

Instead of calculating the status as a property, create a proper model field for it and update it in the save method. Then you can use that field directly in the query.

You cannot use a custom property in a query, since Django's ORM will try to map it to a database column and fail. Of course, you can use it on an evaluated queryset, e.g. when you're iterating over the objects in a query's results!
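Ordering on the property is therefore something that happens in Python after the rows are loaded. A minimal sketch, assuming the result set is small enough to sort in memory; the Event class here is a plain-Python stand-in, and STATUS_ORDER and status_rank are hypothetical helpers.

```python
import datetime

# Lowest to highest; mirrors the order of checks in the question's property.
STATUS_ORDER = ["Drafted", "Registered", "Accepted", "Reported"]


class Event:
    """Plain-Python stand-in for the Event model (hypothetical, no database)."""

    def __init__(self, name, date_drafted=None, date_registered=None,
                 date_accepted=None, date_reported=None):
        self.name = name
        self.date_drafted = date_drafted
        self.date_registered = date_registered
        self.date_accepted = date_accepted
        self.date_reported = date_reported

    @property
    def current_status(self):
        if self.date_reported:
            return "Reported"
        if self.date_accepted:
            return "Accepted"
        if self.date_registered:
            return "Registered"
        if self.date_drafted:
            return "Drafted"
        return None


def status_rank(event):
    """Sort key: position of the status in STATUS_ORDER, unknown statuses last."""
    status = event.current_status
    return STATUS_ORDER.index(status) if status in STATUS_ORDER else len(STATUS_ORDER)


day = datetime.date(2011, 1, 1)
events = [
    Event("a", date_drafted=day, date_registered=day, date_accepted=day, date_reported=day),
    Event("b", date_drafted=day),
    Event("c", date_drafted=day, date_registered=day),
]
ordered = sorted(events, key=status_rank)
print([(e.name, e.current_status) for e in ordered])
```

With a real queryset the call would be sorted(Event.objects.all(), key=status_rank), which loads every row first.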
You can only filter for things like: Event.objects.filter(date_drafted__isnull=False).
http://docs.djangoproject.com/en/dev/ref/models/querysets/#isnull

Thanks to Daniel. I think that I might use your approach. However, I also managed to get it working using the queryset 'extra' method, which might also be useful to other people, although it probably isn't database-agnostic.
qs = Event.objects.extra(select={'current_status_id':
    '''(CASE
        WHEN date_cancelled THEN 0
        WHEN date_closed THEN 6
        WHEN date_signed_off THEN 5
        WHEN date_reported THEN 4
        WHEN date_accepted THEN 3
        WHEN date_registered THEN 2
        WHEN date_drafted THEN 1
        ELSE 99
    END)
    '''})
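The CASE expression can be tested outside Django. The sketch below runs a shortened version against an in-memory SQLite table with made-up rows. It uses explicit IS NOT NULL tests, on the assumption that relying on the truthiness of a date column is the part most likely to differ between databases. Newer Django versions also offer Case and When conditional expressions, which may be a more portable route than extra().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (name TEXT, date_drafted TEXT, date_registered TEXT, "
    "date_accepted TEXT, date_reported TEXT)"
)
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?, ?, ?)",
    [  # made-up sample data
        ("a", "2011-01-01", None, None, None),
        ("b", "2011-01-01", "2011-01-05", "2011-01-09", "2011-02-01"),
        ("c", "2011-01-02", "2011-01-06", None, None),
    ],
)

# Compute a numeric status per row and order by it in the database.
rows = conn.execute(
    """
    SELECT name,
           CASE
               WHEN date_reported IS NOT NULL THEN 4
               WHEN date_accepted IS NOT NULL THEN 3
               WHEN date_registered IS NOT NULL THEN 2
               WHEN date_drafted IS NOT NULL THEN 1
               ELSE 99
           END AS current_status_id
    FROM event
    ORDER BY current_status_id DESC
    """
).fetchall()
print(rows)
```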

Annotate over Multi-table Inheritance in Django

I have a base LoggedEvent model and a number of subclass models, as follows:
class LoggedEvent(models.Model):
    user = models.ForeignKey(User, blank=True, null=True)
    timestamp = models.DateTimeField(auto_now_add=True)
class AuthEvent(LoggedEvent):
    good = models.BooleanField()
    username = models.CharField(max_length=12)
class LDAPSearchEvent(LoggedEvent):
    type = models.CharField(max_length=12)
    query = models.CharField(max_length=24)
class PRISearchEvent(LoggedEvent):
    type = models.CharField(max_length=12)
    query = models.CharField(max_length=24)
Users generate these events as they do the related actions. I am attempting to generate a usage-report of how many of each event-type each user has caused in the last month. I am struggling with Django's ORM and while I am close I am running into a problem. Here is the query code:
def usage(request):
    # Calculate date range
    today = datetime.date.today()
    month_start = datetime.date(year=today.year, month=today.month - 1, day=1)
    month_end = datetime.date(year=today.year, month=today.month, day=1) - datetime.timedelta(days=1)
    # Search for how many LDAP events were generated per user, last month
    baseusage = User.objects.filter(loggedevent__timestamp__gte=month_start, loggedevent__timestamp__lte=month_end)
    ldapusage = baseusage.exclude(loggedevent__ldapsearchevent__id__lt=1).annotate(count=Count('loggedevent__pk'))
    authusage = baseusage.exclude(loggedevent__authevent__id__lt=1).annotate(count=Count('loggedevent__pk'))
    return render_to_response('usage.html', {
        'ldapusage': ldapusage,
        'authusage': authusage,
    }, context_instance=RequestContext(request))
Both ldapusage and authusage are lists of users, each user annotated with a .count attribute which is supposed to represent how many events of that particular type the user generated. However, in both lists the .count attributes have the same value. In fact, the annotated 'count' is equal to how many events that user generated, regardless of type. So it would seem that my specific
authusage = baseusage.exclude(loggedevent__authevent__id__lt=1)
isn't excluding by subclass. I have tried id__lt=1, id__isnull=True, and others. Halp.

The key to Django model inheritance is remembering that with a non-abstract base class everything is really an instance of the base class which might happen to have some extra data strapped on the side from a separate table. This means that when you do searches on the base table you get back instances of the base class and there's no way to tell which subclass it is without doing repeated database queries on the subclass tables to see if they contain a record with a matching key ("I have an event. Does it have a record in AuthEvent? No. What about LDAP Event?…"). Among other things this means that you can't easily filter on them in normal queries on the base class without doing a join on every subclass table.
You have a couple of choices: one would simply be to do your queries on the subclass and tally the results (ldap_event_count = LDAPSearchEvent.objects.filter(user=foo).count(), …), which might be sufficient for a single report. I usually recommend adding a content type field to the base class so you can efficiently tell which particular subclass an instance is without having to do another query:
content_type = models.ForeignKey("contenttypes.ContentType")
That allows two major improvements: the most common one is that you can deal with many events generically without having to hit the subclass-specific accessors (e.g. event.authevent or event.ldapsearchevent) and handle DoesNotExist. In this case it would also make it trivial to rewrite your query, since you could do something like LoggedEvent.objects.values("user", "content_type").annotate(count=Count("pk")) to get the report values, which becomes particularly handy if your logic gets more complicated ("Event is Auth or LDAP and …").
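Separately from the inheritance issue, the date range in the question's view looks fragile: month=today.month - 1 becomes month=0 in January, which datetime.date rejects with a ValueError. A small sketch of one way to compute the previous calendar month; previous_month_range is a hypothetical helper name.

```python
import datetime


def previous_month_range(today):
    """Return (first_day, last_day) of the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    # Stepping back one day from the 1st always lands in the previous month.
    month_end = first_of_this_month - datetime.timedelta(days=1)
    month_start = month_end.replace(day=1)
    return month_start, month_end


january_case = previous_month_range(datetime.date(2012, 1, 15))
leap_case = previous_month_range(datetime.date(2012, 3, 10))
print(january_case, leap_case)
```

Because timestamp is a DateTimeField, filtering with timestamp__lte on a plain date may drop events later on the last day; an exclusive upper bound on the first day of the current month (timestamp__lt) is one way around that.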
