Django Haystack RealTimeSearchIndex on ManyToMany Relationships strange behaviour - django

Following a related (as yet unanswered) question, I did some investigation and found that the current implementation of Django Haystack's RealTimeSearchIndex makes no attempt to also update on related field (Many to Many) changes. I thought this would be an easy fix - after all, I could just extend RealTimeSearchIndex like this:
class RealTimeM2MSearchIndex(RealTimeSearchIndex):
def _setup_save(self, model):
signals.m2m_changed.connect(self.update_object, sender=model)
signals.post_save.connect(self.update_object, sender=model)
But then I realized (or at least assumed, since it's not working) that this only works if the M2M field is defined on the model itself, and not if it's the "reverse" side of the M2M relationship. Trying to fix that, I then did something like the following:
signals.m2m_changed.connect(self.update_object, sender=model.related_field.through)
Where related_field is the name of the specific Model on other side of the ManyToMany definition. Strangely enough, upon running, Django then complains that the Model has no such field, related_field.
And sure enough, if I inspect the object, Django has not yet extended the model to have the related_field field. If I inspect the same object when displaying a view, however, it does have that related_field.
Summary
So the problem seems to be that Django's automatic behavior to add an attribute to the reverse side of an M2M relationship has yet to happen when Haystack runs its code. How can I overcome this obstacle, and allow Haystack's RealTimeSearchIndex to also update on related field changes?

I think the simplest solution is to just use the built in RealTimeSearchIndex, and add a signal listener in your models.py to reindex the model on m2m_changed, or whenever. See my answer to the other question - you could easily modify it to index on m2m_changed instead of post_save.

Just tried implementing this myself, and your problem is the value of the sender argument in this line:
signals.m2m_changed.connect(self.update_object, sender=model)
I read the documentation for the m2m_changed signal and the sender will be something like MyModel.my_field.through so you need to use that. This means you can't have a generic class as you are trying to do, but will need to define the _setup_save method in each case, with the m2m_changed signal connected for each ManyToMany field that model has.
For example, if your model had two ManyToManyFields called region and sector, you could do:
# we implement these to force the update when the ManyToMany fields change
def _setup_save(self, model):
signals.m2m_changed.connect(self.update_object,
sender=MyModel.sector.through)
signals.m2m_changed.connect(self.update_object,
sender=MyModel.region.through)
signals.post_save.connect(self.update_object, sender=model)
You should also really define the _teardown_save() method:
def _teardown_save(self, model):
signals.m2m_changed.disconnect(self.update_object,
sender=MyModel.sector.through)
signals.m2m_changed.disconnect(self.update_object,
sender=MyModel.region.through)
signals.post_save.disconnect(self.update_object, sender=model)
(This is based on code I have tested, it appears to work - certainly no errors about fields not existing).
Update: Just read your question more closely. Is it possible that your model has the ManyToManyField added dynamically? Is the call to register() after you have defined all your classes?

Related

Calculating the 'most recently used' of a django ManyToMany relation

I have a simple Tag model that many other models have a ManyToMany relationship to. The requirement has come up to be able to query/show the most recently used Tags in the system, across all entities that have Tags.
I can add a used_at attribute to the Tag model, and I could order on that. But obviously, the Tag model doesn't get modified when something else just references it, so an auto_now on that attribute won't help me.
Without the use of a through model (which could have an auto_now_add on it), and without performing any invisible (non-django) magic directly in the DB with triggers, is there a sensible way to update the Tag's timestamp whenever a model is saved that references it?
You could use m2m_changed signal
From docs:
Sent when a ManyToManyField is changed on a model instance. Strictly
speaking, this is not a model signal since it is sent by the
ManyToManyField, but since it complements the pre_save/post_save and
pre_delete/post_delete when it comes to tracking changes to models, it
is included here

Meaning of pk_set on m2m_changed

Django docs states:
pk_set
For the pre_add, post_add, pre_remove and post_remove actions, this is a set of primary key values that have been added to or removed from the relation.
For the pre_clear and post_clear actions, this is None.
The difference between the relation after being modified and the current state of the relation was exactly what I needed.
But when trying to use that on django admin I get the whole relation. I think if I wanted it all I would just use the instance parameter.
This looks like it's because clear is being called when I submit the form. So, it's basically wiping out every model and adding again(since keeping track of the model added/remove is a bit annoying). But that way this feature has no significance.
I want to listen for modifications on a manytomany relation and get only the ones that were added. And I'd like to do that on django admin.
If you provide a way to do that, please, also tell me how could that be done using through
Thanks!

Why does BaseModelForm update ALL fields, despite documentation saying it does not? Is this a bug?

I'm working on a django project with a legacy DB, using formset to edit a set of rows. There are fields in that DB that I don't want django to update, although I need them in my model. In other words, I want them to be treated as READ-ONLY fields.
Thus, I was happy to read the documentation on saving model formsets, which states:
"When fields are missing from the form (for example because they have
been excluded), these fields will not be set by the save() method. You
can find more information about this restriction, which also holds for
regular ModelForms, in Selecting the fields to use."
https://docs.djangoproject.com/en/1.8/topics/forms/modelforms/#saving-objects-in-the-formset
Indeed, when forms.model.BaseModelForm.save() is invoked, it calls forms.model.save_instance(), nicely passing in all the form fields. BUT than then calls instance.save() WITHOUT passing along the update_fields!! And so ALL the model fields are included in the update query :-(
As a test, I modified forms.model.save_instance() to pass the fields:
instance.save(update_fields=fields)
and voila - the model only saves fields listed by its form.
I'm hoping someone more involved in the django project can tell me if this a bug, or a documentation issue? Should I submit a bug report? Am I missing something - is there some other way I should be enforcing "read only" on those fields?
Using django1.8 and python3.4
I'm not sure why you think this behaviour contradicts that documentation, or why it would need to pass along the fields to update. The instance already has the unchanged data; Django will update the rest of the fields from the form, and save the whole thing.

How can one best mark a model field as required in Django?

In Django, I would like the ability to mark certain model fields as required at the model (or at least database) level, to make sure that I am specifying them explicitly (i.e. not relying on defaults) when creating objects.
Currently, Django lets you designate a model field as required at the forms level (by specifying blank=False in the model). However, it doesn't seem like there is a similar option to get this behavior at the model or database level.
It does seem like there are various hacks to achieve something similar though. For example, you can set the default to something that violates a database constraint. For example, you can do things like:
models.CharField(_('characters'), max_length=4, default=None)
or
models.CharField(_('characters'), max_length=4, default="abcdef")
The former example works when saving to the database since None violates the default not-null constraint of null=False (raising an IntegrityError). The latter works because a DataError is raised when saving. But I don't know if this is guaranteed to work across all databases, etc.
Am I missing something, or is there a better way?
If django models called full_clean() automatically on save(), your check would run at the model level without a form. I've been playing with making this the default behavior in my django projects by creating an auto-clean model subclass which does full_clean() on save(), then deriving my models off that.
If you want to learn why it isn't already like this: Why doesn't django's model.save() call full_clean()?

When to use form vs model validation?

Just curious. What is the best practice for when to use form vs model validation?
From what I understand currently, form validation should be used for:
AJAX / HTTP requests params
Forms that do not correlate to a model?
Another question is: I have a HTML form that roughly correlates to a model instance, do I use a ModelForm for it?
Definitely use ModelForm, if your form resembles model object even in a tiny bit.
If there are some minor differences (e.g. you don't use some of the fields or you want to use different error messages etc.) it's much easier to customize ModelForm then to use Form and implement all this functionality from scratch.
For more reference regarding ModelForm please checkout PyDanny's Core Concepts of Django ModelForms.
I am also trying to understand what is the difference/relation between form and model validation and I would like to share my notes that are formed after reading several docs.
I am currently interested in Creating Forms from Models
#mariodev shared the document Core Concepts of Django ModelForms and this provided a good start.
ModelForms select validators based off of Model field definitions
The main story behind the scenes seems to be the DRY principal. This article explains very well what exactly is the case here.
All right, all this is fair. The question is "Where in the Django Documentation is this explained"?
I bumped on a very brilliant article where it states that:
The form.full_clean() method is called by django at the start of the validation process (by form.is_valid(), usually immediately after the view receives the posted data).
Correct me if I am wrong but that line reads that everytime I enter data and hit 'enter' the validation process begins!
OK, this is simple now:
The validation on a ModelForm begins when we hit 'enter'.
Django first validates the form by checking one by one every applicable validation method on Fields, Field Subclasses (This is the documentation for a model's field subclass, not for a form field subclass), Form Subclasses and ModelForm (since it is a ModelForm).
Finally, it validates the Model Instance.
This is how all this works theoretically. The only thing that remains is to implement it.