Django: Automatically invalidate cache when data changes via Admin panel?

On a roll with Django questions today.
The caching framework looks pretty awesome and I'd like to use it sitewide. Rather than set an explicit expiry time for my views, I'd prefer to cache them indefinitely and only invalidate/delete the cache when the content changes. Dream scenario, right?
Is there some way to hook into Django's automatic admin so that when a CRUD operation happens, the relevant cache gets deleted? I expect I'd have to somehow tell the admin which model should invalidate which cache, but in principle, is this possible? Some kind of callback I can add? Any alternatives?
thanks!
Matt

Two part answer:
Clear the cache on a CRUD event? Easy as pie: use Django signals (a minimal sketch follows this answer).
Clear only the relevant parts of the cache? This is a genuinely hard problem. On the surface it may look straightforward, but the dependencies can be very difficult to discern for all but the most trivial cases.
We sort of solved part 2 by extending the Django caching code to embed object class/ID info into the cache key name, and then caching at a sub-page level. On a CRUD event we could do a simple regexp over the cached key names and prune as needed.
All in all, I think it was yet another case of premature optimization, and it's not at all clear that it made any difference. Next time I'll wait until there is a proven, measurable performance problem before doing something like this.
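For part 1, a minimal sketch of the signal-based approach (the BlogPost model and the cache key scheme are hypothetical, not from the question):

    # Invalidate cached pages whenever a model instance changes.
    # BlogPost and the key scheme are hypothetical examples.
    from django.core.cache import cache
    from django.db.models.signals import post_delete, post_save
    from django.dispatch import receiver

    from blog.models import BlogPost  # hypothetical app/model

    @receiver(post_save, sender=BlogPost)
    @receiver(post_delete, sender=BlogPost)
    def invalidate_post_cache(sender, instance, **kwargs):
        # Admin saves and deletes fire these signals too, so this also
        # covers CRUD done through the automatic admin.
        cache.delete(f"post:{instance.pk}")  # the post's detail page
        cache.delete("post:list")            # the cached list page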


Django best practices to validate data in other tables - taking complexity out of the view file?

I was wondering about best practices in Django for validating table content.
I am creating Sales Orders, and my SO should check the availability of the items I have in stock; if they are not in stock, it will trigger manufacturing orders and purchase orders.
I don't want to make a very complex view, so I'm looking for a way to decouple the logic from there; I also predict performance issues.
What are the best practices or ready-made solutions I can use in the Django framework to address view complexity?
I see different possibilities, but I am wondering which will be the best fit in my case:
managers
celery - just to run a job occasionally; I want the app to be real-time, so I don't like this option
signals (pre_save/post_save)
model validation
creating an extra layer like a services.py file
Since I am new to Django I am a bit puzzled which route to take.
Not sure if this is the answer you are looking for.
Signals are for doing things automatically when events happen. They are most commonly used to do things before and after model operations. So if you need to do something every time you save a record, create a new record, or delete one, that is where you use signals.
Managers are used to manage record retrieval and manipulation. If you want some clever way of retrieving data, you can define a custom manager and add custom methods to it. If you want to override some default behaviors of querysets, you would also do that with a custom manager.
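For illustration, a minimal sketch of a custom manager (StockItem and its fields are hypothetical):

    from django.db import models

    class StockManager(models.Manager):
        def in_stock(self):
            # Clever retrieval logic lives here instead of in the view.
            return self.get_queryset().filter(quantity_on_hand__gt=0)

    class StockItem(models.Model):
        name = models.CharField(max_length=100)
        quantity_on_hand = models.IntegerField(default=0)

        objects = StockManager()

    # Usage: StockItem.objects.in_stock()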
Celery is for running things asynchronously. If you are worried that some processing you are doing might take a long time, that is where you might consider offloading things to Celery. A friendly warning though: doing things asynchronously raises the complexity of your code quite a bit, since you need some mechanism to pass data back from Celery tasks into your Django app and to your users.
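A rough sketch of the Celery route, assuming a configured Celery setup (all model and helper names are made up):

    from celery import shared_task

    @shared_task
    def create_replenishment_orders(sales_order_id):
        # Runs later in a worker process instead of blocking the view.
        from orders.models import SalesOrder  # hypothetical model
        order = SalesOrder.objects.get(pk=sales_order_id)
        for line in order.lines.filter(in_stock=False):
            line.create_purchase_order()  # hypothetical helper

    # Called from a view: create_replenishment_orders.delay(order.pk)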
The services.py approach you linked to seems to do what you want; it just provides a place where you can put logic that is not specific to a particular view.
Here on Stack Overflow, I got advice from some experienced developers that premature optimization is the root of all evil.
What I suggest is to keep it simple. Making the view a little more complex is actually better than adding one more layer of complexity. I would suggest you try to put most of your logic in models, and whatever remains after that in views.
Also, unnecessarily using multiple packages will not solve much of your problem, so use them only when necessary. Otherwise, try to write the minimal logic yourself so that you do not have to depend on many apps.
Signals and the other options are not as great as everybody says, however promising they may seem. Just try to make things simpler.
One more point from my side: as you are just starting out, go through class-based views and try to use them once you are familiar with them. That will simplify your views the most. Plus, if you are new to Django, read a little code. https://github.com/vitorfs/bootcamp might help you get started.
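For example, a list page as a class-based view can be this short (SalesOrder and the template path are hypothetical):

    from django.views.generic import ListView

    from orders.models import SalesOrder  # hypothetical model

    class SalesOrderListView(ListView):
        model = SalesOrder
        paginate_by = 25  # pagination comes for free
        template_name = "orders/salesorder_list.html"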

Django cache everything but a piece

I'm writing a blog application. All the pages (lists of posts, detail of a post) are really static; I can predict when they must be updated (for example, when I write a new post or a comment is added). I could use @cache_page to cache entire views.
The only problem is that in every page I have some data collected from Twitter that I want to update every 5 minutes.
Django offers template caching, per-view caching and the low level cache framework. With the low level framework I can avoid calculating most of what must be displayed on the page (like caching Post queries, comments, tags...).
What is the best approach to my problem? How to aggressively cache almost everything for a view / template but a few parts?
I want to avoid using iframes.
Thanks
You cannot exclude certain parts of a Django template from the cache, nor would this work in any other template engine I know of.
My advice would be to use JavaScript to asynchronously load your ever-changing content. It should be particularly easy with Twitter, as they already offer a great API.
If that doesn't suit you, you can always use Django's template fragment caching to cache only parts of your template.
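A minimal sketch using Django's {% cache %} template tag (fragment names and timeouts are illustrative): the Twitter fragment gets a 5-minute timeout while the rest is cached much longer.

    {% load cache %}

    {% cache 86400 post_list %}
        ... expensive post list and comments markup ...
    {% endcache %}

    {% cache 300 twitter_box %}
        ... Twitter data, refreshed at most every 5 minutes ...
    {% endcache %}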
One option might be to set up Varnish on the server. I'm not familiar with Varnish myself, but as I understand it you can use Edge Side Includes to cache only certain fragments of a page.
Obviously it may not suit your use case, but it sounds like a possibility.

Optimisation tips when migrating data into Sitecore CMS

I am currently faced with the task of importing around 200K items from a custom CMS implementation into Sitecore. I have created a simple import page which connects to an external SQL database using Entity Framework and I have created all the required data templates.
During a test import of about 5K items I realized that I needed to find a way to make the import run a lot faster, so I set about finding information about optimizing Sitecore for this purpose. I have concluded that there is not much specific information out there, so I'd like to share what I've found and open the floor for others to contribute further optimizations. My aim is to create some kind of maintenance mode for Sitecore that can be used when importing large volumes of data.
The most useful information I found was on Mark Cassidy's blogpost http://intothecore.cassidy.dk/2009/04/migrating-data-into-sitecore.html. At the bottom of this post he provides a few tips for when you are running an import.
If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
Use BulkUpdateContext()
Don't forget your target language
If you can, make the fields shared and unversioned. This should help migration execution speed.
The first thing I noticed on this list was the BulkUpdateContext class, as I had never heard of it. I quickly understood why, as a search on the SDN forum and in the PDF documentation returned no hits. So imagine my surprise when I actually tested it out and found that it improves item creation/deletion speed at least tenfold!
The next thing I looked at was the first point, where he basically suggests creating a version of web.config that has only the bare essentials needed to perform the import. So far I have removed all events related to creating, saving, and deleting items and versions. I have also removed the history engine and system index declarations from the master database element in web.config, as well as any custom events, schedules, and search configurations. I expect there are a lot of other things I could look to remove/disable to increase performance. Pipelines? Schedules?
What optimization tips do you have?
Incidentally, BulkUpdateContext() is a very misleading name - as it really improves item creation speed, not item updating speed. But as you also point out, it improves your import speed massively :-)
Since I wrote that post, I've added a few new things to my normal routines when doing imports.
Regularly shrink your databases; they tend to grow large and bulky. To do this, first go to Sitecore Control Panel -> Database and select "Clean Up Database". After this, do a regular ShrinkDB on your SQL Server.
Disable indexes, especially if importing into the "master" database. For reference, see http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
Try not to import into "master", however; you will usually find that imports into "web" are a lot faster, mostly because this database isn't (by default) connected to the HistoryManager or other gadgets.
And if you're really adventurous, there's something you could try that I'd been considering trying out myself but never got around to. It might work, but I can't guarantee that it will :-)
Try removing all your field types from App_Config/FieldTypes.config. The theory here is that this should essentially disable all of Sitecore's special handling of the content of these fields (like updating the LinkDatabase and so on). You would need to manually trigger a rebuild of the LinkDatabase when done with the import, but that's a relatively small price to pay.
Hope this helps a bit :-)
I'm guessing you've already hit this, but putting the code inside a SecurityDisabler() block may speed things up also.
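For reference, a rough sketch of how those contexts combine during item creation (C#; the database name, parent path, template ID, and source rows are hypothetical placeholders, and the Sitecore.Data, Sitecore.Data.Items, and Sitecore.SecurityModel namespaces are assumed to be imported):

    Database master = Sitecore.Configuration.Factory.GetDatabase("master");
    Item parent = master.GetItem("/sitecore/content/Imported");  // hypothetical path
    var templateId = new TemplateID(new ID("{11111111-1111-1111-1111-111111111111}"));  // placeholder

    using (new SecurityDisabler())   // skip per-item security checks
    using (new BulkUpdateContext())  // suppress events and index updates
    {
        foreach (var row in sourceRows)  // sourceRows: your EF query result
        {
            Item item = parent.Add(ItemUtil.ProposeValidItemName(row.Name), templateId);
            using (new EditContext(item))
            {
                item["Title"] = row.Title;
            }
        }
    }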
I'd be a lot more worried about how Sitecore performs with this much data... assuming you only do the import once, who cares how long that process takes. Is this going to be a regular occurrence?

Django: cache unless output has changed?

This is a newbie question from someone who doesn't know much about HTTP caching :)
I'm using Django with the @never_cache decorator.
Is there a way I can instruct the browser to cache the page unless the content has changed, in which case the browser should reload the page?
Thanks.
I disagree with Dominic: there is a very good reason to generate the page, see if it has changed, and throw it away if it hasn't, and that's avoiding the need to transfer the entire page over the internet. This only makes sense if your page is quite cheap to generate and fairly large, but it can be a quick win.
The mechanism for doing this is the ETag header. Django has good support for this: just set USE_ETAGS in settings.py and you'll get the benefit of returning 304 Not Modified responses where appropriate on all your pages.
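For a single view, the etag decorator gives the same behaviour; a minimal sketch (BlogPost and its updated_at field are hypothetical):

    from django.views.decorators.http import etag

    from blog.models import BlogPost  # hypothetical model

    def latest_post_etag(request):
        # Cheap ETag derived from the newest modification timestamp.
        return str(BlogPost.objects.latest("updated_at").updated_at.timestamp())

    @etag(latest_post_etag)
    def post_list(request):
        ...  # render the full page; Django answers 304 when the ETag matches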
I think reading this would be a good starting point:
https://web.archive.org/web/20180101014856/http://eflorenzano.com/blog/2008/11/29/drop-dead-simple-django-caching/
An excerpt:
Caching is easy to screw up. Usually it's a manual process which is error-prone and tedious. It's actually quite easy to cache, but knowing when to invalidate which caches becomes a lot harder. [...] The underlying idea is that every Django model has a primary key, which makes for an excellent key to a cache. Using this basic idea, we can cover a fairly large use case for caching, automatically, in a much more deterministic way.
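The primary-key idea from the excerpt, as a minimal sketch (Post is a hypothetical model):

    from django.core.cache import cache

    from blog.models import Post  # hypothetical model

    def get_post(pk):
        key = f"post:{pk}"  # the primary key makes a deterministic cache key
        post = cache.get(key)
        if post is None:
            post = Post.objects.get(pk=pk)
            cache.set(key, post, None)  # timeout=None: cache until invalidated
        return post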

my Django development (needs advice)

I am writing a website using Django. I need to push the website out as soon as possible. I don't need a lot of amazing things right now.
I am concerned about future development.
If I enable registration, I allow more content to be written by users; if I don't, then only the admins can publish content. The website isn't exactly a CMS.
This is a big problem, as I will continue to add new features and rewrite code (either by adapting third-party apps or by rewriting the app itself). So how would either path affect my database contents?
So the bottom line is: how can I ensure the safety of my data as development continues?
I hope someone can offer a little insights on this matter.
Thank you very much. It's hard to describe my concern, really.
Whatever functionality you add later (new fields, etc.), you can still migrate your data to the "new" database.
It becomes more complicated with relationships, because you might have integrity problems. Say you have a Comment model, and say you don't enable registration, so all users can comment on certain posts. If, later, you decide to enable registration and decide that ALL comments have to be associated with a user, then you will have problems migrating your data, because you'll have lots of comments for which you'll have to make up a user, or that you'll just have to drop. Of course, in that case there would be work-arounds, but it is just to illustrate some of the problems you might encounter later.
Personally, I try to have a good data model with only the minimum necessary fields (more fields will come later, with new functionality). I especially try to avoid having to add new foreign keys to already existing models. For example, it is fine to add a new model later with a foreign key to an existing model, but the opposite is more complicated.
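To make the comment example concrete, here is a minimal sketch of the safer way to add such a foreign key later (all names hypothetical); making it nullable lets existing anonymous comments survive the change:

    from django.conf import settings
    from django.db import models

    class Comment(models.Model):
        body = models.TextField()
        # Added after registration is enabled; null=True keeps the old
        # anonymous comments valid instead of forcing made-up users.
        author = models.ForeignKey(
            settings.AUTH_USER_MODEL,
            null=True,
            blank=True,
            on_delete=models.SET_NULL,
        )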
Finally, I am not sure why you hesitate to enable registration. It is actually very simple to do (you can, for example, use django-registration; you would just have to write some urlconf entries and some templates, and that's all).
Hope this helps!
If you are afraid of data migration, just use South...