I have a query like this:
profiles = UserProfile.objects.all().order_by('-created')
There are several thousand UserProfiles, so the query is costly, and I am wondering how to cache the result for 5 minutes.
I have read this doc but could not find my answer.
You should have a look here and here instead.
In essence, Django's querysets are lazy and they maintain a cache, meaning they only hit the database when evaluated for the first time. This means your profiles variable will only query the database once during its lifetime (unless you do profiles.filter() or something similar that alters the query, in which case it will have to be re-evaluated).
Now, the important part with regard to your question is the lifetime of the variable. It goes out of scope as soon as the method it is defined in returns; in other words, it will be created anew, querying the database, for each new request. To avoid that, you can either cache the results using memcached (or a similar system), or make your profiles variable global so that it persists between requests.
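For the 5-minute requirement specifically, here is a minimal sketch using Django's low-level cache API (the cache key and the get_profiles helper are illustrative, not part of the original question):

from django.core.cache import cache
from myapp.models import UserProfile  # adjust "myapp" to your app

def get_profiles():
    # Try the cache first; fall back to the database and cache for 5 minutes.
    profiles = cache.get('user_profiles')
    if profiles is None:
        # list() forces evaluation, so the cached value is the actual rows
        # rather than a lazy queryset.
        profiles = list(UserProfile.objects.all().order_by('-created'))
        cache.set('user_profiles', profiles, 60 * 5)
    return profiles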
I know how to write Actions that provide intermediate pages, since the docs are great:
https://docs.djangoproject.com/en/2.0/ref/contrib/admin/actions/#actions-that-provide-intermediate-pages
But if my selection contains 100k rows, the pattern from the docs does not work, since the URL gets too long.
How do I write Django admin actions that provide intermediate pages and can handle 100k+ rows?
I solved it this way:
Pickle the QuerySet
Store the pickled QuerySet in the cache under a random ID
Forward the random ID to the next page
The next page uses the random ID to read the QuerySet back from the cache
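A minimal sketch of that pattern (the token scheme, cache key prefix, intermediate URL and helper names are all illustrative; pickling a queryset's query attribute is the documented way to make querysets picklable):

import uuid
from django.core.cache import cache
from django.shortcuts import redirect

def my_action(modeladmin, request, queryset):
    # Cache backends pickle values for you; the queryset's query attribute
    # is enough to rebuild the selection later, with no pks in the URL.
    token = uuid.uuid4().hex
    cache.set('admin-action-%s' % token, queryset.query, 60 * 30)
    return redirect('/admin/myapp/intermediate/?token=%s' % token)

def restore_queryset(model, token):
    # On the intermediate page: rebuild the queryset from the cached query.
    query = cache.get('admin-action-%s' % token)
    if query is None:
        return None  # token expired or invalid
    queryset = model.objects.all()
    queryset.query = query
    return queryset

The 30-minute timeout just bounds how long the intermediate page stays usable.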
When I needed something similar, I used grouping variables such as: all, active, accepted, denied. With that grouping I can run bulk actions on very large amounts of data without creating a Python list with thousands of pks.
Another point worth paying attention to: push that work down to the DB, otherwise you will have an enormous bottleneck in the views/models.
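A sketch of that grouping idea (the status field and group names are illustrative): pass only the group name through the URL, rebuild the filter on the next page, and let update() do the work in a single SQL statement:

GROUP_FILTERS = {
    'all': {},
    'active': {'status': 'active'},
    'accepted': {'status': 'accepted'},
    'denied': {'status': 'denied'},
}

def bulk_deny(model, group):
    # Rebuild the selection from the group name instead of thousands of pks.
    queryset = model.objects.filter(**GROUP_FILTERS[group])
    # update() runs one SQL UPDATE in the database, with no Python loop.
    return queryset.update(status='denied')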
I am looking to cache some rarely updated data coming from several related tables in the DB (MySQL, to be specific). I have tried Django ORM queryset caching with the above Django apps, but I have stumbled into some weird behavior with both of them: some of the queries hit the database even when they're supposed to come from the cache like the rest of them.
I am retrieving the data in a set of 6 queries, mixing any of: filter, select_related, prefetch_related (M2M relations), forward and reverse relations. Just data retrieval, no updates. This happens in a dedicated method, which, for the purposes of the evaluation, I'm calling twice from a TestCase. (To preempt any potential issues: there was no transaction management of any kind in the evaluated method.)
What I observed is that 3 of the queries were repeated: one, a reverse M2M from prefetch_related (while another reverse M2M was cached), and two rather simple reverse-manager queries, which kept being repeated even when changed to ModelName.objects.all() or to a filter corresponding to the relation.
These last trials, made while looking for some pattern, were run only on johnny-cache, which is preferable for my particular scenario because of its invalidation policies. I also tried modifying the method in some general ways, basically shooting in the dark: removing the offending queries (everything's cached), leaving just one of the offending queries (still not cached), changing the query order (no change).
Has anybody run into something similar or can offer some explanation?
On my website I'm going to award points for some activities, similarly to Stack Overflow. I would like to calculate the value based on many factors, so each computation for each user will take, for instance, 10 SQL queries.
I was thinking about caching it:
in memcache,
in the user's row in the database (so that wherever I need to fetch the user from the database, I can easily show the points)
Storing it in the database seems easy, but on the other hand it's redundant information, so I decided to ask, since maybe there is an easier and prettier solution that I missed.
I'd highly recommend this app for storing the calculated values in the model: https://github.com/initcrash/django-denorm
Memcache is faster than the db... but if you already have to retrieve the record from the db anyway, having the calculated values cached in the rows you're retrieving (as a 'denormalised' field) is even faster, plus it's persistent.
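A hand-rolled version of that idea looks roughly like this (the points field and the compute_points helper are illustrative; django-denorm automates this kind of recalculation bookkeeping):

from django.db import models

class UserProfile(models.Model):
    # Denormalised: the result of the expensive multi-query computation,
    # stored on the row so reads cost nothing extra.
    points = models.IntegerField(default=0)

    def recalculate_points(self):
        self.points = compute_points(self)  # hypothetical expensive helper
        self.save(update_fields=['points'])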
I think I've read somewhere that Django's ORM lazily loads objects. Let's say I want to update a large set of objects (say 500,000) in a batch-update operation. Would it be possible to simply iterate over a very large QuerySet, loading, updating and saving objects as I go?
Similarly if I wanted to allow a paginated view of all of these thousands of objects, could I use the built in pagination facility or would I manually have to run a window over the data-set with a query each time because of the size of the QuerySet of all objects?
If you evaluate a 500,000-result queryset, which is big, it will all get cached in memory. Instead, you can use the iterator() method on your queryset, which returns results as requested, without the huge memory consumption.
Also, use update() and F() objects to do simple batch updates in a single query.
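For instance (the model and field names are illustrative):

from django.db.models import F

# Simple batch update: one SQL UPDATE, no objects loaded into memory.
MyModel.objects.filter(active=True).update(counter=F('counter') + 1)

# When each row needs Python-side processing, stream the rows instead of
# letting the queryset cache all 500,000 results:
for obj in MyModel.objects.all().iterator():
    obj.counter = recompute(obj)  # hypothetical per-row logic
    obj.save()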
If the batch update is possible with a single SQL query, then I think using raw SQL or the Django ORM will not make a major difference. But if the update actually requires loading each object, processing the data and then updating it, you can use the ORM or write your own SQL and run update queries on each piece of processed data; the overhead depends entirely on the code logic.
The built-in pagination facility runs a LIMIT/OFFSET query (if you are doing it correctly), so I don't think there is major overhead in the pagination either.
I benchmarked this for my current project, with a dataset of 2.5M records in one table.
I was reading information and counting records; for example, I needed to find the IDs of records whose "name" field was updated more than once in a certain timeframe. The Django benchmark used the ORM to retrieve all records and then iterate through them. The data was saved in a list for later processing. There was no debug output, except for printing the result at the end.
On the other end, I used MySQLdb, which executed the same queries (taken from Django) and built the same structure, using classes to store the data and saving the instances in a list for later processing. Again, no debug output, except for printing the result at the end.
I found that:
                     without Django    with Django
execution time       x                 10x
memory consumption   y                 25y
And I was only reading and counting, without performing update/insert queries.
Try investigating this question yourself; a benchmark isn't hard to write and execute.
My Django application retrieves an RSS feed every day. I would like to persist the time the feed was last updated somewhere in the app. I'm only retrieving one feed, it will never grow to be multiple feeds. How can I persist the last updated time?
My ideas so far
Create a model and add a datetime field to it. This seems like overkill, as it adds another table to the database in which there will only ever be one row. Other than that, it's the most obvious and straightforward solution.
Create a settings object which just stores key/value mappings. The last updated date would just be a row in this table. This is essentially a generic version of the previous solution.
Use dbsettings/django-values, which allows you to store settings in the database. The last updated date would just be a 'setting'.
Any other ideas that I'm missing?
In spite of the fact that databases regularly store many rows in any given table, having a table with only one row is not especially costly, so long as you don't have (m)any indexes, which would waste space. In fact, most databases create many single-row tables to implement some features, like the monotonic sequences used for generating primary keys. I encourage you to create a regular model for this.
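Such a model can be tiny; addressing a fixed primary key keeps the table at a single row (the names here are illustrative):

from django.db import models
from django.utils import timezone

class FeedStatus(models.Model):
    # Single-row table holding the last fetch time.
    last_updated = models.DateTimeField(default=timezone.now)

def touch_feed_status():
    # Always address the same row, so the table never grows.
    status, _ = FeedStatus.objects.get_or_create(pk=1)
    status.last_updated = timezone.now()
    status.save()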
RAM is volatile, thus not persistent: memcached is not what you asked for.
XML is not the right technology for storing a single value.
An RDBMS is not the right technology for storing a single value.
Django's cache framework will not answer your question if CACHE_BACKEND is set to anything other than file://...
The filesystem is the right technology to "persist a single value".
In settings.py:
import os

RSS_FETCH_DATETIME_PATH = os.path.join(
    os.path.abspath(os.path.dirname(__file__)),
    'rss_fetch_datetime'
)
In your rss fetch script:
import time
from django.conf import settings

handler = open(settings.RSS_FETCH_DATETIME_PATH, 'w+')
handler.write(str(int(time.time())))  # write() takes a string, not an int
handler.close()
Wherever you need to read it:
from django.conf import settings

handler = open(settings.RSS_FETCH_DATETIME_PATH, 'r')
timestamp = int(handler.read())
handler.close()
But cron is the right tool if you want to "run a command every day", for example at 5AM:
0 5 * * * /path/to/manage.py runscript /path/to/retrieve/script
Of course, you can still write the last-update timestamp to a file at the end of the retrieve script and use it somewhere else, if that makes sense to you.
Concluding by quoting Ken Thompson:
One of my most productive days was throwing away 1000 lines of code.
One solution I've used in the past is to use Django's cache feature. You set a value to True with an expiration time of one day (in your case). If the value is not set, you fetch the feed; otherwise you don't do anything.
You can see my solution here: Importing your Flickr photos with Django
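In outline (the key name and the fetch_feed helper are illustrative):

from django.core.cache import cache

def maybe_fetch_feed():
    # While the flag is still cached we fetched within the last day, so skip.
    if cache.get('rss_feed_fetched'):
        return
    fetch_feed()  # hypothetical: your actual feed retrieval
    cache.set('rss_feed_fetched', True, 60 * 60 * 24)  # expires in one day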
If you need it only for caching purposes, why not store it in memcached?
On the other hand, if you use this data for other purposes (e.g. displaying it on a page, or making some calculation with it, etc.), then I would store it in a new model; in Django, all persistence is built on top of the database via models, and I would not try to use other "clever" solutions.
One thing I used to do when I was developing with PHP was to store the XML somewhere, but with a new tag inserted to hold the timestamp of the latest retrieval. It wasn't great, but it was quick and simple.
Keeping it simple would lead to the idea of just storing it in the file system ... why can't you do that? You could, for example, have a siteconfig module in one of your apps which holds these sorts of data. It could load the data from a specific file, which could be text, JSON, ConfigParser, pickle or any suitable format. Just import siteconfig somewhere, and it can load the data and make it available to the other modules of your site. You could easily extend this to hold a dict-like object with a number of settings (e.g., if you ever have multiple feeds but don't want a model just for 2-3 rows, you could hold the last-retrieved time for each feed in a dict keyed by feed URL).
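A minimal JSON-backed version of that siteconfig module might look like this (the module and file names are illustrative):

# siteconfig.py
import json
import os

_PATH = os.path.join(os.path.dirname(__file__), 'siteconfig.json')

def load():
    # Return the stored settings dict, or an empty one on first run.
    if not os.path.exists(_PATH):
        return {}
    with open(_PATH) as fh:
        return json.load(fh)

def save(data):
    with open(_PATH, 'w') as fh:
        json.dump(data, fh)

Storing the last retrieval time is then just save({'feed_last_retrieved': time.time()}).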
Create a session key that persists forever, and update the feed timestamp every time you access it.