Django - how to 'remember' some SQL queries

There's this line in the Django documentation:
"When running with DEBUG turned on, Django will remember every SQL
query it executes. This is useful when you’re debugging, but it’ll
rapidly consume memory on a production server."
So if I want to remember some queries when DEBUG=False on my server, how do I do it? Use Redis/memcached, I guess? But those are key-value stores, so I would need to go through several files and change the code, right? What if I just want to cache a SQL query that's repeated often? The documentation around caching is mostly about template caching or storing key-value pairs; I couldn't find anything about caching a full, often-used SQL query.
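Django's low-level cache API (`django.core.cache.cache.get_or_set`) covers exactly this pattern: store the evaluated result of an expensive query under a key, and recompute only on a miss or expiry. Here is a minimal stand-alone sketch of the same get-or-set idea using a plain dict with a TTL in place of Redis/memcached; the key name and query function are illustrative:

```python
import time

# Minimal stand-in for a cache backend such as Redis or memcached.
# Django's cache.get_or_set() provides this pattern out of the box.
_cache = {}  # key -> (expires_at, value)

def get_or_set(key, compute, timeout=300):
    """Return the cached value for `key`, computing and storing it on a miss."""
    entry = _cache.get(key)
    now = time.monotonic()
    if entry is not None and entry[0] > now:
        return entry[1]
    value = compute()
    _cache[key] = (now + timeout, value)
    return value

calls = []

def expensive_query():
    # Placeholder for e.g. list(MyModel.objects.filter(...))
    calls.append(1)
    return [1, 2, 3]

first = get_or_set("top-rows", expensive_query)
second = get_or_set("top-rows", expensive_query)  # served from cache, no second query
```

The point is that you cache the *result* of the query under a key you choose, so only the call sites that run that query need touching, not every file.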

Related

Postgresql in memory database django

For performance reasons I would like to run an optimization algorithm against an in-memory database in Django (I'm likely to execute a lot of queries). I know it's possible to use an in-memory SQLite database (How to run Django's test database only in memory?), but I would rather use PostgreSQL because our production database is PostgreSQL.
Does anyone know how to tell Django to create the PostgreSQL database in memory?
Thanks in advance
This is premature optimization. PostgreSQL is very, very fast even running on a spinning hard disk, provided you use the right indexes. And if you don't persist the data to disk, you are opening yourself up to a world of pain.
If, on the other hand, you want to speed up your tests by running an in-memory PostgreSQL database, you can try the non-durability optimizations described here:
http://www.postgresql.org/docs/9.1/static/non-durability.html
The most drastic suggestion on that page is to use a ramdisk. Here's how to set one up. After following the OS/PostgreSQL steps, edit Django's settings.py and point the database configuration at the new tablespace.
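A sketch of what the Django side might look like, assuming the ramdisk and tablespace already exist. `DEFAULT_TABLESPACE` is a real Django setting; the tablespace name, connection details, and OS commands in the comments are illustrative:

```python
# settings.py sketch. Assumes a ramdisk-backed tablespace was created first:
#   as root (OS step):
#     mkdir /mnt/ramdisk && mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk
#   in psql:
#     CREATE TABLESPACE ramspace LOCATION '/mnt/ramdisk';
# 'ramspace', 'mydb', and the credentials below are made-up examples.

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "secret",
        "HOST": "localhost",
    }
}

# Tables created by syncdb/migrate will land in this tablespace by default.
DEFAULT_TABLESPACE = "ramspace"
```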
Last but not least: This is just a complete waste of time.
This is not possible. You cannot use PostgreSQL exclusively in memory; at least not without defeating the purpose of using PostgreSQL over something else. An in-memory data store like Redis running alongside PostgreSQL is the best you can do.
Also, at this point, the configuration is far out of Django's hands. It will have to be done outside of Django's environment.
It appears you may be missing some understanding about how these systems work. Build your app first to verify that everything is working, then worry about optimizing the database in such ways.

Advice on caching for Django/Postgres application

I am building a Django web application and I'd like some advice on caching. I know very little about caching. I've read the caching chapter in the Django book, but am struggling to relate it to my real-world situation.
My application will be a web front-end on a Postgres database containing a largeish amount of data (150GB of server logs).
The database is read-only: the purpose of the application is to give users a simple way to query the data. For example, the user might ask for all rows from server X between dates A and B.
So my database needs to support very fast read operations, but it doesn't need to worry about write operations (much - I'll add new data once every few months, and it doesn't matter how long that takes).
It would be nice if clients making the same request could use a cache, rather than making another call to the Postgres database.
But I don't know what sort of cache I should be looking at: a web cache, or a database cache. Or even if Postgres is the best choice (I'd just like to use it because it works well with Django, and is so robust). Could anyone advise?
The Django book says memcached is the best cache with Django, but it runs in memory, and the results of some of these queries could be several GB, so memcached might fill up the machine's memory quickly. But perhaps I don't fully understand how memcached operates.
Your query should in no way return several GB of data. There's no practical reason to do so, as the user cannot absorb that much data at a time. Your result set should be paged, such that the user sees only 10, 25, whatever results at a time. That then allows you to also limit your query to only fetch 10, 25, whatever records at a time starting from a particular index based on the page number.
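The paging described above is just LIMIT/OFFSET arithmetic (Django's `Paginator` does the equivalent for you). A stand-alone sketch using the stdlib's sqlite3 in place of Postgres; the table and column names are made up:

```python
import sqlite3

# Toy stand-in for the 150GB log table: 100 rows for server "X".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, server TEXT)")
conn.executemany("INSERT INTO log (server) VALUES (?)", [("X",)] * 100)

def fetch_page(page, per_page=25):
    """Fetch one page of results instead of the whole result set."""
    offset = (page - 1) * per_page
    cur = conn.execute(
        "SELECT id FROM log WHERE server = ? ORDER BY id LIMIT ? OFFSET ?",
        ("X", per_page, offset),
    )
    return [row[0] for row in cur]

page2 = fetch_page(2)  # rows 26..50 only; memory use stays bounded
```

Because each request only materializes one page, the multi-GB result-set problem never arises, regardless of which cache you put in front.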
Caching search result pages is not a particularly good idea, regardless, though. For one, the odds that different users will ever conduct exactly the same search are pretty minimal, and you'll end up wasting RAM to cache result sets that will never be used again. Also, something like logs should be real-time. If you return a cached result set, there might be new, relevant results that are not included, obscuring the usefulness of your search.
As mentioned above, there are limits to what problems caching can solve. Since you are still building this application, I see no reason why you couldn't just plug in Django Haystack with Whoosh and see how it performs; switching later to one of the more enterprise-grade search backends is a breeze.

Should I implement revisioning using database triggers or using django-reversion?

We're looking into implementing audit logs in our application and we're not sure how to do it correctly.
I know that django-reversion works and works well but there's a cost of using it.
The web server will have to make two round trips to the database when saving a record, even if the save is in the same transaction, because (at least in Postgres) the changes are written to the database and committing the transaction is what makes them visible.
So this will block the web server until the revision is saved to the database, since we're not currently using async I/O. Even if we were using async I/O, generating the revision's data takes CPU time, which again blocks the web server from handling other requests.
We can use database triggers instead but our DBA claims that offloading this sort of work to the database will use resources that are meant for handling more transactions.
Is using database triggers for this sort of work a bad idea?
We can scale both the web servers using a load balancer and the database using read/write replicas.
Are there any tradeoffs we're missing here?
What would help us decide?
You need to think about the pattern of db usage in your website.
That pattern may be unique to you; however, most web apps read from the db much more often than they write to it. In fact it's fairly common to see optimisations that help scale a web app by trading off more complicated 'save' operations for faster reads. An example is denormalisation, where some data from related records is copied to the parent record on each save, so as to avoid repeatedly doing complicated aggregate/join queries.
This is just an example, but unless you know your specific situation is different I'd say don't worry about doing a bit of extra work on save.
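The denormalisation trade-off above can be sketched concretely: pay a little extra on every write to keep a count column on the parent, so reads become a single-row lookup instead of an aggregate. The schema here (posts with a `comment_count`) is purely illustrative, using stdlib sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, comment_count INTEGER DEFAULT 0)")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER)")
conn.execute("INSERT INTO post (id) VALUES (1)")

def add_comment(post_id):
    # Slightly more work on every write: two statements instead of one...
    conn.execute("INSERT INTO comment (post_id) VALUES (?)", (post_id,))
    conn.execute(
        "UPDATE post SET comment_count = comment_count + 1 WHERE id = ?",
        (post_id,))

for _ in range(3):
    add_comment(1)

# ...so reads become a cheap single-row lookup instead of a COUNT over a join.
count = conn.execute(
    "SELECT comment_count FROM post WHERE id = 1").fetchone()[0]
```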
One caveat would be to consider excluding some models from the revisioning system. For example if you are using Django db-backed sessions, the session records are saved on every request. You'd want to avoid doing unnecessary work there.
As for doing it via triggers vs Django app... I think the main considerations here are not to do with performance:
The Django app solution is more 'obvious' and 'maintainable': the app will be in your pip requirements file and Django INSTALLED_APPS, it's obvious to other developers that it's there and working, and it doesn't need someone to remember to run custom SQL on the db server when you move to a new server.
With a db-trigger solution you can be certain it will run whenever a record is changed by any means, whereas with a Django app, anyone changing records via a psql console will bypass it. Even within the Django ORM, certain bulk operations bypass the model's save method and save signals. Sometimes this is desirable, however.
Another thing I'd point out is that your production web server will be multiprocess/multithreaded, so although a lengthy db write will block the web server, it only blocks the current process; your web server will have other processes able to serve other requests concurrently. So it won't block the whole web server.
So again, unless you have a pattern of usage where you anticipate a high frequency of concurrent writes to the db, I'd say probably don't worry about it.

How to cache MySQL table in C++ web service

I've got a big users table that I'm caching in a C++ web service (BitTorrent tracker). The entire table is refetched every 5 minutes. This has some drawbacks, like data being up to 5 minutes old and refetching lots of data that hasn't changed.
Is there a simple way to fetch just the changes since last time?
Ideally I'd not have to change the queries that update the data.
Two immediate possibilities come to me:
MySQL Query Cache
Memcached (or similar) Caching Layer
I would try the query cache first, as it is likely far easier to set up. Do some basic tests/benchmarks and see if it fits your needs. Memcached will likely be very similar to your existing cache but, as you mention, you'll need to find a better way of invalidating stale cache entries (something the query cache does for you).
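A third option, if you can add an `updated_at` column (maintained by a database trigger, so the existing update queries don't change), is to make the 5-minute full refetch incremental: keep a watermark of the newest timestamp seen and fetch only rows newer than it. A sketch of the idea, with stdlib sqlite3 standing in for MySQL and an invented schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT,
    updated_at INTEGER)""")  # e.g. a unix timestamp set by a DB trigger

conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "a", 100), (2, "b", 100), (3, "c", 100)])

watermark = 0
local_cache = {}  # the web service's in-process copy of the table

def refresh():
    """Pull only rows modified since the last fetch."""
    global watermark
    rows = conn.execute(
        "SELECT id, name, updated_at FROM users WHERE updated_at > ?",
        (watermark,)).fetchall()
    for uid, name, ts in rows:
        local_cache[uid] = name
        watermark = max(watermark, ts)
    return len(rows)

first = refresh()   # initial load fetches everything
conn.execute("UPDATE users SET name = 'z', updated_at = 200 WHERE id = 2")
second = refresh()  # only the changed row comes back
```

Note this scheme doesn't detect deleted rows by itself; soft-deletes (a flag plus the same `updated_at` bump) are one common workaround.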

Django slow queries: Connect django filter statements to slow queries in database logs

If you are trying to diagnose slow queries in your mysql backend and are using a Django frontend, how do you tie together the slow queries reported by the backend with specific querysets in the Django frontend code?
I think you have no alternative besides logging every Django query for the suspicious querysets.
See this answer on how to access the actual query for a given queryset.
If you install django-devserver, it will show you the queries that are being run and the time they take in your shell when using runserver.
Another alternative is django-debug-toolbar, which will do the same in a side panel-overlay on your site.
Either way, you'll need to test it out in your development environment. However, neither really solves the issue of pinpointing you directly to the offending queries; they work on a per-request basis. As a result, you'll have to do a little thinking about which of your views are using the database most heavily and/or deal with exceptionally large amounts of data, but by cherry-picking likely-candidate views and inspecting the times for the queries to run on those pages, you should be able to get a handle on which particular queries are the worst.
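One low-tech way to tie a slow query in the database log back to the code that issued it is to wrap candidate call sites with a timer that logs the SQL together with a label naming the view. A stand-alone sketch of that idea; in Django the `sql` argument would be `str(queryset.query)`, and the label/threshold here are arbitrary:

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("slow_queries")

logged = []  # captured here for demonstration; normally only the log line matters

@contextmanager
def log_if_slow(label, sql, threshold=0.5):
    """Log the SQL (and a code-site label) if the block exceeds `threshold` seconds."""
    start = time.monotonic()
    yield
    elapsed = time.monotonic() - start
    if elapsed >= threshold:
        logged.append((label, sql))
        log.warning("%s took %.2fs: %s", label, elapsed, sql)

# In Django, sql would be str(queryset.query); here it's a stand-in string.
with log_if_slow("views.report", "SELECT * FROM big_table", threshold=0.0):
    time.sleep(0.01)  # stand-in for evaluating the queryset
```

The label that ends up next to the SQL in your application log is what lets you grep from a slow statement in the MySQL log straight back to the offending view.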