Using a JSON file to improve caching - good idea?

I'm new to caching. I'm currently working on a small Django project and will be implementing caching later via memcached.
I have a page with a video on it and the video has a bunch of comments. The only content on the page that is likely to change regularly is the comments and the "You are logged in as.../You are not logged in..." message.
I was thinking I could create a JSON file that serves the username and most recent comments, including it in the head with <script src="videojson.js"></script>. That way I could populate the HTML via JavaScript instead of caching the whole page on a per-user basis.
Is this a suitable approach, or is the caching system smarter than I give it credit for?

How is the JavaScript going to get the JSON object? Are you going to serve it from a Django view that the JavaScript calls? And in that view you will just pull from memcached if it's available and from the DB if not?
That seems reasonable, assuming your JSON isn't very big. If your comments change a lot and you have to spend a lot of time querying the DB, building the JSON object, and saving it to memcached every time a new comment is written, it won't work well. But if you only refill the cache when your JSON expires, and you don't care about having the latest and greatest comments on there instantly, it should work.
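For what it's worth, a minimal sketch of that kind of view using Django's low-level cache API might look like this (the Comment model and its field names are assumptions, not from the question):

```python
# Sketch of the JSON view described above; model and field names
# (Comment, video, author, created) are assumptions for illustration.
from django.core.cache import cache
from django.http import JsonResponse

from .models import Comment  # hypothetical app layout

CACHE_TTL = 60  # seconds; tune to how stale comments are allowed to be

def video_json(request, video_id):
    key = f"videojson:{video_id}"
    comments = cache.get(key)
    if comments is None:
        # Cache miss: query the DB, then fill the cache for later requests.
        comments = list(
            Comment.objects.filter(video_id=video_id)
            .order_by("-created")
            .values("author__username", "text", "created")[:20]
        )
        cache.set(key, comments, CACHE_TTL)
    # The username is per-user, so it comes from the request, not the cache.
    username = request.user.username if request.user.is_authenticated else None
    return JsonResponse({"username": username, "comments": comments})
```

The page itself can then be cached once for everyone, since the per-user and fast-changing parts come from this endpoint via the JavaScript.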
One thing to point out is that if you aren't getting that much traffic now, you might be adding a level of complexity that won't give you much return on your time spent. But if you are using this to learn how to do caching then it is a good exercise.
Hope that helps

Related

Render slow-loading results

I have a website that uses a pretty slow external API (0.9 seconds per request).
The results from this API request are rendered to the page.
I use my own sort of caching: I store the results in a DB, and subsequent queries for the same resource are served from the DB rather than hitting the API again. If the data in the DB is too old (>10 minutes), I refresh it with a new API request.
It is pretty common to check the website only occasionally during the day, so users will almost always hit the 10-minute limit and face a loading time of over a second. This feels very unresponsive.
I then searched for ways to get around the loading time and found this.
I think this could be the right direction, but I am still not confident about how to tackle the task. Can anybody point me in the right direction on how to implement this?
Should I use the low level cache api?
Could I use the default cache? Or should I implement my own version?
Do you think the solution provided in the first link is a good idea at all?
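To make the question concrete, here is the shape I have in mind with the low-level cache API (fetch_from_api and the key layout are placeholders):

```python
# Sketch of the low-level cache API for the 10-minute pattern above;
# fetch_from_api() is a placeholder for the slow (~0.9 s) external call.
import time

from django.core.cache import cache

STALE_AFTER = 600  # 10 minutes, as described above

def fetch_from_api(resource_id):
    """Placeholder for the slow external API request."""
    ...

def get_resource(resource_id):
    key = f"resource:{resource_id}"
    entry = cache.get(key)
    if entry is not None and time.time() - entry["fetched_at"] < STALE_AFTER:
        return entry["data"]  # still fresh: no API hit, fast response
    data = fetch_from_api(resource_id)  # the slow path
    cache.set(key, {"data": data, "fetched_at": time.time()}, None)  # keep forever
    return data
```

On its own this still makes one unlucky user per 10-minute window wait for the slow request; to avoid that entirely, a periodic job (cron or Celery beat, say) could call get_resource() for the relevant resources just before they go stale, so page views never touch the API.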

Advice on caching for Django/Postgres application

I am building a Django web application and I'd like some advice on caching. I know very little about caching. I've read the caching chapter in the Django book, but am struggling to relate it to my real-world situation.
My application will be a web front-end on a Postgres database containing a largeish amount of data (150GB of server logs).
The database is read-only: the purpose of the application is to give users a simple way to query the data. For example, the user might ask for all rows from server X between dates A and B.
So my database needs to support very fast read operations, but it doesn't need to worry about write operations (much - I'll add new data once every few months, and it doesn't matter how long that takes).
It would be nice if clients making the same request could use a cache, rather than making another call to the Postgres database.
But I don't know what sort of cache I should be looking at: a web cache, or a database cache. Or even if Postgres is the best choice (I'd just like to use it because it works well with Django, and is so robust). Could anyone advise?
The Django book says memcached is the best cache with Django, but it runs in memory, and the results of some of these queries could be several GB, so memcached might fill up the machine's memory quickly. But perhaps I don't fully understand how memcached operates.
Your query should in no way return several GB of data. There's no practical reason to do so, as the user cannot absorb that much data at a time. Your result set should be paged, such that the user sees only 10, 25, whatever results at a time. That then allows you to also limit your query to only fetch 10, 25, whatever records at a time starting from a particular index based on the page number.
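As a sketch of that, assuming a LogEntry model over the imported logs (all names here are illustrative):

```python
# Paging the read-only log query with Django's Paginator; LogEntry and
# its fields are assumptions based on the question's description.
from django.core.paginator import Paginator
from django.shortcuts import render

from .models import LogEntry  # hypothetical model over the server logs

def log_search(request):
    qs = LogEntry.objects.filter(
        server=request.GET.get("server"),
        timestamp__range=(request.GET.get("start"), request.GET.get("end")),
    ).order_by("timestamp")
    # The queryset is lazy, so only one page of rows (25 here) is ever
    # fetched from Postgres, never several GB at once.
    page = Paginator(qs, 25).get_page(request.GET.get("page"))
    return render(request, "logs/results.html", {"page": page})
```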
Caching search result pages is not a particularly good idea, regardless, though. For one, the odds that different users will ever conduct exactly the same search are pretty minimal, and you'll end up wasting RAM to cache result sets that will never be used again. Also, something like logs should be real-time. If you return a cached result set, there might be new, relevant results that are not included, obscuring the usefulness of your search.
As mentioned above, there are limits to what problems caching can solve. Since you are still building this application, I see no reason why you couldn't just plug in Django Haystack with Whoosh and see how it performs; switching later to one of the more enterprise-grade search backends is a breeze.
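For reference, a minimal Haystack-with-Whoosh configuration looks roughly like this (the index path is just an example):

```python
# settings.py: point Haystack at the Whoosh backend. BASE_DIR is the
# usual Django settings constant; the index path is an example.
import os

HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        "PATH": os.path.join(BASE_DIR, "whoosh_index"),
    },
}
```

Moving to Elasticsearch or Solr later is then mostly a matter of swapping the ENGINE (plus its connection settings).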

How does Syndication work in Django?

Why this (hopefully) isn't a broad question:
I've been looking at the Django source code on syndication. I understand functionally what these feeds are and what they do but I'm not sure how the magic happens.
Actual question:
What is Django doing under the hood to send these changes out across the wire? Is Django just creating an object (like an XML file) that the client reads, without even using the network? What mechanism is employed to ensure users get those updates in a 'reasonable' amount of time? Is it a combination of the browser (or some other software) knowing to go look for updates while Django diligently adds data to a file, or does Django do most of the work?
There's no magic, and Django does not do anything to even try to ensure clients get updates in any particular amount of time.
Feeds, like almost everything on the web, are an entirely pull-based mechanism. Feed readers are responsible for periodically requesting updates from the server.
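You can see this in the framework itself: a feed is just a view class that re-renders current data as XML each time a reader polls it. A minimal sketch (the Entry model and its fields are hypothetical):

```python
# A minimal feed with Django's syndication framework; the Entry model
# and its fields are hypothetical.
from django.contrib.syndication.views import Feed

from .models import Entry

class LatestEntriesFeed(Feed):
    title = "Latest entries"
    link = "/feed/"
    description = "The most recent entries."

    def items(self):
        # Evaluated fresh on every HTTP request from a reader; nothing
        # is pushed, and no file is written to disk ahead of time.
        return Entry.objects.order_by("-published")[:10]

    def item_title(self, item):
        return item.title

    def item_description(self, item):
        return item.summary
```

The feed class is wired into the URLconf like any other view, so "updates" reach users exactly as fast as their readers poll it, and no faster.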

Using memcached with a dynamic django backend

My Django backend is always dynamic. It serves an iOS app similar to Instagram and Vine, where users upload photos/videos and their followers can comment on and like the content. Just for the sake of this question, imagine my backend serves an iOS app that is exactly like Instagram.
Many sources claim that using memcached can improve performance because it decreases the amount of hits that are made to the database.
My question is: for a backend that is already dynamic in nature (always changing, since users are uploading new pictures, commenting, liking, following new users, etc.), what can I possibly cache?
It's a problem I've been thinking about for quite some time. I could cache the user profile data, but other than that, I don't know where else memcached would be useful.
Other sources mentioned using it everywhere in the backend where a 'GET' call is made, but then I would need to set a suitable expiry time on the cache since the app is always dynamic. What are your solutions and suggestions for getting around this problem?
You would cache whatever is being most frequently accessed from your Database. Make a list of the most frequent requests to get data from the database and cache the data in that priority.
Cache the most frequent requests based on category of the pictures
Cache based on users - power users go into cache (those which do a lot of data access)
Cache the most recent inserts (in case you have a page which shows the recently added posts/pictures)
I am sure you can come up with more scenarios. I am positive memcached (or any other caching) will help, even though your app is very 'dynamic'.
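As a sketch of the "most recent inserts" case, with invalidation on write (the model and key names are illustrative):

```python
# Cache the recent-posts list and invalidate it whenever a new post is
# written; Post and the key name are illustrative, not prescribed.
from django.core.cache import cache

from .models import Post  # hypothetical

RECENT_KEY = "recent_posts"

def recent_posts():
    posts = cache.get(RECENT_KEY)
    if posts is None:
        posts = list(Post.objects.order_by("-created")[:50])
        cache.set(RECENT_KEY, posts, 30)  # a short TTL suits a dynamic feed
    return posts

def create_post(user, image):
    post = Post.objects.create(user=user, image=image)
    cache.delete(RECENT_KEY)  # the next read repopulates with fresh data
    return post
```

Even a TTL of a few seconds can absorb most of the read traffic on a hot feed, since reads vastly outnumber writes in an app like this.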

Tracing users' requests by logging their actions to the DB in Django

I want to trace users' actions on my website by logging their requests to the database as plain text in Django.
I'm considering writing a custom decorator and applying it to every view that I want to trace.
However, I have some troubles in my design.
First of all, is such a logging mechanism reasonable, or will it cause performance problems because my log table will grow rapidly?
Secondly, how should my log table be designed?
I want to keep the keywords if the user calls the search view, or the item's id if the user calls the item detail view.
Besides, users' IP addresses should be kept, but how can I separate users who connect via a single IP address, as in many companies?
I am glad to explain in detail if you think my question is unclear.
Thanks
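For concreteness, the decorator I have in mind looks something like this (ActionLog is a model I would create; its fields are just a first guess):

```python
# Sketch of the tracing decorator; ActionLog is a hypothetical model
# with fields like view_name, user_id, ip, query, and a created stamp.
import functools

from .models import ActionLog

def trace(view):
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        ActionLog.objects.create(
            view_name=view.__name__,
            user_id=request.user.id if request.user.is_authenticated else None,
            ip=request.META.get("REMOTE_ADDR", ""),
            # For the search view this captures the keywords; for the item
            # detail view the id arrives in the URL kwargs instead.
            query=request.META.get("QUERY_STRING", ""),
        )
        return view(request, *args, **kwargs)
    return wrapper
```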
I wouldn't do that. If this is a production service then you've got a proper web server running in front of it, right? Apache, or nginx, or something. That can do logging, and can do it well, and can write in a format that won't bloat your database, and there's a wealth of analytical tools for log analysis.
You are going to have to duplicate a lot of that functionality in your decorator, such as when you want to switch it on or off, or change the log level. The only thing you'll get by doing it all in django is the possibility of ultra-fine control, such as only logging views of blog posts with id numbers greater than X or something. But generally you'd not want that level of detail, and you'd log everything and do any stripping at the analysis phase. You've not given any reason currently why you need to do it from Django.
If you really want it in a RDBMS, reading an apache log file into Postgres or MySQL or one of those expensive ones is fairly trivial.
One thing you should keep in mind is that SQL databases don't offer very good write performance (compared with reads), so if you are experiencing heavy load you should probably look for an in-memory solution (e.g. a key-value store like Redis).
But keep in mind, especially if you use a non-SQL solution, that you should know what you want to do with the collected data (just display something like a 'log', or do more in-depth searching/querying on it).
If you want to identify different users coming from the same IP address, you should probably look for a cookie-based solution (if you are using Django's session framework, sessions are by default identified through a cookie, so you could simply use sessions). Another option is doing the logging 'asynchronously' via JavaScript after the page has loaded in the browser (which could give you more ways to identify the user and avoids extra work when generating the page).
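A sketch of the Redis variant mentioned above, using the redis-py client (the key layout and field names are illustrative):

```python
# Push log entries to a Redis list instead of SQL; key layout is an
# example, and the session key distinguishes users behind a shared IP.
import json
import time

import redis

r = redis.Redis()  # assumes a local Redis instance

def log_action(request, view_name):
    entry = {
        "view": view_name,
        "session": request.session.session_key or "",
        "ip": request.META.get("REMOTE_ADDR", ""),
        "ts": time.time(),
    }
    # RPUSH onto a list is a cheap in-memory write, much lighter than an
    # SQL INSERT under heavy load; a periodic job can drain it elsewhere.
    r.rpush("action_log", json.dumps(entry))
```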