Is retrieving data from a redis instance slower than retrieving the same value from Django's request.session dictionary?

In a Python/Django application, is retrieving a value stored in redis slower than retrieving one stored in the request.session dictionary?
Background:
I have a Django app where I use DB-based sessions; i.e., instead of django.contrib.sessions, I use this nifty little third-party library.
I recently ran a benchmark in which I saved a test value in a local redis instance via the redis-py wrapper (i.e. my_server.set('test','1')), and saved the same test value in request.session['test'].
I then retrieved the test value from each and compared the time taken. request.session outperformed redis by a factor exceeding 2x in this scenario.
Problem:
The application is not distributed in any way; everything is shared and happens on the same machine - a very vanilla setup.
The result appears counter-intuitive to me. Why? Because my sessions are DB-based, and I assumed redis would be faster than whatever Django has to offer. Clearly, I am wrong.
Can an expert explain what's actually going on here? Maybe the Python wrapper over redis' core API is slowing things down?
In case you need more information, or are skeptical about how I ran the benchmark, please do ask.
P.S. I simply put the two competing reads in a for loop for 100K iterations and measured the time taken to complete.
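Roughly, the loop looked like this sketch (reconstructed from memory; the client setup is an assumption):

    # Inside a Django view, where `request` is available.
    import time
    import redis

    my_server = redis.StrictRedis()           # assumed: local instance, default settings
    my_server.set('test', '1')
    request.session['test'] = '1'

    start = time.time()
    for _ in range(100000):
        my_server.get('test')                 # one redis call per iteration
    print('redis:', time.time() - start)

    start = time.time()
    for _ in range(100000):
        request.session['test']               # read from the session
    print('session:', time.time() - start)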

The session is stored as a single blob, not as individual keys. It has almost certainly already been loaded and decoded by the time you get into your view, most likely by the auth middleware. Once it is loaded, it is kept locally as a dictionary, so your timing test is really comparing an in-process dictionary lookup against a network round trip to redis on every iteration.
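A fairer benchmark would pay the session's load-and-decode cost on every iteration, rather than only once. A minimal sketch, assuming an existing session key (session_key is a placeholder you would supply):

    import time
    from importlib import import_module

    import redis
    from django.conf import settings

    r = redis.StrictRedis()
    # This is how Django's middleware resolves the configured session backend.
    SessionStore = import_module(settings.SESSION_ENGINE).SessionStore

    start = time.time()
    for _ in range(1000):
        s = SessionStore(session_key=session_key)  # placeholder: a real session key
        s['test']                                  # forces a DB load + decode
    print('full session load:', time.time() - start)

    start = time.time()
    for _ in range(1000):
        r.get('test')                              # one network round trip
    print('redis get:', time.time() - start)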

Related

What exactly is caching and how do I add it to an app I have on Heroku?

I have a data-science-type application where I am getting public information from the FPDS and SAM government websites. The site is currently on Heroku.
I would like to cache views, so if a person is researching more than one company they can quickly go back to earlier pages without having to fetch the results from the database every time.
Based on my limited knowledge, that is what caching does?
Second, I am looking at Flask-Caching, and it doesn't appear to be that difficult to implement on the routes I would like to cache.
Now the question is: on Heroku, you wouldn't use SimpleCache, would you? Would you use a different cache strategy? From the docs, the CACHE_TYPE can be simple, redis, memcached and several more. On Heroku, would I need to store the cache in something like Redis, or can I store it in memory? Ideally, to get everything up and running, I would like the cache to be in memory.
Late answer to your question. Caching covers a number of techniques on the client and server side that aim to reduce traffic, network transport, or response time.
I'll focus on one aspect of what you are asking: a Redis integration with Flask to achieve faster responses from a Flask app. Redis is 'blindingly' fast, imo, as an in-memory database. When I have many users asking for the same view (typically a report-style display), I can intercept the view route and serve the response from a named redis database, so that my Flask server is not bound up eternally regenerating the same contents, which in turn saves a good few cycles of the main back-end database. Of course, if the contents of that view/report change, I have to take care of that separately. Most importantly, Redis attaches an expiry value to each entry, so one way of handling stale contents is to delete the redis entry ahead of its expiry time.
Let me know if you want sample code to demonstrate this.
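In the meantime, here is a minimal sketch of the pattern with plain redis-py in a Flask view (the route, the key name and the build_report() helper are all made up):

    import json

    import redis
    from flask import Flask

    app = Flask(__name__)
    r = redis.StrictRedis(host='localhost', port=6379, db=0)

    def build_report():
        # Placeholder for the expensive report generation.
        return {'rows': list(range(10))}

    @app.route('/report')
    def report():
        cached = r.get('report:main')            # try the cache first
        if cached is not None:
            return cached, 200, {'Content-Type': 'application/json'}
        payload = json.dumps(build_report())
        r.setex('report:main', 300, payload)     # expire after 5 minutes
        return payload, 200, {'Content-Type': 'application/json'}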

Can you run a whole service using Redis?

So I'm currently developing a messaging application to learn the process, and I'm using Redis as a cache together with websockets to push real-time messages.
And then this question popped into my mind:
Is it possible to use Redis alone to run a whole service (like a messaging application, for example)?
NOTE: This implies removing any form of database (we're only keeping strings).
I know you can set up Redis to be persistent, but is that enough? Is it robust enough? Would it be a safe move, or totally insane?
What are your thoughts? I'd really like to know, and if you think it is possible, I'll give it a shot.
Thanks!
A few companies use Redis as their unique or primary database, so it is definitely not insane.
You can develop and run a full service using Redis as a backend, as long as you understand and accept the tradeoffs it implies.
By this I mean:
that you can use a Redis server as a high-performance database as long as your whole data set can reside in memory. This may mean reducing the size of your data, or choosing not to store some of it when it can be computed by your app on read access or imported from another source;
that if you can't store all of your data in the memory of a single server, you can use a Redis cluster, but it will limit the available Redis features (see the implemented subset);
that you have to think about the potential data loss when a server crashes, and determine whether it is acceptable or not. It may be OK to lose some data if the process which produced it is robust and will create it again when the database restarts (for example, when the data stored in Redis comes from an import process, which will resume from the last imported item). You can also run several Redis instances with different persistence configurations: one which writes to disk each time a key is modified, avoiding potential data loss but with much lower performance, and another one for non-critical data, which is written to disk every couple of seconds.
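For instance, those two persistence profiles could look like this in each instance's redis.conf (a sketch; tune the values to your workload):

    # Instance 1: critical data - append-only file, fsync on every write.
    # Durable, but each write pays the fsync cost.
    appendonly yes
    appendfsync always

    # Instance 2: non-critical data - fsync roughly once per second.
    # Much faster writes; you may lose up to ~1 second of data on a crash.
    appendonly yes
    appendfsync everysec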
Redis may be used to store structured data, not only strings, using hashes. Each time you would create an index in a relational model, you have to create a corresponding data structure in Redis. For example, to store Person objects you create a HASH for each of them, holding their properties, including a unique ID. If you want to be able to get people by city, you create a SET for each city and insert the ID of each newly created Person into the corresponding SET; you can then retrieve the list of people in a given city. It's just an example: you have to define the model and data structures to be used according to your application.
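A minimal redis-py sketch of that Person/city layout (all key names are illustrative):

    import redis

    r = redis.StrictRedis()

    def create_person(person_id, name, city):
        # One HASH per person, keyed by a unique ID.
        r.hset('person:%d' % person_id, mapping={'name': name, 'city': city})
        # One SET per city plays the role of a relational index.
        r.sadd('city:%s' % city, person_id)

    def people_in_city(city):
        return r.smembers('city:%s' % city)

    create_person(1, 'Alice', 'Paris')
    create_person(2, 'Bob', 'Paris')
    print(people_in_city('Paris'))  # {b'1', b'2'}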

Advice on caching for Django/Postgres application

I am building a Django web application and I'd like some advice on caching. I know very little about caching. I've read the caching chapter in the Django book, but am struggling to relate it to my real-world situation.
My application will be a web front-end on a Postgres database containing a largeish amount of data (150GB of server logs).
The database is read-only: the purpose of the application is to give users a simple way to query the data. For example, the user might ask for all rows from server X between dates A and B.
So my database needs to support very fast read operations, but it doesn't need to worry about write operations (much - I'll add new data once every few months, and it doesn't matter how long that takes).
It would be nice if clients making the same request could use a cache, rather than making another call to the Postgres database.
But I don't know what sort of cache I should be looking at: a web cache, or a database cache. Or even if Postgres is the best choice (I'd just like to use it because it works well with Django, and is so robust). Could anyone advise?
The Django book says memcached is the best cache with Django, but it runs in memory, and the results of some of these queries could be several GB, so memcached might fill up the machine's memory quickly. But perhaps I don't fully understand how memcached operates.
Your query should in no way return several GB of data. There's no practical reason to do so, as the user cannot absorb that much data at a time. Your result set should be paged, so that the user sees only 10, 25, or however many results at a time. That also lets you limit the query itself to fetch only that many records at a time, starting from an index based on the page number.
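A minimal Django sketch of that paging, assuming a hypothetical LogEntry model with server and timestamp fields:

    from django.core.paginator import Paginator

    from myapp.models import LogEntry  # hypothetical model

    def log_page(server, start, end, page_number, per_page=25):
        qs = LogEntry.objects.filter(
            server=server, timestamp__range=(start, end)
        ).order_by('timestamp')
        # Paginator slices the queryset lazily, so the database only
        # fetches `per_page` rows (LIMIT/OFFSET under the hood).
        return Paginator(qs, per_page).get_page(page_number)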
Caching search result pages is not a particularly good idea regardless, though. For one, the odds that different users will ever run exactly the same search are pretty minimal, and you'll end up wasting RAM caching result sets that will never be used again. Also, something like logs should be real-time: if you return a cached result set, there might be new, relevant results that are not included, which undermines the usefulness of your search.
As mentioned above, there are limits to what problems caching can solve. Since you are still building this application, I see no reason why you couldn't just plug in Django Haystack with Whoosh and see how it performs; switching to one of the more enterprise-grade search backends later is a breeze.

Using Redis as intermediary cache for REST API

We have an iOS app that talks to a Django server via a REST API. Most of the data consists of rather large Item objects that involve a few related models and render into a single flat dictionary, and this data changes rarely.
We've found that querying this is not a problem for Postgres, but generating the JSON responses takes a noticeable amount of time. On the other hand, item collections vary per user.
I thought about a rendering system where we build the dictionary for each Item object and save it into redis as a JSON string. This way we can serve the API directly from redis (e.g. an HMGET with the IDs of the items in a user's library), which is fast, and it makes it relatively easy to regenerate the "rendered instances": basically just a couple of post_save signals.
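A rough sketch of what I have in mind (model and method names are placeholders):

    import json

    import redis
    from django.db.models.signals import post_save
    from django.dispatch import receiver

    from myapp.models import Item  # placeholder model

    r = redis.StrictRedis()

    @receiver(post_save, sender=Item)
    def render_item(sender, instance, **kwargs):
        # Re-render the flat dictionary whenever an Item is saved:
        # one hash field per item, item_id -> JSON string.
        r.hset('items:rendered', instance.pk, json.dumps(instance.to_dict()))

    def items_for_user(item_ids):
        # One HMGET serves the pre-rendered JSON for a user's library.
        return r.hmget('items:rendered', item_ids)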
I wonder how good this design is, are there any major flaws in it? Maybe there's a better way for the task?
Sure, we do the same at our firm, using Redis to store not JSON but large XML strings that are generated from backend databases for RESTful requests, and it saves lots of network hops and overhead.
A few things to keep in mind if this is the first time you're using Redis...
Dedicated Redis Server
Redis is single-threaded and should be deployed on a dedicated server with sufficient CPU power. Don't make the mistake of deploying it on your app or database server.
High Availability
Set up Redis with Master/Slave replication for high availability. I know there's been lots of progress with Redis cluster, so you may want to check on that too for HA.
Cache Hit/Miss
When checking Redis for a cache hit, if the connection is dead or any exception occurs, don't fail the request; just fall back to the database. Caching should always be best-effort, since the database can always be used as a last resort.
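A minimal sketch of that best-effort fallback (fetch_from_db is a placeholder for your real query):

    import redis

    r = redis.StrictRedis(socket_timeout=0.1)  # fail fast if the cache is down

    def get_item(item_id):
        try:
            cached = r.hget('items:rendered', item_id)
            if cached is not None:
                return cached
        except redis.RedisError:
            pass  # best effort: never fail the request because of the cache
        return fetch_from_db(item_id)  # placeholder for the real DB query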

Designing backend software for multiplayer cross platform app

I am currently in the initial design phase of my first app.
In my app there will be individual sessions containing 1-5 users.
I need to be able to keep track of each user's GPS location and be able to push and pull it to each of the users. Each user will have the most recently reported location of every other user in the session.
There will be other calculations done on the data set, but those will happen client-side; the server should only need to handle pushing and pulling user locations (and the usernames).
I'm predicting, due to the nature of the app, that 90% of sessions should not last more than 2 hours, with the possibility of the server ending sessions that are older than 24-48 hours (once real-world testing of the app begins I would have a better idea of how long sessions should last).
I was thinking of using Django to build an API, and of storing all the data in the program itself rather than in a database, as this should be faster, and I don't think persistence is necessary since the data has such a short lifetime.
Is this a good starting point? Is there anything I should be thinking about or considering? I'm completely new to designing backend software.
While performance might not even be an issue in the beginning, there are some things you can do once you hit a certain load:
Keep all your session data in one model, even if that means denormalizing your database a bit (storing redundant information). That way you only need one database read and no expensive JOINs.
Use the Django caching framework (https://docs.djangoproject.com/en/dev/topics/cache/) to cache views, so multiple reads of the same data don't have to hit the database; see the sketch after this list.
Before you start optimizing, profile your code to see where your performance bottlenecks really are. Sometimes you'll be surprised which operations are expensive, and which aren't.
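For the caching point above, a minimal sketch using Django's per-view cache (the view name and the 60-second timeout are arbitrary):

    from django.http import JsonResponse
    from django.views.decorators.cache import cache_page

    @cache_page(60)  # serve repeated reads from the cache for 60 seconds
    def session_locations(request, session_id):
        # Placeholder: in reality, look up the session's user locations.
        locations = {'session': session_id, 'users': []}
        return JsonResponse(locations)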