Using Redis as intermediary cache for REST API - django

We have an iOS app that talks to a django server via REST API. Most of the data consists of rather large Item objects that involve a few related models that render into single flat dictionary, and this data changes rarely.
We've found, that querying this is not a problem for Postgres, but generating JSON responses takes a noticeable amount of time. On the other hand, item collections vary per-user.
I thought about a rendering system, where we just build a dictionary for Item object and save it into redis as JSON string, this way we can serve API directly from redis (e.g. HMGET(id of items in user library), which is fast, and makes it relatively easy to regenerate "rendered instances", basically just a couple of post_save signals.
I wonder how good this design is, are there any major flaws in it? Maybe there's a better way for the task?

Sure, we do the same at our firm, using Redis to store not JSON but large XML strings which are generated from backend databases for RESTful requests, and it saves lots of network hops and overhead.
A few things to keep in mind if this is the first time you're using Redis...
Dedicated Redis Server
Redis is single-threaded and should be deployed on a dedicated server with sufficient CPU power. Don't make the mistake of deploying it on your app or database server.
High Availability
Set up Redis with Master/Slave replication for high availability. I know there's been lots of progress with Redis cluster, so you may want to check on that too for HA.
Cache Hit/Miss
When checking Redis for a cache "hit", if the connection is dead or any exception occurs, don't fail the request, just default to the database; caching should always be 'best effort' since the database can always be used as a last resort.

Related

Do I need to use a caching technology like memcached or redis?

I am new to web development in general.
I am developing a social media website (very much like twitter) using django rest framework as backend and react as front end. I am going to deploy the app to Heroku.
Now, I just heard about this thing called memcached and redis. So, what is the usecase here? Should I use it or It's just for high traffic websites?
Cache in generally called in-memory cache, which store data primarily in memory(like memcached and Redis), and will provide faster way for data access in heavy traffic case.
And Cache-database consistency is always been an issue as you do have multiple different data sources. There are some good solutions to improve it but it still not perfect in sync.
So based on your read/write traffic, if db can handle the traffic perfectly and no performance issue, you don't need to consider cache(most of the productive database also have caching, like MySQL, or DynamoDB). And if db cannot handle your traffic, you should consider using cache.

What exactly is caching and how do I add it to an app I have on heroku?

I have a data science type application where I am getting public information from FPDS and SAM gov't website. The site is currently on Heroku.
I would like cache views so if a person is researching more than one company they can quickly go back to earlier pages without having to fetch the results from the database every time.
Based on my limited knowledge that is what cashing does?
Second, I am looking at flash-caching and it doesn't appear to be that difficult to implement to the route's I would like to cache.
Now the question is on Heroku, you wouldn't use simplecashe would you? Would you use a different cache strategy? From the docs, the CASHE_TYPE can be simple, redis, memcached and several more. On Heroku would I need to store the cache on something like Redis or can I store it in memory? Ideally, to get everything up and running I would like the cache to be in memory.
Late answer to your question. Caching can be a number of techniques on client and server side to achieve a goal of reduced traffic, network transport, or speed.
I'll focus on one aspect from what you are asking: a redis integration with flask to achieve faster response from a flask app environment. Redis is 'blindingly' fast, imo, as an in-memory database. When I have many users asking for the same view (typically a report-style display), I can interrupt the view route to get the response from a named redis database, so that my flask server is not bound up in eternally regenerating the same contents, which in turn saves a good few cycles of the main back-end database. Of course, if the contents of that view/report change, I have to separately take care of that. Most importantly, Redis includes an expiry value for each entry, so one way of handling stale contents is to delete the redis contents ahead of the expiry time.
Let me know if you want sample code to demonstrate this.

Redis -- how does it improve performance?

i'm relatively new to the world of web-development and have only recently learned memory hierarchies in computer systems. I recently came across Redis and am itching to try it out in a small web-app. But before I do, I was wondering how is Redis going to improve performance? From what i've read so far, it seems that Redis is an "in-memory" data store, so does that mean that whenever a user requests a data from the server, instead of fetching from the database (given that the Redis data store is already populated with the needed data) the request can be fulfilled by accessing the data directly from the server's memory? To be specific, say if i have a web-app which back-end server is hosted on AWS, and the database is stored on MLAB, then whenever a user requests a data, instead of querying to the server which redirects the request to MLAB, it can now directly fetch the data from the server without going to MLAB ? Also, by in-memory, does that mean that the data is stored in the RAM on my AWS server?
Finally, how is this different from a cache?
Thank you so much!!
Well, Redis is used as a cache, the difference with most of the traditional cache is that you have other nice structures like hashes, sets, lists, TTL on keys, hyperlologs and so on, not only pair key:value.
You are right what you define about Redis, is but take into account that if you want to move your data from MLAB database to Redis you have to design some process to keep Redis update in each update that happens in your database. So every query from your application will use Redis to get data but apart from that you will need a process to keep update Redis with changes on your database, so if you use your application to update the database (and there are no other external parts which update your DB), every time you get an update from your web-app you have to update the DB and also Redis or having a command/script which detect every time an updated happened in the DB and update Redis properly.
AWS also provides Redis services, like ElasticCache https://aws.amazon.com/elasticache/?nc1=h_ls so basically the AWS ECS instance where you have your application doesn't use the RAM but this ElasticCache service which can live on another physical machine.
Finally, Redis store on memory the data though, it uses a dump file to save partial data in case of crashes and it also offers a persistence mode

REST API or "direct" database access for remote Celery/Django workers?

I'm working on a project that will have multiple celery workers on machines in different locations in the US that will communicate over the internet.
Am I better off distributing my Django project to each machine and configuring them with the database credentials to my database host, or should I have a "main" Django/database host that presents a REST API for remote celery tasks and workers to hit for the database access?
Mostly looking for pros/cons and any factors I haven't thought of.
I can provide single simple API endpoints that provide all the data my tasks need to query and simple POST API endpoints that can create all the database entries my tasks need to create.
I'll have no more than say 10 remote workers that would maybe altogether being doing 1 request per minute.
I'm thinking this probably means my concerns are not so much about the request/response overhead but more about maintainability, architecture, security...
The answer depends on too many variables, concerns, forces and whatnots to be anything else than "it depends...".
I assume you already thought of the following pros and cons, but anyway:
Using an API will make for longer request/response cycles (obviously) and put quite some load on the Django project (front server, app server etc). Also it means that your tasks won't be able to use all of the database features (complex queries, aggregations, whatever).
OTHO adding an API layer will isolate the workers from the inner db schema, which can make migrations (on the Django side) and deployment easier as you won't have to stop all the workers, deploy to everyone and restart the workers. Well it even makes it possible to change the API side technology without impacting the workers (not that I see much reason to do so but anyway...). But it also mean you have a whole API to maintain, and chances are that model changes - or at last part of them - will impact your API and/or your tasks code anyway (if the changes are about adding features that the workers should use etc).
IOW, it really depends (yeah, I already said so, didn't I ?) on your project's needs and constraints, and only you/your team know which solution will best match your project.

Should I implement revisioning using database triggers or using django-reversion?

We're looking into implementing audit logs in our application and we're not sure how to do it correctly.
I know that django-reversion works and works well but there's a cost of using it.
The web server will have to make two roundtrips to the database when saving a record even if the save is in the same transaction because at least in postgres the changes are written to the database and comitting the transaction makes the changes visible.
So this will block the web server until the revision is saved to the database if we're not using async I/O which is currently the case. Even if we would use async I/O generating the revision's data takes CPU time which again blocks the web server from handling other requests.
We can use database triggers instead but our DBA claims that offloading this sort of work to the database will use resources that are meant for handling more transactions.
Is using database triggers for this sort of work a bad idea?
We can scale both the web servers using a load balancer and the database using read/write replicas.
Are there any tradeoffs we're missing here?
What would help us decide?
You need to think about the pattern of db usage in your website.
Which may be unique to you, however most web apps read much more often than they write to the db. In fact it's fairly common to see optimisations done, to help scaling a web app, which trade off more complicated 'save' operations to get faster reads. An example would be denormalisation where some data from related records is copied to the parent record on each save so as to avoid repeatedly doing complicated aggregate/join queries.
This is just an example, but unless you know your specific situation is different I'd say don't worry about doing a bit of extra work on save.
One caveat would be to consider excluding some models from the revisioning system. For example if you are using Django db-backed sessions, the session records are saved on every request. You'd want to avoid doing unnecessary work there.
As for doing it via triggers vs Django app... I think the main considerations here are not to do with performance:
Django app solution is more 'obvious' and 'maintainable'... the app will be in your pip requirements file and Django INSTALLED_APPS, it's obvious to other developers that it's there and working and doesn't need someone to remember to run the custom SQL on the db server when you move to a new server
With a db trigger solution you can be certain it will run whenever a record is changed by any means... whereas with Django app, anyone changing records via a psql console will bypass it. Even in the Django ORM, certain bulk operations bypass the model save method/save signals. Sometimes this is desirable however.
Another thing I'd point out is that your production webserver will be multiprocess/multithreaded... so although, yes, a lengthy db write will block the webserver it will only block the current process. Your webserver will have other processes which are able to server other requests concurrently. So it won't block the whole webserver.
So again, unless you have a pattern of usage where you anticipate a high frequency of concurrent writes to the db, I'd say probably don't worry about it.