Redis -- how does it improve performance? - amazon-web-services

I'm relatively new to web development and have only recently learned about memory hierarchies in computer systems. I recently came across Redis and am itching to try it out in a small web app. But before I do, I was wondering how Redis is going to improve performance. From what I've read so far, Redis is an "in-memory" data store. Does that mean that whenever a user requests data from the server, instead of fetching it from the database (given that the Redis data store is already populated with the needed data), the request can be fulfilled by reading the data directly from the server's memory? To be specific, say I have a web app whose back-end server is hosted on AWS and whose database is hosted on MLAB. Whenever a user requests data, instead of the server forwarding the query to MLAB, can it now fetch the data directly from the server without going to MLAB? Also, by "in-memory", does that mean the data is stored in the RAM of my AWS server?
Finally, how is this different from a cache?
Thank you so much!!

Well, Redis is commonly used as a cache. The difference from most traditional caches is that, besides plain key:value pairs, it offers richer structures such as hashes, sets, lists, and HyperLogLogs, plus a TTL on keys.
Your description of Redis is right, but take into account that if you want to serve your data from Redis instead of the MLAB database, you have to design some process that keeps Redis up to date with every change in the database. Your application's queries will read from Redis, so either every update coming from your web app must be written to both the database and Redis (which is enough if nothing else writes to your DB), or you need a command or script that detects each change in the DB and updates Redis accordingly.
AWS also provides Redis as a managed service, ElastiCache (https://aws.amazon.com/elasticache/?nc1=h_ls). In that setup the AWS instance that runs your application does not hold the data in its own RAM; the data lives in the ElastiCache service, which can run on another physical machine.
Finally, although Redis stores its data in memory, it can write a dump file to disk to preserve part of the data in case of a crash, and it also offers a persistence mode.
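
A minimal sketch of the read-and-update flow described above, assuming a redis-py style client (any object with `get` and `set` methods works). The functions `get_user` and `update_user`, the `user:<id>` key format, and the callables `load_from_db` and `save_to_db` are hypothetical placeholders for the real database layer, not part of any library.

```python
import json


def get_user(cache, user_id, load_from_db):
    """Read path: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"  # hypothetical key naming scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # hit: no database round trip
    record = load_from_db(user_id)  # miss: query the database
    if record is not None:
        cache.set(key, json.dumps(record))  # populate the cache for next time
    return record


def update_user(cache, user_id, record, save_to_db):
    """Write path: update the database first, then refresh the cached copy."""
    save_to_db(user_id, record)
    cache.set(f"user:{user_id}", json.dumps(record))
```

In a real deployment `cache` would typically be a `redis.Redis` client pointing at the Redis host. Writing to the database before the cache means a failed database write never leaves a newer value in Redis than in the database.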

Related

AWS ElastiCache - Redis vs Memcached

I am reading about Redis and Memcached in the AWS console:
Redis
In-memory data structure store used as database, cache and message broker. ElastiCache for Redis offers Multi-AZ with Auto-Failover and enhanced robustness.
Memcached
High-performance, distributed memory object caching system, intended for use in speeding up dynamic web applications.
Has anyone used or compared both? What are the main differences and use cases of the two?
Thanks.
Pasting my answer from another Stack Overflow question:
Select Memcached if you have these requirements:
You want the simplest model possible.
You need to run large nodes with multiple cores or threads.
You need the ability to scale out/in, adding and removing nodes as demand on your system increases and decreases.
You want to partition your data across multiple shards.
You need to cache objects, such as database query results.
Select Redis if you have these requirements:
You need complex data types, such as strings, hashes, lists, and sets.
You need to sort or rank in-memory data-sets.
You want persistence of your key store.
You want to replicate your data from the primary to one or more read replicas for read intensive applications.
You need automatic failover if your primary node fails.
You want publish and subscribe (pub/sub) capabilities—to inform clients about events on the server.
You want backup and restore capabilities.
Here is an interesting whitepaper from AWS: https://d0.awsstatic.com/whitepapers/performance-at-scale-with-amazon-elasticache.pdf
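
As a small illustration of the "sort or rank in-memory data sets" point, here is a sketch using a Redis sorted set through redis-py's `zadd` and `zrevrange` calls. The function names and the board and player names are made up for the example; `client` is assumed to be a redis-py client or a compatible object.

```python
def record_score(client, board, player, score):
    """Store or overwrite a player's score in the sorted set named `board`."""
    client.zadd(board, {player: score})


def top_players(client, board, n):
    """Return the n highest-scoring members, best first, with their scores."""
    return client.zrevrange(board, 0, n - 1, withscores=True)
```

Redis keeps the set ordered by score on the server, so reading the top entries does not require fetching and sorting everything in the application. Memcached stores opaque values only, so the same ranking would have to be done client-side.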

Scrapy and flask on ec2

I'm new to this, so please bear with me.
I have 3 Python spiders that use Scrapy, scrapy-user-agent, pandas, and MongoDB.
They scrape around 150-200 pages every 12 hours and store the data locally in MongoDB collections.
I also have a Flask app that exposes the collections through API endpoints and returns the data as responses.
Would it be possible to deploy both to the same EC2 instance, or would Flask's responses be slowed down for users while the scraping runs in parallel on the same machine?
It is possible to deploy them both on the same instance. However, you need to know how much memory and CPU both of your applications use, and choose your instance type accordingly.
Given the low frequency of your web scraping, it very likely does not take much memory or CPU, unless you are doing some heavy processing of the scraped data.
For the memory and CPU configuration of each instance type, see https://aws.amazon.com/ec2/instance-types/
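
One rough way to get those numbers for a piece of Python code is sketched below, using only the standard library. The helper name `profile` is hypothetical, and `tracemalloc` only counts memory allocated by Python itself, so OS tools such as `top` give a fuller picture of the whole process.

```python
import time
import tracemalloc


def profile(func, *args, **kwargs):
    """Run func once and report CPU seconds and peak Python memory in bytes."""
    tracemalloc.start()
    cpu_start = time.process_time()
    try:
        result = func(*args, **kwargs)
    finally:
        # Collect the measurements even if func raises.
        cpu_seconds = time.process_time() - cpu_start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, cpu_seconds, peak_bytes
```

Wrapping the post-processing step of one spider run in `profile` gives a first estimate of whether the scraping job would compete with the Flask app for CPU or memory.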

What exactly is caching and how do I add it to an app I have on heroku?

I have a data science type application where I am getting public information from FPDS and SAM gov't website. The site is currently on Heroku.
I would like to cache views so that if a person is researching more than one company, they can quickly go back to earlier pages without having to fetch the results from the database every time.
Based on my limited knowledge, that is what caching does?
Second, I am looking at Flask-Caching, and it doesn't appear to be that difficult to add to the routes I would like to cache.
Now the question is: on Heroku, you wouldn't use SimpleCache, would you? Would you use a different cache strategy? From the docs, the CACHE_TYPE can be simple, redis, memcached and several more. On Heroku, would I need to store the cache in something like Redis, or can I store it in memory? Ideally, to get everything up and running, I would like the cache to be in memory.
Late answer to your question. Caching covers a number of techniques on the client and server side aimed at reducing traffic and network transport or at improving speed.
I'll focus on one aspect of what you are asking: integrating Redis with Flask to get faster responses from a Flask app. Redis is 'blindingly' fast, imo, as an in-memory database. When I have many users asking for the same view (typically a report-style display), I can short-circuit the view route and return the response from a named Redis entry, so that my Flask server is not tied up endlessly regenerating the same contents, which in turn saves a good few cycles on the main back-end database. Of course, if the contents of that view/report change, I have to take care of that separately. Most importantly, Redis supports an expiry time on each entry, so stale contents disappear on their own, and one way of handling contents that go stale sooner is to delete the Redis entry ahead of the expiry time.
Let me know if you want sample code to demonstrate this.
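
A minimal sketch of the idea described above, not the answerer's actual code. It assumes a redis-py style client with `get`, `setex` and `delete`; the names `cached`, `make_key` and `invalidate` are hypothetical helpers. It is kept framework-agnostic, so in Flask the decorator would sit between `@app.route(...)` and the view function.

```python
import functools


def make_key(prefix, args, kwargs):
    """Build a cache key from the view name prefix and its arguments."""
    return f"{prefix}:{args!r}:{sorted(kwargs.items())!r}"


def cached(client, prefix, ttl_seconds):
    """Decorator: serve a view's rendered output from the cache until it expires."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            key = make_key(prefix, args, kwargs)
            hit = client.get(key)
            if hit is not None:
                return hit  # bytes with redis-py unless decode_responses=True
            body = view(*args, **kwargs)  # miss: render the view once
            client.setex(key, ttl_seconds, body)  # store it with an expiry
            return body
        return wrapper
    return decorator


def invalidate(client, key):
    """Drop a cached entry early, e.g. when the underlying report changes."""
    client.delete(key)
```

The TTL bounds how long stale output can be served, and `invalidate` covers the case where the data is known to have changed before the entry expires.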

Temporarily storing statistics on slave in Django

My slave servers collect statistics and performance metrics about visits, but eventually they would have to be sent to the master DB.
I don't want to have a permanent database connection open with the master DB server, so they would have to be temporarily stored locally and shipped over in chunks at specific intervals.
Any suggestions for tools to do this with Django? I've come up with the idea of storing the records in a local SQLite DB and sending them to the main DB server every hour for example. But maybe there are better ways than SQLite out there. Also, still not sure, for pushing the data back into the master DB server at regular intervals, would you use a direct DB connection from within Django, or design a simple API to send it over HTTPS?
I'll end up using Redis and putting them in a list, pickling the objects. Main advantage: no SQLite migrations to maintain. It's a real pain, though, that Redis doesn't support atomic gets for more than one item (like LPOP, but for several items at once) to retrieve them in batches...
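
One common workaround for the batch retrieval problem is to wrap LRANGE and LTRIM in a MULTI/EXEC transaction, sketched below with redis-py's `pipeline(transaction=True)`. The function names and the idea of passing the list key in are assumptions for illustration. Newer Redis versions (6.2 and later) also accept a count argument to LPOP, which may make this unnecessary.

```python
import pickle


def push_record(client, key, record):
    """Append one pickled record to the tail of the list."""
    client.rpush(key, pickle.dumps(record))


def pop_batch(client, key, batch_size):
    """Atomically remove and return up to batch_size records from the head."""
    pipe = client.pipeline(transaction=True)  # queued commands run in MULTI/EXEC
    pipe.lrange(key, 0, batch_size - 1)       # read the first batch_size items
    pipe.ltrim(key, batch_size, -1)           # keep only what comes after them
    raw_items, _ = pipe.execute()
    return [pickle.loads(raw) for raw in raw_items]
```

Because both commands execute as one transaction, no other client can push or pop between the read and the trim. Only unpickle data that your own servers wrote, since `pickle.loads` can execute arbitrary code.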

Using Redis as intermediary cache for REST API

We have an iOS app that talks to a Django server via a REST API. Most of the data consists of rather large Item objects that involve a few related models and render into a single flat dictionary, and this data changes rarely.
We've found that querying this is not a problem for Postgres, but generating the JSON responses takes a noticeable amount of time. On the other hand, item collections vary per user.
I thought about a rendering system where we just build a dictionary for each Item object and save it into Redis as a JSON string. This way we can serve the API directly from Redis (e.g. HMGET with the ids of the items in the user's library), which is fast, and it is relatively easy to regenerate the "rendered instances", basically just a couple of post_save signals.
I wonder how good this design is, are there any major flaws in it? Maybe there's a better way for the task?
Sure, we do the same at our firm, using Redis to store not JSON but large XML strings which are generated from backend databases for RESTful requests, and it saves lots of network hops and overhead.
A few things to keep in mind if this is the first time you're using Redis...
Dedicated Redis Server
Redis executes commands on a single thread and should be deployed on a dedicated server with sufficient CPU power. Don't make the mistake of deploying it on your app or database server.
High Availability
Set up Redis with Master/Slave replication for high availability. I know there's been lots of progress with Redis Cluster, so you may want to look into that for HA too.
Cache Hit/Miss
When checking Redis for a cache "hit", if the connection is dead or any exception occurs, don't fail the request, just default to the database; caching should always be 'best effort' since the database can always be used as a last resort.
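
The "best effort" rule can be sketched as follows, combined with the HMGET lookup from the question. This assumes a redis-py style client with `hmget` and `hset`; the function `fetch_items`, the hash name `rendered_items`, and the `render_item` callable (which builds the dictionary from the database) are hypothetical.

```python
import json
import logging

logger = logging.getLogger(__name__)


def fetch_items(client, item_ids, render_item, hash_name="rendered_items"):
    """Return rendered items, using Redis when possible and the database otherwise."""
    try:
        cached = client.hmget(hash_name, item_ids)  # one entry per id, None on a miss
    except Exception:
        # Cache unavailable: log it and behave as if every lookup missed.
        logger.warning("cache read failed, falling back to the database", exc_info=True)
        cached = [None] * len(item_ids)

    items = []
    for item_id, raw in zip(item_ids, cached):
        if raw is not None:
            items.append(json.loads(raw))
            continue
        item = render_item(item_id)  # hypothetical: builds the dict from the database
        items.append(item)
        try:
            client.hset(hash_name, item_id, json.dumps(item))
        except Exception:
            logger.warning("cache write failed, continuing without it", exc_info=True)
    return items
```

A dead Redis connection then costs only the extra rendering time; the request itself still succeeds from the database.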