I am working on an application on AWS and I am using AWS elasticache for caching.
I am confused between using memcached or redis.
I read about the Redis 3.0.2 update and how it is now equivalent to Memcached.
https://groups.google.com/forum/#!msg/redis-db/dO0bFyD_THQ/Uoo2GjIx6qgJ
But I read on the Amazon AWS FAQ page that Amazon ElastiCache does not support 3.0.2. They currently support Redis 2.6.13, 2.8.6 and 2.8.19.
http://aws.amazon.com/elasticache/faqs/ (Date June 10,2015)
I have read the AWS white papers on ElastiCache, but they do not specify which version of Redis their suggestions apply to.
How should I decide between Memcached and Redis for any application I may create? What points should one keep in mind before choosing Redis or Memcached? Should I assume that Amazon will update the Redis version soon and go with Redis?
p.s. I am a novice developer.
It really depends on your use case.
Select Memcached if you have these requirements:
You want the simplest model possible.
You need to run large nodes with multiple cores or threads.
You need the ability to scale out/in, adding and removing nodes as demand on your system increases and decreases.
You want to partition your data across multiple shards.
You need to cache objects, such as a database.
Select Redis if you have these requirements:
You need complex data types, such as strings, hashes, lists, and sets.
You need to sort or rank in-memory data-sets.
You want persistence of your key store.
You want to replicate your data from the primary to one or more read replicas for read intensive applications.
You need automatic failover if your primary node fails.
You want publish and subscribe (pub/sub) capabilities—to inform clients about events on the server.
You want backup and restore capabilities.
Here is an interesting article by AWS: https://d0.awsstatic.com/whitepapers/performance-at-scale-with-amazon-elasticache.pdf
The main discussion comparing the two is the question "Memcached vs. Redis?".
Both AWS and Azure will surely upgrade to newer Redis versions in the future, but when and how they roll them out is entirely up to them. Meanwhile you could install Redis 3.0.2 yourself, but first check whether you really need Redis 3, whose main addition is cluster support. If you don't need clustering, you can go with 2.8 on ElastiCache.
Related
I currently have various clients and various Redis clusters on AWS, one for each client. I would like to know if it's possible to divide a Redis cluster so that I can use, let's say, one Redis cluster for many clients. Each client has its own server that uses the cache independently, and if they shared a cluster they would probably have keys with the same name that might conflict. So in this scenario my questions are:
Is it possible to divide the Redis cluster so that different parts of it are used in isolation from the others? I think this is different from shards.
How would I avoid the probable conflict described above?
There are several approaches to protecting against key name collisions.
If you are using Redis with cluster mode disabled, you can configure each client to use a different database (selected with the SELECT command). Note that Redis Cluster does not support multiple databases, so if you suspect you might need to scale out in the future, that is a reason to start with one of the other options.
You can use hash tags ({tags}), as described in the Redis Cluster documentation. The tag could be the name of your client or anything else you want to use to group keys on the same slot for multi-key operations. This also guarantees that key {client1}key1 will not collide with key {client2}key1. This approach works for both cluster-mode-enabled and cluster-mode-disabled clusters, so it is future-proof if you need to scale out.
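The hash-tag approach can be sketched in a few lines of Python. This is only an illustration: `tenant_key`, `hash_tag` and `key_slot` are hypothetical helper names, not part of any Redis client. It assumes the slot rule from the Redis Cluster specification (CRC16 of the hashed part of the key, modulo 16384), where only the text between the first `{` and the following `}` is hashed when that text is non-empty.

```python
import binascii

NUM_SLOTS = 16384  # number of hash slots in Redis Cluster


def tenant_key(client: str, key: str) -> str:
    """Prefix a key with a {hash tag} naming the client (hypothetical helper)."""
    return "{" + client + "}" + key


def hash_tag(key: str) -> str:
    """Return the part of the key that Redis Cluster would hash."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # only a non-empty tag counts
            return key[start + 1:end]
    return key  # no usable tag: the whole key is hashed


def key_slot(key: str) -> int:
    """CRC16 (XMODEM variant) of the hashed part, modulo the slot count."""
    return binascii.crc_hqx(hash_tag(key).encode("utf-8"), 0) % NUM_SLOTS


# Same logical key, different clients: the full key names differ,
# so the two values cannot overwrite each other.
print(tenant_key("client1", "key1"), tenant_key("client2", "key1"))

# All keys of one client share a slot, so multi-key operations stay possible.
print(key_slot("{client1}key1") == key_slot("{client1}key2"))
```

With cluster mode disabled the braces are simply part of the key name, so the same prefixing still separates clients.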
We have HTTP sessions in on-premise application. We want to migrate application to Cloud. We got the direction to use REDIS cache implementation in Cloud to replace HTTP sessions.
Do we save user-specific (HTTP session) data in Redis? Is there a more elegant way to handle this scenario?
Thanks in advance.
Assuming you're talking about a legacy app, you can set Redis (Azure Redis Cache) as your State Provider.
Here's a link about it:
https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-aspnet-session-state-provider
Yes, it is possible, and Redis is a very good fit for this kind of requirement. It is a very fast in-memory key/value store, which matches how sessions are used (get/set). Most modern frameworks come with built-in session support for Redis. Even if it is a legacy app, you can probably integrate it easily (there may be libraries that do this). For a session store you can just use commands such as SET, GET, EXPIRE, EXISTS and DEL.
If the session data is just string/string you can store it as a string; if you have JSON-like values with several fields you can use a hash. Either way you can set an EXPIRE so that sessions are not stored forever and memory stays manageable.
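As a rough sketch of the string-based variant, the following stores each session as a JSON string under a `session:<id>` key with a time to live. Everything here is an assumption for illustration: `SessionStore`, the key prefix and the 30-minute TTL are made up, and `InMemoryClient` is only a stand-in so the example runs without a server. It mimics the `set(name, value, ex=...)`, `get`, `exists`, `expire` and `delete` methods of the redis-py client, which correspond to the commands listed above.

```python
import json
import time
import uuid


class InMemoryClient:
    """Stand-in exposing the few redis-py-style methods used below."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def _alive(self, name):
        entry = self._data.get(name)
        if entry is None:
            return False
        if entry[1] is not None and entry[1] <= time.monotonic():
            del self._data[name]  # lazily drop expired keys
            return False
        return True

    def set(self, name, value, ex=None):
        expires = time.monotonic() + ex if ex is not None else None
        self._data[name] = (value, expires)
        return True

    def get(self, name):
        return self._data[name][0] if self._alive(name) else None

    def exists(self, name):
        return 1 if self._alive(name) else 0

    def expire(self, name, seconds):
        if not self._alive(name):
            return False
        self._data[name] = (self._data[name][0], time.monotonic() + seconds)
        return True

    def delete(self, name):
        return 1 if self._data.pop(name, None) is not None else 0


class SessionStore:
    """Hypothetical session store: one JSON string per session."""

    def __init__(self, client, ttl_seconds=1800):
        self.client = client
        self.ttl = ttl_seconds

    def create(self, data):
        session_id = uuid.uuid4().hex
        # SET with an expiry so abandoned sessions disappear on their own.
        self.client.set("session:" + session_id, json.dumps(data), ex=self.ttl)
        return session_id

    def load(self, session_id):
        key = "session:" + session_id
        raw = self.client.get(key)
        if raw is None:
            return None  # unknown or expired session
        self.client.expire(key, self.ttl)  # sliding expiration on each access
        return json.loads(raw)

    def destroy(self, session_id):
        self.client.delete("session:" + session_id)


store = SessionStore(InMemoryClient())
sid = store.create({"user": "alice"})
print(store.load(sid))
```

Against a real server you would pass a redis-py client instead of the stand-in; the `SessionStore` code would not need to change as long as those five methods are available.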
I am not familiar with the Azure side, but AWS has the ElastiCache service, which supports Redis. Another option is to run Redis yourself on an EC2 instance.
I am reading in AWS console about Redis and MemcacheD:
Redis
In-memory data structure store used as database, cache and message broker. ElastiCache for Redis offers Multi-AZ with Auto-Failover and enhanced robustness.
Memcached
High-performance, distributed memory object caching system, intended for use in speeding up dynamic web applications.
Has anyone used or compared both? What are the main differences and use cases for the two?
Thanks.
Pasting my answer from another Stack Overflow question:
Select Memcached if you have these requirements:
You want the simplest model possible.
You need to run large nodes with multiple cores or threads.
You need the ability to scale out/in, adding and removing nodes as demand on your system increases and decreases.
You want to partition your data across multiple shards.
You need to cache objects, such as a database.
Select Redis if you have these requirements:
You need complex data types, such as strings, hashes, lists, and sets.
You need to sort or rank in-memory data-sets.
You want persistence of your key store.
You want to replicate your data from the primary to one or more read replicas for read intensive applications.
You need automatic failover if your primary node fails.
You want publish and subscribe (pub/sub) capabilities—to inform clients about events on the server.
You want backup and restore capabilities.
Here is an interesting article by AWS: https://d0.awsstatic.com/whitepapers/performance-at-scale-with-amazon-elasticache.pdf
I'm relatively new to web development and have only recently learned about memory hierarchies in computer systems. I recently came across Redis and am itching to try it out in a small web app. But before I do, I was wondering how Redis is going to improve performance. From what I've read so far, Redis is an "in-memory" data store. Does that mean that whenever a user requests data from the server, instead of fetching it from the database (given that the Redis data store is already populated with the needed data), the request can be fulfilled by reading the data directly from the server's memory? To be specific, say I have a web app whose back-end server is hosted on AWS and whose database is stored on MLAB. Whenever a user requests data, instead of querying the server, which redirects the request to MLAB, can the server now return the data directly without going to MLAB? Also, by in-memory, does that mean that the data is stored in the RAM on my AWS server?
Finally, how is this different from a cache?
Thank you so much!!
Well, Redis is used as a cache. The difference from most traditional caches is that you also get other useful structures such as hashes, sets, lists, TTLs on keys, HyperLogLogs and so on, not only key:value pairs.
Your description of Redis is right, but keep in mind that if you want to serve data from Redis instead of the MLAB database, you have to design a process that keeps Redis up to date with every change made to the database. Your application's queries will then read from Redis. If your web app is the only thing that writes to the database (no external parties update it), every time the app receives an update it has to write to both the database and Redis. Otherwise you need a command or script that detects each update in the database and applies it to Redis.
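The case where the web app is the only writer can be sketched as follows. This is a minimal illustration, not a full design: `CachedRepository` is a hypothetical name, and two plain dictionaries stand in for the MLAB database and for Redis so that the example runs on its own. Reads try the cache first and fall back to the database; updates write to both so the cache never serves a stale value.

```python
class CachedRepository:
    """Hypothetical read-through / write-through wrapper."""

    def __init__(self, database, cache):
        self.database = database  # dict standing in for the real database
        self.cache = cache        # dict standing in for Redis
        self.db_reads = 0         # counts how often the database was hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # served from memory, no database trip
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for next time
        return value

    def update(self, key, value):
        # Write to the database and the cache together to keep them in sync.
        self.database[key] = value
        self.cache[key] = value


repo = CachedRepository(database={"user:1": "Ada"}, cache={})
repo.get("user:1")   # first read goes to the database
repo.get("user:1")   # second read is served from the cache
repo.update("user:1", "Grace")
print(repo.get("user:1"), repo.db_reads)
```

If something other than the app can change the database, this wrapper is not enough on its own and a separate synchronization step is needed, as described above.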
AWS also provides managed Redis through ElastiCache (https://aws.amazon.com/elasticache/?nc1=h_ls). In that case the cache does not use the RAM of the AWS instance that runs your application; it lives in the ElastiCache service, which can be on another physical machine.
Finally, although Redis keeps its data in memory, it can write snapshots to a dump file (RDB) so that data can be partly recovered after a crash, and it also offers an append-only persistence mode (AOF).
I am writing a web app that runs on AWS. My app requires users to upload their PDF files. I will convert them into images using the "convert" utility in Linux.
Here is my setup on Ubuntu 12.04:
Django
Celery
Django Celery
Boto
I am using Apache as my web server.
The work flow is as follows:
There are three asynchronous tasks and two queues for handling all the processing, and S3 for storing input and output files.
A user uploads a pdf then:
accept_file_task is called: this task takes the user-uploaded PDF, stores it in my S3 storage, and then inserts a message into the input_queue (SQS).
check_queue_and_launch_instance_task: a periodic task that keeps monitoring the number of messages in the input_queue and launches instances whenever the queue has more messages than the number of EC2 instances.
The instances have a bootstrap script which is a while True: loop. Any of the instances can pick a message from the input_queue, run subprocess.Popen("convert " + input + " " + output), write the processed state to the output_queue, upload the generated image to the S3 output bucket, and make it available as a download link.
output_process_task: another periodic task that keeps polling the output_queue and, whenever a message is available, updates the status in the table mentioned below.
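The scaling rule used by check_queue_and_launch_instance_task could look roughly like the sketch below. The function name `instances_to_launch` and the `max_instances` cap are assumptions added for illustration; the workflow above only says that instances are launched when the queue holds more messages than there are instances.

```python
def instances_to_launch(queued_messages: int,
                        running_instances: int,
                        max_instances: int = 10) -> int:
    """Return how many new workers to start for the current backlog.

    max_instances is a hypothetical safety cap, not part of the
    workflow described above.
    """
    backlog = queued_messages - running_instances   # messages without a worker
    headroom = max_instances - running_instances    # room left under the cap
    return max(0, min(backlog, headroom))


print(instances_to_launch(queued_messages=5, running_instances=2))
```

Keeping the decision in a pure function like this makes it easy to test separately from the code that actually talks to SQS and EC2.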
I am using a model called Document to store all the status information. I also have users registering and hence a table to store all user information. Also Celery created a lot of tables to store all its task information. Right now I am using a single instance and the sqlite3 database (that comes with python) on that instance.
I am unsure about the following things
How do I scale up the database? Should I go for RDS, SimpleDB, or AmazonDB? Without Celery, I could easily have used SimpleDB. I am really stuck on this one.
How do I get rid of the two periodic tasks check_queue_and_launch_instance_task and output_process_task? My idea is that Auto Scaling must be used in some way, so that if needed at a later stage an Elastic Load Balancer can be used.
If any of you have designed something similar please help me on how to go about it
How do I scale up the database? Should I go for RDS, SimpleDB, or AmazonDB? Without Celery, I could easily have used SimpleDB. I am really stuck on this one.
Keep in mind that premature optimization is the root of all evil. The question of RDS (which is really just MySQL, Oracle, or MS SQL) vs. SimpleDB is more of an application design decision than one based on scalability. SimpleDB is just a simple key-value store. RDS, on the other hand, will give you full ACID functionality. If your data is relational, then you should be using a relational database. If you just need a place to store simple strings or integers, then something like SimpleDB would make more sense.
Right now I am using a single instance and the sqlite3 database (that comes with python) on that instance.
Make sure that you understand the consequences of a) creating a single point-of-failure in your design and b) SQLite's limitations compared to using a standalone RDBMS in this application. (You can use it, but it's really intended for single-user applications).
How do I get rid of the two periodic tasks check_queue_and_launch_instance_task and output_process_task? My idea is that Auto Scaling must be used in some way, so that if needed at a later stage an Elastic Load Balancer can be used.
If you're willing to replace Celery with SQS, you can tie together SQS + SNS + CloudWatch to simplify this portion of your app. That said, what you're doing doesn't sound like a bad choice, especially if it's already working well. Your time is probably better spent on the problems in front of you than on those that might occur down the road.