Is there an easy way to understand the difference between AWS Elasticache and RDS? - amazon-web-services

I'm learning AWS and I'm a bit confused about ElastiCache and RDS. I read the article in this link, but I'm still confused. Can someone explain a little bit? Many thanks.

This is a general question about storage technologies: "how does a cache differ from a database?"
A cache is not (typically) a persistent data store. Its data is ephemeral. The purpose of the cache is to increase the perceived performance of an actual database sitting behind it. The database stores the actual data persistently and is the authoritative source of data. The cache sits in front of the database and tries to improve the performance of your application by detecting queries it already knows the answer to and serving cached results directly to your application, saving a trip to the database.
Of course, the cache will get out of date over time and so you need a process for expiring data from the cache when it becomes inaccurate, thus causing the next query for that piece of data to go to the actual database, and that new data can be cached until it expires.
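A minimal sketch of this pattern (often called lazy loading or cache-aside) in Python, assuming a Redis-compatible ElastiCache endpoint and a placeholder query_database() helper standing in for the real database call:

```python
import json
import redis

# Assumed ElastiCache (Redis) endpoint -- replace with your cluster's endpoint.
cache = redis.Redis(host="my-cache.abc123.0001.use1.cache.amazonaws.com",
                    port=6379, decode_responses=True)

CACHE_TTL_SECONDS = 300  # expire entries so stale data eventually falls out


def query_database(user_id):
    """Placeholder for the real (slow) database query."""
    raise NotImplementedError


def get_user(user_id):
    key = f"user:{user_id}"

    # 1. Ask the cache first.
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database round trip

    # 2. Cache miss: go to the authoritative database.
    user = query_database(user_id)

    # 3. Store the result with a TTL so it expires and is refreshed later.
    cache.set(key, json.dumps(user), ex=CACHE_TTL_SECONDS)
    return user
```

The TTL is what implements the expiry process described above: once it elapses, the next read falls through to the database and re-populates the cache.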

RDS stands for Relational Database Service. If you need managed instances of relational databases such as Oracle, Microsoft SQL Server, MySQL, MariaDB, or PostgreSQL, then you need RDS.
ElastiCache, on the other hand, is a caching database as a service. It supports two popular engines: Memcached and Redis.
DynamoDB is a NoSQL database as a service.

The use cases for RDS and ElastiCache are very different.
Use RDS when:
there is a need to persist data
you need ACID compliance
you require an OLTP database engine
Use an in-memory distributed cache such as ElastiCache when you need to:
reduce latency
offload database pressure
handle transient data (a short sketch follows below)
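For the transient-data case, the usual approach is to write the value with an expiry so Redis cleans it up on its own. A small sketch, assuming a Redis-compatible ElastiCache endpoint and made-up key names:

```python
import redis

# Assumed ElastiCache (Redis) endpoint.
cache = redis.Redis(host="my-cache.abc123.0001.use1.cache.amazonaws.com",
                    port=6379, decode_responses=True)

# Transient data: a one-time login code that should vanish after 5 minutes.
cache.setex("login-code:user:42", 300, "837421")

# Later: returns the code, or None once the 5 minutes have passed.
code = cache.get("login-code:user:42")
```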

Related

AWS containerised apps and database on same Redshift cluster

I have a simple question for someone with experience with AWS, but I am getting a little confused with the terminology and don't know how to decide which node type to purchase.
At my company we currently have a Postgres DB that we insert into continuously.
We probably insert ~600M rows a year at the moment, but would like to be able to scale up.
Each row is basically a timestamp, two floats, one int, and one enum.
So the workload is write intensive but with also constant small reads.
(There will be the occasional large read)
There are also two services that need to be run (both Rust based)
1. We have a Rust application that abstracts the DB data, allowing clients to access it through a RESTful interface.
2. We have a Rust app that gets the data to import from thousands of individual devices through Modbus.
These devices are on a private mobile network. Can I set up AWS cluster nodes to be able to access a private network through a VPN?
We would like to move to Amazon Redshift, but I am confused by the node types.
Amazon recommends choosing RA3 or DC2.
If we chose ra3.4xlarge, that means you get one cluster of nodes, right?
Can I run our Rust services on that cluster along with a number of Redshift database instances?
I believe AWS uses Docker, and I think I could containerise my services easily.
Or am I misunderstanding things, and when you purchase a Redshift cluster you can only run Redshift on that cluster, and have to get a different one for containerised applications, possibly an EC2 cluster?
Can anyone recommend a better fit for scaling this workload?
Thanks
I would not recommend Redshift for this application, and I'm a Redshift guy. Redshift is designed for analytic workloads (lots of reads and few, large writes). Constant small updates are not what it is designed for.
I would point you to Postgres RDS as the best fit. It has a RESTful API interface already. This will be more of the transactional database you are looking for, with little migration change.
When your data gets really large (TB+) you can add Redshift to the mix to quickly perform the analytics you need.
Just my $.02
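As a rough illustration of the write path described above (constant small inserts of a timestamp, two floats, an int, and an enum into RDS Postgres), here is a sketch using psycopg2; the endpoint, table, and column names are made up for the example:

```python
from datetime import datetime, timezone

import psycopg2
from psycopg2.extras import execute_values

# Assumed RDS Postgres endpoint and an illustrative "readings" table.
conn = psycopg2.connect(host="mydb.abc123.us-east-1.rds.amazonaws.com",
                        dbname="telemetry", user="app", password="secret")

rows = [
    # (timestamp, float, float, int, enum-as-text)
    (datetime.now(timezone.utc), 21.7, 1.03, 7, "OK"),
    (datetime.now(timezone.utc), 21.9, 1.01, 7, "OK"),
]

with conn, conn.cursor() as cur:
    # Batching many rows per statement keeps the constant small writes cheap.
    execute_values(
        cur,
        "INSERT INTO readings (ts, value_a, value_b, device_id, status) VALUES %s",
        rows,
    )
```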
Redshift is a managed service; you don't get any access to it for installing things, nor is there any possibility of installing/running custom software of your own.
Or am I misunderstanding things and when you purchase a Redshift cluster you can only run Redshift on this cluster
Yes, you don't run your own software on it; AWS manages the cluster and you just run your analytics/queries, etc.
have to get a different one for containerised applications, possibly an ec2 cluster ?
Yes, you could make use of EC2, running the orchestration yourself, or make use of ECS/Fargate/EKS, depending on your budget, how skilled your team members are, etc.

Connecting Elasticache with RDS?

I am new to the ElastiCache concept.
Please pardon my limited knowledge of it.
Currently, I want to set up an ElastiCache cluster in front of my Postgres RDS.
I have theoretical knowledge of how ElastiCache functions.
I am wondering how to set up an ElastiCache cluster so that it works with my RDS instance.
What endpoint needs to be shared with the developers to access the ElastiCache cluster?
Or have I completely misunderstood the concept?
Could anyone help me out with this?
ElastiCache is an in-memory datastore; one of its primary use cases is to act as a cache.
It would never connect directly to your RDS database; instead, values retrieved from the database are added to the ElastiCache cluster by your application.
Your application would need to make the decision to check the cache first and, if the value isn't there, query your RDS database to retrieve the data. The successful result would then be written to the ElastiCache store so that the next time your application attempts this, it can just retrieve the result from the cache.
There are a number of libraries that implement this behaviour, so you could take a look at using one of them (depending on your language) if you don't want to architect it yourself.
An alternative approach to caching I have also seen is called write-through: every time a write happens, it is also written to the cache, so your application only ever needs to read from the cache.
AWS has a great page that breaks down caching strategies and should provide further input.
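A rough sketch of that write-through variant in Python, assuming a Redis-compatible ElastiCache endpoint and a placeholder update_database() helper standing in for the real write to RDS:

```python
import json
import redis

# Assumed ElastiCache (Redis) endpoint.
cache = redis.Redis(host="my-cache.abc123.0001.use1.cache.amazonaws.com",
                    port=6379, decode_responses=True)


def update_database(user_id, profile):
    """Placeholder for the real write to the RDS database."""
    raise NotImplementedError


def save_user(user_id, profile):
    # 1. Write to the authoritative database first.
    update_database(user_id, profile)

    # 2. Write-through: refresh the cache in the same code path,
    #    so reads can always be served straight from the cache.
    cache.set(f"user:{user_id}", json.dumps(profile))
```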

Dynamodb vs Redis

We're using AWS, and are considering DynamoDB or Redis for our new service.
Below are our service's characteristics:
Inserts/deletes occur between hundreds and thousands of times per minute, and this will grow later.
We don't need complex search, only to find a value by its key.
Data should not be lost.
There is other data that doesn't have a lot of inserts/deletes, unlike item 1.
I'm worried about the Redis server going down.
If Redis fails, our data will be removed.
That's why I'm considering Amazon DynamoDB.
Because DynamoDB is NoSQL, inserts/deletes are fast (slower than Redis, but we don't need that much speed), and it stores data permanently.
But I'm not sure whether my thinking is right or not.
If I'm thinking wrong or missing another important point, I would appreciate it if you could teach me.
Thanks.
There are two types of Redis deployment in the AWS ElastiCache service:
Standalone
Multi-AZ cluster
With a standalone installation it is possible to turn on persistence for a Redis instance, so the service can recover data after a reboot. But in some cases, such as underlying hardware degradation, AWS can migrate Redis to another instance and lose the persistence log.
In a Multi-AZ cluster installation it is not possible to enable persistence; only replication occurs. In case of failure it takes time to promote a replica to the master role. Another way is to use the master and slave endpoints in the application directly, which is complicated. A failure that restarts both Redis nodes at the same time can also lose all the data in the cluster.
So, in general, Redis doesn't provide high durability of data, while giving you very good performance.
DynamoDB is highly available and durable storage for your data. Internally it replicates data across several availability zones, so it is highly available by default. It is also a fully managed AWS service, so you don't need to care about clusters, nodes, monitoring, etc., which is considered the right cloud way.
DynamoDB charges by read/write operation (on-demand or reserved capacity model) and by the amount of stored data. It may be really cheap while testing the service, but much more expensive under heavy load. You should carefully analyze your workload and calculate total service costs.
As for performance: DynamoDB is an SSD-backed database, compared to the Redis in-memory store, but it is possible to use DAX, an in-memory caching read replica for DynamoDB, as an accelerator under heavy load. So you won't be strictly limited by DynamoDB's performance.
Here is the link to the DynamoDB pricing calculator, which is one of the most complicated parts of using the service: https://aws.amazon.com/dynamodb/pricing/
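Since the workload described is pure key-value lookups, a minimal boto3 sketch of that access pattern might look like the following; the table and attribute names are made up, and the table is assumed to have a single partition key named "pk":

```python
import boto3

# Assumed table created with a single partition key named "pk".
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-service-data")

# Insert/overwrite an item (durably replicated across AZs by default).
table.put_item(Item={"pk": "order:1001", "status": "PAID", "amount": 42})

# Fetch it back by key -- no query language needed for this pattern.
response = table.get_item(Key={"pk": "order:1001"})
item = response.get("Item")  # missing if the key does not exist

# Delete when no longer needed.
table.delete_item(Key={"pk": "order:1001"})
```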

Is there a way to persist an ELB stickiness session even if the instance it's connected to fails?

Just curious if this is possible or how you would accomplish this.
Regardless of whether I use duration-based or application-based stickiness, when the instance a user is connected to fails, their session gets reset because they have to connect to a new server.
Is there a way to not have this happen? To have that session persist even if the instance they are connected to dies? I'm also using SSL with a cert, if that changes things.
The only way to accomplish that is to persist your session state in some storage service: it could be a database table, S3, a caching service, a NoSQL table, etc.
These are some approaches:
Session state Inside Your Database
Saving session state inside the database is common in lightweight web frameworks like Django. That way you can add as many front servers as you like without having to worry about session replication and other difficult stuff. You don’t tie yourself to a certain web server and you get persistence and all other features databases provide for free. As far as I can tell, this works rather nicely for small to medium size websites.
The problem is the usual: The database server may become your bottleneck. In that case your best bet may be to take a suitcase full of money to Oracle or IBM and buy yourself a database cluster.
Reference: Saving Session Data in Web Applications
Session state inside a Caching service
Amazon ElastiCache offers fully managed Redis and Memcached. Seamlessly deploy, operate, and scale popular open source compatible in-memory data stores. Build data-intensive apps or improve the performance of your existing apps by retrieving data from high throughput and low latency in-memory data stores.
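As an illustration of the caching-service approach, here is a small sketch that keeps session state in Redis keyed by a session ID, with a TTL acting as the idle timeout; the endpoint and field names are assumptions, and data is a dict of string fields:

```python
import redis

# Assumed ElastiCache (Redis) endpoint shared by all web servers.
sessions = redis.Redis(host="my-cache.abc123.0001.use1.cache.amazonaws.com",
                       port=6379, decode_responses=True)

SESSION_TTL_SECONDS = 1800  # 30-minute idle timeout


def save_session(session_id, data):
    # Store the session fields in a hash and set the idle timeout.
    sessions.hset(f"session:{session_id}", mapping=data)
    sessions.expire(f"session:{session_id}", SESSION_TTL_SECONDS)


def load_session(session_id):
    # Any web server behind the load balancer can read the same session,
    # so a failed instance no longer wipes out the user's session.
    data = sessions.hgetall(f"session:{session_id}")
    if data:
        sessions.expire(f"session:{session_id}", SESSION_TTL_SECONDS)  # sliding expiry
    return data or None
```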
DynamoDB
Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit-millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. It offers a flexible data model, reliable performance, and automatic scaling of throughput capacity.
Regardless of the approach you use, a middleware component must be deployed along with your app to manage the stored session state.
Middleware: could be either a third-party solution or your own.
Resources
AWS Session Management
Amazon ElastiCache
Amazon DynamoDB
Middleware for session management (Google results)

Migrating from AWS RDS to an AWS EC2 running MySQL

Yes, fellow SOers, I'm doing it backwards. I tried AWS RDS but the CPU seems to be spiking so often that I need the flexibility of an EC2 instance to do some fine tuning. I'm not a MySQL expert, so I'm asking:
How can I create a setup on the EC2 instance so that it reads from and replicates my RDS database?
Ideally I'd do the switch in real time via DNS, but first I need the EC2 instance to act like a clone of the RDS database, updating with any new data that arrives between now and the actual migration.
Any pointers are much appreciated. Thanks!
Why can't you use MySQLTuner with RDS?
You shouldn't need to run sysbench, since Amazon handles OS-level tuning for you on RDS.
Aurora is a drop-in replacement for MySQL and will scale better than any MySQL cluster you could set up on EC2.
You should be addressing why your WordPress instance is hammering the database so much instead of trying to optimize the database.
You should put a CDN in front of your WordPress site and cache as much as you can to reduce the load on both your web server and database server. It looks like there are also solutions out there for using Redis to cache data so that WordPress doesn't have to constantly go back to MySQL for data.
Amazon provides the CloudFront CDN, but I would also recommend looking into CloudFlare.
Honestly, given your number of concurrent users, unless you have tons of dynamic, constantly changing content, you should be able to run your entire site on a t2.micro with CloudFlare in front of it with "cache everything" enabled.
I'd like to offer an update:
Mark B's input has been extremely valuable, as I have discovered that I can run MySQLTuner remotely against the RDS instance. Therefore there was no need to migrate after all.
The RDS CPU spikes were due to a large number of non-indexed JOINs.
I have added indexes and the results are fantastic.
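For anyone landing here with the same symptom, a rough sketch of that diagnosis using Python and PyMySQL: run EXPLAIN on the slow JOIN, then add an index on the JOIN column. The endpoint, table, column, and index names are placeholders, not the actual schema from this question:

```python
import pymysql

# Assumed RDS MySQL endpoint and credentials.
conn = pymysql.connect(host="mydb.abc123.us-east-1.rds.amazonaws.com",
                       user="admin", password="secret", database="appdb")

with conn.cursor() as cur:
    # 1. Ask MySQL how it executes the slow JOIN; rows with type "ALL" mean
    #    a full table scan, i.e. the JOIN column is not indexed.
    cur.execute(
        "EXPLAIN SELECT o.* FROM orders o JOIN customers c ON o.customer_id = c.id"
    )
    for row in cur.fetchall():
        print(row)

    # 2. Add an index on the JOIN column so the scan becomes a lookup.
    cur.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")

conn.commit()
```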