elastic cache redis shard balance not working

elastic cache redis shard balance not working - redisson

We are in stabilization phase of using redisson client with aws redis elastic cache in clustered mode with two shards. Redisson connection endpoint uses the configendpoint recommended by AWS. https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Endpoints.html
We seem to see shard 002(other is 001) seems to take most the data(69% vs 001 is 0.5% which is big difference in usage) written while seeing AWS elastic cache metrics. This can be observed in dbmemoryusage percent 001.
We are using this redisson RMapCache
private RMapCache<byte[], SOME JAVA POJO> redisCache;
Not sure if one shard alone is taking all redis slots. Any pointers or documentation on this problem for recommendation would help.

Related

Rds cross region read delays

We have an rds instance in us-east-1, the applications that access the rds are in us-east-1 and us-east-2. the two regions have vpc peering. we are load balancing the request received by using route53 weighted routing policy. we are experiencing 10 ms delay when communicating across regions. currently, these 10 ms delays are acceptable between microservices. but when the applications in region2 are accessing rds, we are facing a considerable delay. (due to the large amount of database calls of hibernate ). Are there any was to reduce this database latency ?

If you have cross-region databases and reading from slave / read-replicas then expect some delays. This is not technology issue but physics problem as data has to physically travel from A to B. The latency will grow as you move these points (A & B) further away from each other.
A solution to reduce this latency is to use cache layer. All the first reads will be from database (cross-region) but subsequent ones can be from Redis cache for e.g.

Optimizing latency between application server (EC2) and RDS

here's how the story goes.
We started transforming a monolith, single-machine, e-commerce application (Apache/PHP) to cloud infrastructure. Obviously, the application and the database (MySQL) were on the same machine.
We decided to move to AWS. And as the first step of transformation, we decided to split the database and application. Hosting application on a c4.xlarge machine. And hosting database to RDS Aurora MySQL on a db.r5.large machine, with default options.
This setup performed well. Especially the database performance went up high.
Unfortunately, when the traffic spiked up, we started experiencing long response times. Looked like RDS, although being really fast for executing queries, wasn't returning results fast enough over the network to the EC2 machine.
So that was our conclusion after an in-depth analysis of the setup including Apache/MySQL/PHP tuning parameters. The delayed response time was definitely due to the network latency between EC2 and RDS/Aurora machine, both machines being in the same region.
Before adding additional resources (ex: ElastiCache etc) we'd first like to look into any default configuration we can play around with to solve this problem.
What do you think we missed there?

One of the bigest strength with the cloud is the scalability and you should always design your application to utilise it and it sounds like your RDS instance is getting chocked due to nr of request more than the process time for the queries. So rather go more small instances with load balancing than one big doing all the job. And with Load Balancers you will get away from a singel point of failure due to you can have replicas of your database and they can even be placed in different AZ.
Here is a blogpost you can read on the topic:
https://aws.amazon.com/blogs/database/scaling-your-amazon-rds-instance-vertically-and-horizontally/
Good luck in your aws journey.

The Best answer to your question is using read replicas, but remember only your read requests could be sent to your read replicas so you would need to design your application that way
Also for some cost savings, you should try aurora serverless
One more option is passing traffic between ec2 and rds through a private network rather than using the public internet to connect your ec2 to rds that can be one of the mistakes that might be happening

Elastic Beanstalk / ELB adding over 60ms latency

I'm running a microservice on AWS Elastic Beanstalk which is logging it's responses internally at 1-4ms, but the AWS Dashboard is showing an average of 68ms (not even counting latency to/from AWS). Is this normal? It just seems odd that EB/ELB would add 60ms of latency to every request.
It's configured to use a Docker container, which seems to use nginx. Although it doesn't seem to be configured to log the ttfb in the access logs, this is auto-configured by Amazon.
In testing I tried both a t2.micro, and a t2.large instance, and that had no effect on the test results. Is there something I can tweak on my end... really need to get this under 10-20ms (not counting rtt/ping distance) for the service to be useful.

It appears to have been a problem on Amazon's side... It was averaging 69ms on Friday, today (Monday morning) it's now 3.9ms

Migrating Redis to AWS Elasticache with minimal downtime

Let's start by listing some facts:
Elasticache can't be a slave of my existing Redis setup. Real shame, that would be so much more efficent.
I have only one Redis server to migrate, with roughly 3gb of data.
Downtime must be less than 10 mins. I assume the usual "stop the site, stop redis, provision cluster with snapshot" will take longer than this.
Similar to this question: How do I set an elasticache redis cluster as a slave?
One idea on how this might work:
Set Redis to use an AOF and trigger BGSAVE at the same time.
When BGSAVE finishes, provision the Elasticache cluster with RDB seed.
Stop the site and shut down my local Redis instance.
Use an aof-replay tool to replay the AOF into Elasticache.
Start the site again, pointed at the Elasticache cluster.
My questions:
How can I guarantee that my AOF file begins at exactly the point the RDB file ends, and that no data will be written in between?
Is there an AOF tool supported by the maintainers of Redis, or are they all third-party solutions, and therefore (potentially) of questionable reliability?*
* No offence intended to any authors of such tools, I'm sure they're great, I just feel much more confident using a tool written by the same team as the product to avoid potential compatibility bugs.

I have only one Redis server to migrate, with roughly 3gb of data
I would halt, save the REDIS to S3 and then upload it to a new cluster.
I'm guessing 10 mins to save the file and get it into s3.
10 minutes to just launch an elasticache cluster from that data.
Leaves you ten extra minutes to configure and test.
But there is a simple way of knowing EXACTLY how long.
Do a test migration of it.
DONT stop your live system
Run BGSAVE and get a dump of your Redis (leave everything running as normal)
move the dump S3
launch an elasticache cluster for it.
Take DETAILED notes, TIME each step, copy the commands to a notepad window.
Put a Word/excel document so you have a migration document. That way you know how long it takes and there are no surprises. Let us know how it goes.

ElastiCache has online migration support. You can use the start-migration API to start migration from self managed cluster to ElastiCache cluster.
aws elasticache start-migration --replication-group-id <ElastiCache Replication Group Id> --customer-node-endpoint-list "Address='<IP Address>',Port=<Port>"
The input to the API is your ElastiCache replication group id and the IP and port of the master of your self managed cluster. You need to ensure that the IP address is accessible from ElastiCache node. (An example IP address would be the private IP address of the master of your self managed cluster). This API will make the master node of the ElastiCache cluster call 'SLAVEOF' on the master of your self managed cluster. This will establish a replication stream and will start migrating data from self-managed cluster to ElastiCache cluster. During migration, the master of the ElastiCache cluster will stop accepting writes sent to it directly. You can start using ElastiCache cluster from your application for reads.
Once you have all your data in ElastiCache cluster, you can use the complete-migration API to stop the migration. This API will stop the replication from self managed cluster to ElastiCache cluster.
aws elasticache complete-migration --replication-group-id <ElastiCache Replication Group Id>
After this, the master of the ElastiCache cluster will start accepting writes. You can start using ElastiCache cluster from your application for both read and write.
The following limitations to be aware of for this migration method:
An existing or newly created ElastiCache deployment should meet the following requirements for migration:
It's cluster-mode disabled using Redis engine version 5.0.5 or higher.
It doesn't have either encryption in-transit or encryption at-rest enabled.
It has Multi-AZ with Auto-Failover enabled.
It has sufficient memory available to fit the data from your Redis on EC2 instance. To configure the right reserved memory settings, see Managing Reserved Memory.

There are a few ways to migrate the data without downtime. They are harder to achieve though.
you could have your app write to two redis instances simultaneously - one of which would be on EC. Once the caches are both 'warm', you could just restart your app, and read from the EC cache.
You could initially migrate to EC2 instead of EC. not really what you were hoping to hear, I imagine. this is easy to do because you can set EC2 as salve of your redis instance. Also, migrating from EC2 to EC is somewhat easier (the data is already on AWS), so there's a benefit for users with huge sets of data.
You could, in theory, intercept the commands from the client and send them to EC, thus effectively "replicating". But this requires some programming ( I dont believe a tool like this exists ATM) and would be hard with multiple, ephemeral clients.

Is Amazon EC Redis an effective caching solution or not?

As you may have noticed Amazon has announced a new feature for its own ElasticCache product, which is supporting Redis.
We are currently using one EC2 instance for our Redis (just queuing for now) and we've decided to use Redis for other upcoming features such as commenting system, discussion, real-time messaging, real-time user tracking and analytics, etc.
We don't mind to run more and bigger EC2 instances, but should we invest in ElasticCache (Redis) and move into it from the beginning now that we haven't started yet or it's too soon to see the results, benchmarks, and downside? Or it's even limited in some prospectives compare to having your own Redis on your own instances?
Update 1:
Let me to be detailed of what we are going to do with Redis. Probably using queuing as we have been doing it by Resque. Not sure if ElasticCache let us do any Pub/Sub but if it does we would like to do that as well. And of course atomic and high-level operations.
Update2:
There is a new video by Senior Product Manger of Amazon Elastic Cache posted a week ago that happened during AWS reInvent Conference. Because it is new he talks about Redis too!
http://www.youtube.com/watch?v=odMmdPBV8hM

I would say that if Redis is an effective caching solution for you, then ElasticCache will work for you - you're simply paying AWS to manage the back end and plumbing for you. Performance may be marginally slower - you have to have a DNS lookup for requests, vs having redis running in a VPC where you can access a private IP address directly - but even accessing it from an EC2 instance should resolve the public DNS name to the internal private IP. And of course you can launch your EC node in your VPC.
There are some complications when running a memcached cluster - you will need to use the amazon client to make sure your code connects to the correct node - but I do not believe as of Dec 2013 that this is needed for redis.
If you're implementing a queue on top of redis, have you looked at SQS to see if it will work for you?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js