ElastiCache: -ERR unknown command 'PSYNC'

I'm trying to make AWS ElastiCache Redis 3.2 the master, with the Redis instances on my EC2 hosts configured as slaves of the ElastiCache endpoint (via slaveof). I get this error:
Connecting to MASTER masterredis.XXXXXXXXXXXXXXXXXXX.amazonaws.com:6379
MASTER <-> SLAVE sync started
Non blocking connect for SYNC fired the event.
Master replied to PING, replication can continue...
Partial resynchronization not possible (no cached master)
Master does not support PSYNC or is in error state (reply: -ERR unknown command 'PSYNC')
Retrying with SYNC...
MASTER aborted replication with an error: ERR unknown command 'SYNC'
....
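For reference, the slave side here is just the standard Redis replication command pointed at the ElastiCache endpoint (a sketch; the endpoint is the one from the log above). This is exactly the step that fails:

redis-cli SLAVEOF masterredis.XXXXXXXXXXXXXXXXXXX.amazonaws.com 6379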

ElastiCache is a Redis-as-a-Service from AWS. As such, its operator has the liberty to disable certain commands/features - the ability to replicate to an external instance is one of these disabled features and that's the reason for the PSYNC/SYNC errors that you're getting.

Amazon ElastiCache, as noted by Itamar, is a managed service. When using the ElastiCache Redis engine (it also has a memcached option), the interface is 100% open-source Redis, but Amazon has made some changes to the underlying code to optimize it for the cloud.
Replication in ElastiCache uses primary nodes and replica nodes. These are similar, but not identical, to the masters and slaves used by Redis Sentinel. Since ElastiCache always runs on AWS EC2, it can use direct memory transfer to make replication faster and failover smoother than the OSS distribution, which has to support a huge number of possible infrastructure stacks. But you cannot mix Sentinel nodes with ElastiCache nodes in the same cluster.
(BTW, I am part of the Managed Databases team at AWS, so feel free to reach out if you'd like more detail, either here on Stack Overflow or by email: briskman at amazon dot com.)
For documentation, see http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Replication.html
We don't explicitly state that SYNC/PSYNC are not supported ... but you can find the ElastiCache API reference online at http://docs.aws.amazon.com/AmazonElastiCache/latest/APIReference/Welcome.html. It has details on API calls such as CreateReplicationGroup.
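For example, replication inside ElastiCache is set up through that API (here via the equivalent CLI call) rather than via SYNC/PSYNC. A hedged sketch with hypothetical names and node sizes:

# Hypothetical values: creates one primary plus one replica
aws elasticache create-replication-group \
  --replication-group-id my-redis-group \
  --replication-group-description "primary plus one replica" \
  --engine redis \
  --cache-node-type cache.m3.medium \
  --num-cache-clusters 2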

How to setup AWS RDS standalone instance without traffic from actual RDS cluster

We need to know the best options for setting up an AWS RDS instance (Aurora MySQL) that is standalone and does not get traffic from the actual RDS cluster.
The requirement is for our data team to write analytical queries without impacting the actual application and DB performance. Hence we need a DB which always has near-live data, but which live traffic and the application do not connect to.
We need to know which fits better: DB clone, AWS pilot light, AWS warm standby, AWS hot standby, or a multi-AZ configuration.
Kindly let us know which one would fit our requirement better.
We have so far read about the three options below:
Amazon Aurora DB cloning: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Managing.Clone.html
AWS pilot light, AWS warm standby, or AWS hot standby: https://aws.amazon.com/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-iii-pilot-light-and-warm-standby/
With a multi-AZ configuration, we can create a new instance in a new AZ, so that this instance will have a different host (kind of a failover strategy), where traffic to this instance will come from our queries and not from the live prod application, unless there is some failover issue.
Option 1, Aurora cloning, says:
Run workload-intensive operations, such as exporting data or running analytical queries on the clone.
...which seems to be your use case here.
Just be aware that the clone will not see any changes to the original data after it is made, so you will need to periodically delete and re-clone to get updated data.
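If it helps, creating a clone from the CLI is a point-in-time restore with the copy-on-write restore type. A sketch with hypothetical identifiers (you then add a DB instance to the restored cluster before querying it):

aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier my-prod-cluster \
  --db-cluster-identifier my-analytics-clone \
  --restore-type copy-on-write \
  --use-latest-restorable-time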
Regarding option 2, I wrote those blog posts, and I do not think that approach suits your use case. That approach is for disaster recovery.
Option 3 may work. To modify it a bit, the concept here is to create an Aurora Replica, which as you say is a separate instance. The problem here is the reader endpoint: your production workload may hit that instance (which is not what you want).
EDIT: Adding new option 4
Option 4. Check out Amazon Aurora zero-ETL integration with Amazon Redshift. This zero-ETL integration also enables you to analyze data from multiple Aurora database clusters in an Amazon Redshift cluster.

Can Kafka Connect be made rack aware so that my connector reads all partitions from one broker?

We have a Kafka cluster in Amazon MSK that has 3 brokers in different availability zones of the same region. I want to set up a Kafka Connect connector that backs up all data from our Kafka brokers to Amazon S3, and I'm trying to do it with MSK Connect.
I set up Confluent's S3 Sink Connector on MSK Connect and it works - everything is uploaded to S3 as expected. The problem is that it costs a fortune in data transfer charges - our AWS bills for MSK nearly double whenever the connector is running, with EU-DataTransfer-Regional-Bytes accounting for the entire increase.
It seems that the connector is pulling messages from all three of our brokers, i.e. from three different AZs, and so we're getting billed for inter-AZ data transfer. This makes sense because by default it will read a partition from that partition's leader, which could be any of the three brokers. But if we were creating a normal consumer, not a connector, it would be possible to restrict the consumer to read all partitions from a specific broker:
"client.rack" : "euw1-az3"
☝️ For a consumer in the euw1-az3 AZ, this setting makes the consumer read all partitions from the local broker, regardless of the partitions' leader - which avoids the need for inter-AZ data transfer and brings the bills down massively.
My question is, is it possible to do something similar for a Kafka Connector? What config setting do I have to pass to the connector, or the worker, to make it only read from one specific broker/AZ? Is this possible with MSK Connect?
Maybe I am missing something about your question. I think you want to have a look at this:
https://docs.confluent.io/platform/current/tutorials/examples/multiregion/docs/multiregion.html
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
I thought it was general knowledge; it applies to any on-premises or cloud deployment.
AWS confirmed to me on a call with support that MSK Connect doesn't currently support rack awareness. I was able to solve my problem by deploying the connector in an EC2 instance (not on MSK Connect) with the connect worker config consumer.client.rack set to the same availability zone that the EC2 instance is running in.
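For reference, a minimal sketch of the relevant part of that worker config (the AZ ID is the one from the question; use whichever AZ your EC2 instance actually runs in, and note this only helps if the brokers have the RackAwareReplicaSelector enabled, as mentioned above):

# Worker config excerpt: the consumer. prefix passes this to the sink's underlying consumer
consumer.client.rack=euw1-az3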

Launching and shutting down instances: is this suited to AWS ECS or Kubernetes?

I am trying to create a certain kind of networking infrastructure, and have been looking at Amazon ECS and Kubernetes. However, I am not quite sure whether these systems do what I am actually seeking, or whether I am contorting them into something else. If I describe my task at hand, could someone please verify whether Amazon ECS or Kubernetes will actually aid me in this effort, and whether this is the right way to think about it?
What I am trying to do is on-demand single-task processing on an AWS instance. What I mean by this is: I have a resource-heavy application which I want to run in the cloud to process a chunk of data submitted by a user. I want to submit this data to be processed by the application, have an EC2 instance spin up, process the data, upload the results to S3, and then shut down the EC2 instance.
I have already put together a functioning solution for this using Simple Queue Service, EC2 and Lambda. But I am wondering whether ECS or Kubernetes would make this simpler. I have been going through the ECS documentation and it seems like it is not very concerned with starting up and shutting down instances. It seems like it wants to have an instance that is constantly running, to which Docker images are fed as tasks to run. Can Amazon ECS be configured so that if there are no tasks running it automatically shuts down all instances?
Also, I am not understanding how exactly I would submit a specific chunk of data to be processed. It seems like "Tasks" as defined in Amazon ECS really correspond to a single Docker container, not so much to what kind of data that Docker container will process. Is that correct? So would I still need to feed the data-to-be-processed to the instances via Simple Queue Service or similar, and then use Lambda to poll those queues to see if they should submit tasks to ECS?
This is my naive understanding of this right now, if anyone could help me understand the things I've described better, or point me to better ways of thinking about this it would be appreciated.
This is a complex subject and many details for a good answer depend on the exact requirements of your domain/system, so the following information is based on the very high-level description you gave.
A lot of the features of ECS, Kubernetes, etc. are geared towards allowing a distributed application that acts as a single service and is horizontally scalable, upgradeable and maintainable. This means they help with unifying service interfacing, load balancing, service reliability, zero-downtime maintenance, scaling the number of worker nodes up/down based on demand (or other metrics), etc.
The following describes a high-level idea for a solution for your use case with Kubernetes (which is a bit more versatile than AWS ECS).
So for your use case you could set up a Kubernetes cluster that runs a distributed event queue, for example an Apache Pulsar cluster, as well as an application cluster that consumes queue events for processing. Your application cluster size could scale automatically with the number of unprocessed events in the queue (a custom pod autoscaler). The cluster infrastructure would be configured to scale automatically based on the number of scheduled pods (pods reserve capacity on the infrastructure).
You would have to make sure your application can run in a stateless form in a container.
The main benefit I see over your current solution would be cloud-provider independence, as well as some general benefits from running a containerized system: 1. not having to worry about the exact setup of your EC2 instances in terms of operating-system dependencies of your workload; 2. being able to address the processing application as a single service; 3. potentially increased reliability, for example in case of errors.
Regarding your exact questions:
Can Amazon ECS be configured so that if there are no tasks running it automatically shuts down all instances?
The keyword here is autoscaling. Note that there are two levels of scaling: 1. infrastructure scaling (the number of EC2 instances) and 2. application service scaling (the number of application containers/tasks deployed). ECS infrastructure scaling works based on EC2 Auto Scaling groups; for more info see the EC2 Auto Scaling documentation. For application service scaling and serverless ECS (Fargate), see the ECS service auto scaling documentation.
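As a hedged illustration (cluster and service names are hypothetical), service-level scaling works by registering the ECS service's desired count with Application Auto Scaling, and the minimum can be zero:

# Register the ECS service's DesiredCount as a scalable target (min 0 tasks)
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/my-service \
  --min-capacity 0 \
  --max-capacity 10

A scaling policy (for example on queue depth via a CloudWatch metric) would then drive the desired count between those bounds.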
Also I am not understanding how exactly I would submit a specific chunk of data to be processed. It seems like "Tasks" as defined in Amazon ECS really correspond to a single Docker container, not so much what kind of data that Docker container will process. Is that correct?
A "Task Definition" in ECS is describing how one or multiple docker containers can be deployed for a purpose and what its environment / limits should be. A task is a single instance that is run in a "Service" which itself can deploy a single or multiple tasks. Similar concepts are Pod and Service/Deployment in kubernetes.
So would I still need to feed the data-to-be-processed into the instances via simple queue service, or other? Then use Lambda to poll those queues to see if they should submit tasks to ECS?
A queue is always helpful for decoupling service requests from processing and for making sure you don't lose requests. It is not required if your application service cluster can offer a service interface and process incoming requests directly in a reliable fashion. But if your application cluster has to scale up/down frequently, that may impact its ability to process reliably.

Migrating Redis to AWS ElastiCache with minimal downtime

Let's start by listing some facts:
ElastiCache can't be a slave of my existing Redis setup. Real shame, that would be so much more efficient.
I have only one Redis server to migrate, with roughly 3gb of data.
Downtime must be less than 10 mins. I assume the usual "stop the site, stop redis, provision cluster with snapshot" will take longer than this.
Similar to this question: How do I set an elasticache redis cluster as a slave?
One idea on how this might work:
Set Redis to use an AOF and trigger BGSAVE at the same time.
When BGSAVE finishes, provision the ElastiCache cluster from the RDB seed.
Stop the site and shut down my local Redis instance.
Use an AOF-replay tool to replay the AOF into ElastiCache.
Start the site again, pointed at the ElastiCache cluster.
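A minimal sketch of steps 1 and 2 on the source server (with the caveat, raised in the questions below, about aligning the AOF with the RDB snapshot):

redis-cli CONFIG SET appendonly yes   # start logging writes to the AOF
redis-cli BGSAVE                      # fork a background RDB snapshot
# then seed the ElastiCache cluster from the resulting dump.rdb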
My questions:
How can I guarantee that my AOF file begins at exactly the point the RDB file ends, and that no data will be written in between?
Is there an AOF tool supported by the maintainers of Redis, or are they all third-party solutions, and therefore (potentially) of questionable reliability?*
* No offence intended to any authors of such tools, I'm sure they're great, I just feel much more confident using a tool written by the same team as the product to avoid potential compatibility bugs.
I have only one Redis server to migrate, with roughly 3gb of data
I would halt, save the Redis dump to S3 and then load it into a new cluster.
I'm guessing 10 mins to save the file and get it into S3.
10 minutes to just launch an ElastiCache cluster from that data.
Leaves you ten extra minutes to configure and test.
But there is a simple way of knowing EXACTLY how long.
Do a test migration of it.
DON'T stop your live system.
Run BGSAVE and get a dump of your Redis (leave everything running as normal).
Move the dump to S3.
Launch an ElastiCache cluster from it.
Take DETAILED notes, TIME each step, copy the commands to a notepad window.
Put it in a Word/Excel document so you have a migration document. That way you know how long it takes and there are no surprises. Let us know how it goes.
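A hedged sketch of that dry run (bucket and paths are hypothetical):

redis-cli BGSAVE
redis-cli LASTSAVE   # BGSAVE returns immediately; this timestamp changes once the background save completes
time aws s3 cp /var/lib/redis/dump.rdb s3://my-bucket/redis/dump.rdb
# then launch an ElastiCache cluster seeded from the S3 object, and time that step too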
ElastiCache has online migration support. You can use the start-migration API to start migration from a self-managed cluster to an ElastiCache cluster.
aws elasticache start-migration --replication-group-id <ElastiCache Replication Group Id> --customer-node-endpoint-list "Address='<IP Address>',Port=<Port>"
The input to the API is your ElastiCache replication group id and the IP and port of the master of your self-managed cluster. You need to ensure that the IP address is accessible from the ElastiCache node. (An example would be the private IP address of the master of your self-managed cluster.) This API will make the master node of the ElastiCache cluster call 'SLAVEOF' on the master of your self-managed cluster. This will establish a replication stream and start migrating data from the self-managed cluster to the ElastiCache cluster. During migration, the master of the ElastiCache cluster will stop accepting writes sent to it directly. You can start using the ElastiCache cluster from your application for reads.
Once you have all your data in the ElastiCache cluster, you can use the complete-migration API to stop the migration. This API will stop the replication from the self-managed cluster to the ElastiCache cluster.
aws elasticache complete-migration --replication-group-id <ElastiCache Replication Group Id>
After this, the master of the ElastiCache cluster will start accepting writes. You can start using the ElastiCache cluster from your application for both reads and writes.
Be aware of the following limitations of this migration method:
An existing or newly created ElastiCache deployment should meet the following requirements for migration:
It is cluster-mode disabled, using Redis engine version 5.0.5 or higher.
It doesn't have either encryption in-transit or encryption at-rest enabled.
It has Multi-AZ with Auto-Failover enabled.
It has sufficient memory available to fit the data from your Redis on EC2 instance. To configure the right reserved memory settings, see Managing Reserved Memory.
There are a few ways to migrate the data without downtime. They are harder to achieve though.
You could have your app write to two Redis instances simultaneously - one of which would be on EC. Once the caches are both 'warm', you could just restart your app and read from the EC cache.
You could initially migrate to EC2 instead of EC. Not really what you were hoping to hear, I imagine. This is easy to do because you can set EC2 as a slave of your Redis instance. Also, migrating from EC2 to EC is somewhat easier (the data is already on AWS), so there's a benefit for users with huge sets of data.
You could, in theory, intercept the commands from the client and send them to EC, thus effectively "replicating". But this requires some programming (I don't believe a tool like this exists ATM) and would be hard with multiple, ephemeral clients.

Is Amazon ElastiCache Redis an effective caching solution or not?

As you may have noticed, Amazon has announced a new feature for its ElastiCache product: support for Redis.
We are currently using one EC2 instance for our Redis (just queuing for now) and we've decided to use Redis for other upcoming features such as a commenting system, discussions, real-time messaging, real-time user tracking and analytics, etc.
We don't mind running more and bigger EC2 instances, but should we invest in ElastiCache (Redis) and move to it from the beginning, now that we haven't started yet? Or is it too soon to see results, benchmarks, and downsides? Is it even limited in some respects compared to having your own Redis on your own instances?
Update 1:
Let me detail what we are going to do with Redis. Probably queuing, as we have been doing with Resque. Not sure if ElastiCache lets us do any Pub/Sub, but if it does we would like to do that as well. And of course atomic and high-level operations.
Update 2:
There is a new video by the Senior Product Manager of Amazon ElastiCache, posted a week ago during the AWS re:Invent conference. Because it is new, he talks about Redis too!
http://www.youtube.com/watch?v=odMmdPBV8hM
I would say that if Redis is an effective caching solution for you, then ElastiCache will work for you - you're simply paying AWS to manage the back end and plumbing for you. Performance may be marginally slower - you have to do a DNS lookup for requests, vs. having Redis running in a VPC where you can access a private IP address directly - but even accessing it from an EC2 instance should resolve the public DNS name to the internal private IP. And of course you can launch your EC node in your VPC.
There are some complications when running a memcached cluster - you will need to use the Amazon client to make sure your code connects to the correct node - but I do not believe, as of Dec 2013, that this is needed for Redis.
If you're implementing a queue on top of redis, have you looked at SQS to see if it will work for you?