I currently have an HA cluster (three masters, one per AZ) deployed on AWS through kops. Kops deploys a K8s cluster with one etcd-events pod and one etcd-server pod on every master node. Each of these pods uses a mounted volume.
Everything works well: for example, when a master dies, the autoscaling group creates another master node in the same AZ, which recovers its volume and rejoins the cluster. My problem concerns a disaster, i.e. the failure of a whole AZ.
What happens if an AZ has problems? I periodically take EBS volume snapshots, but if I create a new volume from a snapshot (with the right tags to be discovered and attached to the new instance), the new instance mounts the new volume but is then unable to join the old cluster. My plan was to create a Lambda function, triggered by a CloudWatch event, that creates a new master instance in one of the two healthy AZs with a volume restored from a snapshot of the old EBS volume. But this plan fails, apparently because I am overlooking something about Raft and etcd and how they behave. (I say that because I get errors from the other master nodes, and the new node is unable to join the cluster.)
Suggestions?
In principle, how do you recover from a single-AZ disaster, and from the case where all the masters have died? I have the EBS snapshots. Are they sufficient?
I'm not sure how exactly you are restoring the failed node but technically the first thing that you want to recover is your etcd node because that's where all the Kubernetes state is stored.
Since your cluster is up and running you don't need to restore from scratch, you just need to remove the old node and add the new node to etcd. You can find out more on how to do it here. You don't really need to restore any old volume to this node since it will sync up with the other existing nodes.
After this, you can start the other services, such as kube-apiserver, kube-controller-manager, etc.
Having said that, if you keep the same IP address and the exact same physical configs you should be able to recover without removing the etcd node and adding a new one.
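The remove-then-add procedure described above can be sketched as a small helper that only builds the etcdctl (v3 API) command lines, so they can be reviewed before anything is run. The endpoints, member ID, member name and peer URL are placeholders, not values from the cluster in the question; TLS flags, which a real cluster will probably need, are left out. Since kops runs two etcd clusters (main and events), the same steps would have to be repeated for each one.

```python
from typing import List


def replacement_commands(endpoints: List[str], dead_member_id: str,
                         new_name: str, new_peer_url: str) -> List[List[str]]:
    """Build the etcdctl calls that swap a dead member for a fresh one.

    All arguments are caller-supplied placeholders (hypothetical values).
    """
    base = ["etcdctl", "--endpoints=" + ",".join(endpoints)]
    return [
        # 1. Look up the ID of the member that died.
        base + ["member", "list"],
        # 2. Remove it so the surviving members stop waiting for it.
        base + ["member", "remove", dead_member_id],
        # 3. Announce the new member before its etcd process starts.
        base + ["member", "add", new_name, "--peer-urls=" + new_peer_url],
    ]


if __name__ == "__main__":
    commands = replacement_commands(
        ["http://10.0.0.1:2379", "http://10.0.0.2:2379"],  # surviving masters
        "8e9e05c52164694d",                                 # hypothetical member ID
        "etcd-c",                                           # hypothetical member name
        "http://10.0.0.3:2380",                             # hypothetical peer URL
    )
    for cmd in commands:
        print(" ".join(cmd))
```

The new etcd process then has to start with an empty data directory and --initial-cluster-state=existing, so that it syncs from the surviving members instead of presenting data from an old snapshot.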
Well, I have a couple of questions. I have an Aurora cluster with a single MySQL RDS instance that holds 450 GB of data. We use this cluster only when we are doing some specific testing, so I want to delete the cluster but keep its data available so that I can create a new cluster whenever we need to test.
As far as I know, there are a couple of ways to do this:
1) Take a snapshot of the cluster and restore the cluster from the snapshot whenever required.
2) Back up the cluster to S3 and restore the cluster from S3 when required.
Which way is faster, and which one is more cost-efficient?
Can an entire cluster be restored from S3? If so, what are the steps involved? I found the AWS documentation a bit too messy.
If we stop an Aurora cluster, it automatically restarts after 7 days. Is there a way to prevent this automatic restart, so that the cluster stays stopped when it is not required and is started only when required?
We have configured Kubernetes cluster on EC2 machines in our AWS account using kops tool (https://github.com/kubernetes/kops) and based on AWS posts (https://aws.amazon.com/blogs/compute/kubernetes-clusters-aws-kops/) as well as other resources.
We want to set up a K8s cluster of masters and slaves such that:
It will automatically resize (both masters as well as nodes/slaves) based on system load.
It runs in Multi-AZ mode, i.e. with at least one master and one slave in every AZ (availability zone) of the same region, e.g. us-east-1a, us-east-1b, us-east-1c and so on.
We tried to configure the cluster in the following ways to achieve the above.
We created a K8s cluster on AWS EC2 machines using kops with the following configuration: node count=3, master count=3, zones=us-east-1c, us-east-1b, us-east-1a. We observed that a K8s cluster was created with 3 master and 3 slave nodes, with one master and one slave in each of the 3 AZs.
Then we tried to resize the Nodes/slaves in the cluster using (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-run-on-master.yaml). We set the node_asg_min to 3 and node_asg_max to 5. When we increased the workload on the slaves such that auto scale policy was triggered, we saw that additional (after the default 3 created during setup) slave nodes were spawned, and they did join the cluster in various AZ’s. This worked as expected. There is no question here.
We also wanted to set up the cluster such that the number of masters increases based on system load. Is there some way to achieve this? We tried a couple of approaches and results are shared below:
A) We were not sure whether the cluster-autoscaler helps here, but nevertheless tried to resize the masters in the cluster using (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-run-on-master.yaml). This is useful while creating a new cluster but was not useful for resizing the number of masters in an existing cluster. We did not find parameters like node_asg_min and node_asg_max for masters, the way they exist for slave nodes. Is there some way to achieve this?
B) We increased the MIN count from 1 to 3 in the ASG (auto-scaling group) associated with one of the three IGs (instance groups), one per master. We found that new instances were created. However, they did not join the master cluster. Is there some way to achieve this?
Could you please point us to steps or resources on how to do this correctly, so that the number of masters resizes automatically based on system load and the cluster runs in Multi-AZ mode?
Kind regards,
Shashi
There is no need to scale Master nodes.
Master components provide the cluster’s control plane. Master components make global decisions about the cluster (for example, scheduling), and detecting and responding to cluster events (starting up a new pod when a replication controller’s ‘replicas’ field is unsatisfied).
Master components can be run on any machine in the cluster. However, for simplicity, set up scripts typically start all master components on the same machine, and do not run user containers on this machine. See Building High-Availability Clusters for an example multi-master-VM setup.
A master node consists of the following components:
kube-apiserver
Component on the master that exposes the Kubernetes API. It is the front-end for the Kubernetes control plane.
etcd
Consistent and highly-available key value store used as Kubernetes’ backing store for all cluster data.
kube-scheduler
Component on the master that watches newly created pods that have no node assigned, and selects a node for them to run on.
kube-controller-manager
Component on the master that runs controllers.
cloud-controller-manager
Runs controllers that interact with the underlying cloud providers. The cloud-controller-manager binary is an alpha feature introduced in Kubernetes release 1.6.
For a more detailed explanation, please read the Kubernetes Components docs.
Also, if you are thinking about HA, you can read about Creating Highly Available Clusters with kubeadm.
I think your assumption is that, similar to Kubernetes nodes, masters divide the work between each other. That is not the case, because the main task of the masters is to reach consensus with each other. This is done with etcd, which is a distributed key-value store. Maintaining such a store is easy with one machine but gets harder with every machine you add.
The advantage of adding masters is being able to survive more master failures, at the cost of having to make all masters fatter (more CPU, RAM, ...) so that they perform well enough.
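To put numbers on that trade-off: etcd's Raft consensus needs a majority of the members to agree before a write is committed, so the number of failures a cluster survives follows directly from its size. The function names below are purely illustrative.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 7):
        print(f"{n} masters: quorum {quorum(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

With 3 masters the quorum is 2, so one failure is survivable and two are not. An even member count adds no tolerance over the odd count just below it, which is why clusters are usually sized at 3 or 5.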
I have a Kubernetes cluster deployed on AWS via Kops, consisting of 3 master nodes, each in a different AZ. Kops deploys the cluster so that etcd runs on each master node as two pods, each of which mounts an EBS volume to store its state. If you lose the volumes of 2 of the 3 masters, you automatically lose consensus among the masters.
Is there a way to use the data on the only master that still has the cluster state to re-establish quorum among the three masters from that state? I recreated this scenario, but the cluster becomes unavailable, and I can no longer access the etcd pods on any of the 3 masters, because those pods fail with an error. Moreover, etcd itself becomes read-only, so it is impossible to add or remove cluster members in order to intervene manually.
Tips? Thanks to all of you
This is documented here. There's also another guide here
You basically have to back up your cluster and create a brand new one.
I have a VPC with 2 private subnets, subnet 1 and subnet 2. My Redshift cluster sits in subnet 2 and holds data. I want to move the Redshift cluster from subnet 2 to subnet 1 within the same VPC (which can be done easily). But I have a few doubts about data migration:
Does the data migration happen automatically without any data loss, or do I need to take a backup, create the cluster in subnet 1, and then load the backed-up data into the new cluster?
Any leads would be appreciated.
From Amazon Redshift Snapshots - Amazon Redshift:
Restoring a Cluster from a Snapshot
A snapshot contains data from any databases that are running on your cluster, and also information about your cluster, including the number of nodes, node type, and master user name. If you need to restore your cluster from a snapshot, Amazon Redshift uses the cluster information to create a new cluster and then restores all the databases from the snapshot data. The new cluster that Amazon Redshift creates from the snapshot will have same configuration, including the number and type of nodes, as the original cluster from which the snapshot was taken. The cluster is restored in the same region and a random, system-chosen Availability Zone, unless you specify another Availability Zone in your request.
So, you should take a Snapshot of the existing Redshift cluster, then create a new cluster in the other AZ by Restoring a Cluster from the Snapshot. Once everything seems to be okay, you can delete the old cluster.
I know this is an old question but looking at the comments in the other answer I ran into the same issue. To move from one subnet to another you do the following:
Make sure your subnet group has both subnets in it
Take a snapshot of the current cluster
Restore the snapshot to a new cluster using the same settings as the original cluster but select the other AZ that you want it to be in. Name the cluster something like "{ORIGINAL_NAME}-new"
After the new cluster is up, go to the old cluster, edit it, and append "-old" to the name (it can be anything but just to keep them straight).
After the rename is complete go back to the new cluster, edit it, and change the name to be exactly what the original name was.
After the cluster is up make sure you can connect to it correctly and then delete the original cluster (the "-old" one).
CloudFormation hooks to the cluster by the cluster name. So as long as the settings are identical to the original one then there should be no drift in the CF stack and it will be linked to the new cluster after the names are changed.
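The snapshot, restore and rename sequence above can be written down as an ordered plan. This is only a sketch: each operation name is meant to match the boto3 Redshift client method of the same name, while the identifiers and the helper itself are hypothetical. It assumes the subnet group already contains both subnets (the first step of the list), and it does not wait for each asynchronous step to finish, which a real script would have to do.

```python
from typing import Any, Dict, List, Tuple

Step = Tuple[str, Dict[str, Any]]


def plan_subnet_move(cluster_id: str, snapshot_id: str,
                     subnet_group: str, target_az: str) -> List[Step]:
    """Ordered (operation, arguments) pairs for moving a cluster to another AZ."""
    new_id = f"{cluster_id}-new"
    old_id = f"{cluster_id}-old"
    return [
        # Snapshot the current cluster.
        ("create_cluster_snapshot",
         {"SnapshotIdentifier": snapshot_id, "ClusterIdentifier": cluster_id}),
        # Restore it under a temporary name in the AZ of the target subnet.
        ("restore_from_cluster_snapshot",
         {"ClusterIdentifier": new_id, "SnapshotIdentifier": snapshot_id,
          "ClusterSubnetGroupName": subnet_group,
          "AvailabilityZone": target_az}),
        # Rename the original out of the way ...
        ("modify_cluster",
         {"ClusterIdentifier": cluster_id, "NewClusterIdentifier": old_id}),
        # ... and give the restored cluster the original name.
        ("modify_cluster",
         {"ClusterIdentifier": new_id, "NewClusterIdentifier": cluster_id}),
        # Only after verifying connectivity: delete the "-old" cluster.
        # The manual snapshot from the first step is kept.
        ("delete_cluster",
         {"ClusterIdentifier": old_id, "SkipFinalClusterSnapshot": True}),
    ]


if __name__ == "__main__":
    for op, kwargs in plan_subnet_move("analytics", "analytics-move",
                                       "both-subnets", "us-east-1a"):
        print(op, kwargs)
```

Keeping the plan as plain data makes it easy to review the renames before anything is executed, which matters here because the final step is destructive.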
We have 4 standalone, non-Multi-AZ Aurora DB instances in a VPC and we want to move them to Aurora instances in another VPC.
As I understand there are 3 ways to migrate DB instances:
1) Modify the DB instance's Subnet group to change the VPC.
However this is not supported for Aurora instances yet.
2) Create a read replica and, once the slave has caught up, stop the slave and take a snapshot of it to create a DB instance in the other VPC, then use that as an external slave to resume the replication.
I have a few questions about this second method. Because Aurora uses a different replication method, the output of the show slave status; command is empty. Also, binlog_format is OFF by default, so I am not sure whether I have to modify it and then restart the instance to take note of the binlog etc.
Has anyone done this before who can guide me? I don't want to restart the instance only to find out later that it is not working, as it is a very critical DB and I want to minimize the downtime.
3) Use Amazon DMS service, however I cannot find the source DB details for Aurora in the documentation here.
I need to find out which permissions to grant to the replication user I'll create for this. This command does not work in Aurora:
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'replication_user'@'%'
IDENTIFIED BY 'aaaaaa';
Any help would be appreciated.
You should be able to restore from a snapshot. I migrated Aurora across VPCs using this approach. I hoped we could create a read replica in a different VPC, but at the moment I think you can only create one in a different region.
One strange side effect I have found, and I'm not sure why it happens, is that in the original cluster Multi-AZ was "2 Zones" and in the restored cluster Multi-AZ is "No". This hasn't affected anything that I can see. I still have a cluster with a writer and a reader.
Create a snapshot and restore it to a new cluster. When you do that, use a new subnet group created for the second VPC. That's the best way to achieve this. Like you called out, you cannot change subnet group for an existing cluster.
In your approach #2, you mention having to create a slave and then taking a snapshot. That's not required. All instances in a cluster are connected to the same shared volume, so you can just go ahead and create a snapshot from your single instance cluster directly. Just make a note that snapshots are a cluster level action, and not an instance level action in Aurora.
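A minimal sketch of that snapshot-and-restore path, again as an ordered plan: the operation names are meant to match boto3 RDS client methods, and every identifier below is a placeholder. One detail worth knowing is that restoring an Aurora cluster through the API creates the cluster without any DB instances, so an instance has to be added as a separate step. Waiting for each step to finish is omitted, and the engine value is assumed to match the source cluster.

```python
from typing import Any, Dict, List, Sequence, Tuple

Step = Tuple[str, Dict[str, Any]]


def plan_vpc_move(source_cluster: str, snapshot_id: str, target_cluster: str,
                  target_subnet_group: str, security_group_ids: Sequence[str],
                  instance_class: str, engine: str) -> List[Step]:
    """Ordered (operation, arguments) pairs for restoring a cluster into a new VPC."""
    return [
        # Snapshots are a cluster-level action in Aurora.
        ("create_db_cluster_snapshot",
         {"DBClusterSnapshotIdentifier": snapshot_id,
          "DBClusterIdentifier": source_cluster}),
        # The subnet group (created in the second VPC) decides where the
        # restored cluster lands.
        ("restore_db_cluster_from_snapshot",
         {"DBClusterIdentifier": target_cluster,
          "SnapshotIdentifier": snapshot_id,
          "Engine": engine,
          "DBSubnetGroupName": target_subnet_group,
          "VpcSecurityGroupIds": list(security_group_ids)}),
        # The restored cluster has no instances yet; add a writer.
        ("create_db_instance",
         {"DBInstanceIdentifier": f"{target_cluster}-instance-1",
          "DBInstanceClass": instance_class,
          "Engine": engine,
          "DBClusterIdentifier": target_cluster}),
    ]


if __name__ == "__main__":
    plan = plan_vpc_move("prod-db", "prod-db-move", "prod-db-vpc2",
                         "vpc2-subnets", ["sg-0123456789abcdef0"],
                         "db.r5.large", "aurora-mysql")
    for op, kwargs in plan:
        print(op, kwargs)
```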
From https://aws.amazon.com/premiumsupport/knowledge-center/rds-vpc-aurora-cluster/
It states that
Create a clone in a different VPC
If you clone a database in an Aurora cluster, you can change the VPC of the clone. However, the subnets in the VPC must map to the same set of Availability Zones. For more information, see Cloning Databases in an Aurora DB Cluster.
It does work, though I can't see a way to break the replication via the console. We are using this as a faster way of migrating than snapshot and restore. Stopping the original master would break replication, but I thought it could be done more cleanly.