I have a 4-node Cassandra cluster running in production in an on-premise DC, and I have to move it to Cassandra on AWS. I don't want to move to DynamoDB, for various reasons.
The Cassandra version in use is pretty old: 1.2.9.
How do I move Cassandra from the on-premise DC to AWS without data loss and with zero downtime?
Regards,
Vivek
Create a new DC in AWS, configure inter-DC replication between the two DCs, then decommission the old DC.
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
I've done this before.
As Alex Tbk said, you'll add nodes in AWS as a new data center.
Add new, empty nodes with a new logical data center name. You'll need to use the GossipingPropertyFileSnitch (if you're not already) and specify the DC in the cassandra-rackdc.properties file. You can also specify a logical rack in that file; it's usually a good idea to put the AWS availability zone there.
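As a sketch, assuming a new DC named AWS_DC_1 and the us-east-1a availability zone as the rack (both names are illustrative, pick your own), each new AWS node's config would look something like:

```
# cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch

# cassandra-rackdc.properties
dc=AWS_DC_1
rack=us-east-1a
```

The DC name here is what you'll later reference in the keyspace replication settings and in nodetool rebuild, so choose it deliberately.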
After you get the first AWS node built, build the rest using that node's IP as a seed; you don't want them trying to contact your on-prem DC on a restart. Afterward, you'll also want to set the first node to use one of the others as its seed node.
Once you get your nodes built, you'll need to modify your keyspace and specify a replication factor for your new AWS DC.
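For example, assuming NetworkTopologyStrategy and hypothetical keyspace/DC names (run this through cqlsh; adjust the replication factors to your needs):

```
-- Illustrative names: keyspace "my_keyspace", existing DC "DC1", new DC "AWS_DC_1"
ALTER KEYSPACE my_keyspace
  WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'AWS_DC_1': 3
  };
```

New writes will start replicating to the AWS DC at this point; the rebuild in the next step backfills the existing data.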
Run a nodetool rebuild on each AWS node, using your existing DC as the source.
nodetool rebuild -- sourceDCName
Definitely consider upgrading. 1.2 was a solid version, but you're missing out on so many new features/fixes.
Note: Some folks recommend using the AWS-specific snitches (Ec2Snitch, Ec2MultiRegionSnitch), but you'll want all nodes in your cluster running the same snitch. So for a hybrid-cloud deployment (before you have a chance to decommission your on-premise nodes), you'll want to stick with the GossipingPropertyFileSnitch. Honestly, that's the only snitch I use, regardless of provider, and you should be fine with it, too.
I've recently started working on the ELK Stack in my organisation, and there's a requirement that has got me wondering.
The cluster details are as follows:
Hosted on AWS EC2 instances
No repository has been registered for backups
Curator is up and running, but not yet being utilized
Using instance store
During my research, I learnt that the best way to back up is the Snapshot API, but the problem is that it requires registering a repository (such as S3) and a node restart.
I've been told that a restart will cause all the data in that node to be lost. Is this true? If not, what would be the best way to go around to begin automated backups without any loss of data, if it's possible?
Thank you.
You need to restart the node to install the repository-s3 plugin (instructions here: https://www.elastic.co/guide/en/elasticsearch/plugins/current/repository-s3.html). However, restarting a node doesn't mean destroying the EC2 instance with all its attached volumes. In other words, your data will outlive the node restart. The Elasticsearch docs also cover how to stop a node and how to start it again.
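Once the plugin is installed and the node restarted, registering an S3 repository and taking a snapshot looks roughly like this (the repository and bucket names are made up; see the linked docs for the full set of repository settings and S3 credentials):

```
# Register an S3 snapshot repository (bucket name is illustrative)
curl -X PUT "localhost:9200/_snapshot/my_s3_repo" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": { "bucket": "my-es-backups" }
}'

# Take a snapshot of all indices into that repository
curl -X PUT "localhost:9200/_snapshot/my_s3_repo/snapshot_1?wait_for_completion=true"
```

Snapshots are incremental, so once the first one completes, scheduling regular snapshots (e.g. via Curator, which you already have running) is cheap.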
We have a 70TB cluster with around 200 keyspaces, and we're planning to move it to AWS. A few approaches we're considering:
Replace the nodes in the cluster with nodes in AWS, one at a time
Create a new cluster in AWS, bulk-copy each keyspace while dual-writing to both clusters, and cut over during a downtime window
Are there better ways to do this? Could we use AWS as a new DC and migrate one keyspace at a time?
Yes, for a live migration you can use a hybrid cloud model and create a new DC in AWS. This is probably the best approach if you want to migrate data without downtime and you can do this keyspace-by-keyspace to manage the I/O streaming.
This blog article by Alain Rodriguez on Cassandra Data Center Switch provides a walkthrough of how to do this in great detail.
Using AWS Snowball is a faster and cheaper approach if downtime is an option.
You can use AWS as a new cluster, but you need to be careful: not all Cassandra SSTable versions are compatible with each other, so verify SSTable compatibility between the two clusters first. Another issue is that the migration can put a high load on your "old" cluster.
So I highly recommend starting with these parameters set very low, to gauge what your existing cluster and the AWS cluster can handle:
compaction_throughput_mb_per_sec (Default 16)
stream_throughput_outbound_megabits_per_sec (Default 200)
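Both settings can also be adjusted at runtime with nodetool (depending on your Cassandra version), so you can start low and raise them as you confirm both clusters keep up. The values below are just conservative illustrative starting points:

```
# Throttle compaction (MB/s) and streaming (Mb/s) on each node
nodetool setcompactionthroughput 8
nodetool setstreamthroughput 100
```

Raise them gradually while watching latency on the source cluster, rather than changing the yaml and doing rolling restarts.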
Bootstrapping new AWS nodes directly into your existing cluster is not a good idea, because each bootstrap forces Cassandra to redistribute keys across the cluster, and you're left without a plan B if anything goes wrong.
Another good option is to build a separate cluster in AWS (without connecting the two) and move the data with Spark. Just moving the data without transforming it is very simple, and you stay in control of the process.
I have access to a kops-built Kubernetes cluster on AWS EC2 instances. I would like to make sure that all available security patches from the corresponding package manager are applied. Unfortunately, after searching the whole internet for hours, I am unable to find any clue on how this should be done. Looking at the user data of the launch configurations, I did not find any package manager invocation. Therefore I am not sure whether a simple node restart will do the trick, and I also want to make sure that new nodes come up with current packages.
How do I apply security patches to new nodes of a Kubernetes cluster, and how do I make sure that all nodes are and stay up to date?
You might want to explore https://github.com/weaveworks/kured
Kured (KUbernetes REboot Daemon) is a Kubernetes daemonset that performs safe automatic node reboots when the need to do so is indicated by the package management system of the underlying OS.
Watches for the presence of a reboot sentinel e.g. /var/run/reboot-required
Utilises a lock in the API server to ensure only one node reboots at a time
Optionally defers reboots in the presence of active Prometheus alerts or selected pods
Cordons & drains worker nodes before reboot, uncordoning them after
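Deploying it is a single DaemonSet manifest applied from the project's releases page; the version in the URL below is illustrative, so check the README for the current one:

```
# Deploy the kured DaemonSet (version number is an assumption; use the latest release)
kubectl apply -f https://github.com/weaveworks/kured/releases/download/1.2.0/kured-1.2.0-dockerhub.yaml
```

This handles the reboot side; you still need the OS itself to install patches (e.g. unattended-upgrades on Debian/Ubuntu AMIs), which is what writes the /var/run/reboot-required sentinel kured watches for.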
If there is a security patch for Google's Container Optimized OS itself, how does the update get applied?
Google's information on the subject is vague
https://cloud.google.com/container-optimized-os/docs/concepts/security#automatic_updates
Google claims the updates are automatic, but how?
Do I have to set a config option to update automatically?
Does the node need to have access to the internet, where is the update coming from? Or is Google Cloud smart enough to let Container Optimized OS update itself when it is in a private VPC?
Do I have to set a config option to update automatically?
The automatic update behavior of Compute Engine (GCE) Container-Optimized OS (COS) VMs (i.e. instances you created directly in GCE) is controlled via the "cos-update-strategy" GCE metadata key; see the COS documentation for details.
The current documented default behavior is: "If not set all updates from the current channel are automatically downloaded and installed."
The download will happen in the background, and the update will take effect when the VM reboots.
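If you want to opt out of (or verify) that behavior, the metadata key can be set with gcloud. For example, to disable automatic updates on an instance (the instance name is illustrative):

```
# Disable COS auto-updates on a GCE instance
gcloud compute instances add-metadata my-cos-vm \
  --metadata cos-update-strategy=update_disabled
```

Leaving the key unset keeps the documented default, i.e. automatic download and install from the current channel.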
Does the node need to have access to the internet, where is the update coming from? Or is Google Cloud smart enough to let Container Optimized OS update itself when it is in a private VPC?
Yes, the VM needs access to the internet. If you disable all egress network traffic, COS VMs won't be able to update themselves.
When operated as part of Kubernetes Engine, the auto-upgrade functionality of Container Optimized OS (cos) is disabled. Updates to cos are applied by upgrading the image version of the nodes using the GKE upgrade functionality – upgrade the master, followed by the node pool, or use the GKE auto-upgrade features.
The guidance on upgrading a Kubernetes Engine cluster describes the upgrade process used for manual and automatic upgrades: https://cloud.google.com/kubernetes-engine/docs/how-to/upgrading-a-cluster.
In summary, the following process is followed:
Nodes have scheduling disabled (so they will not be considered for scheduling new pods admitted to the cluster).
Pods assigned to the node under upgrade are drained. They may be recreated elsewhere if attached to a replication controller or equivalent manager which reschedules a replacement, and there is cluster capacity to schedule the replacement on another node.
The node's Compute Engine instance is upgraded with the new cos image, using the same name.
The node is started, re-added to the cluster, and scheduling is re-enabled. (Barring certain conditions, most pods will not automatically move back.)
This process is repeated for subsequent nodes in the cluster.
When you run an upgrade, Kubernetes Engine stops scheduling, drains, and deletes all of the cluster's nodes and their Pods one at a time. Replacement nodes are recreated with the same name as their predecessors. Each node must be recreated successfully for the upgrade to complete. When the new nodes register with the master, Kubernetes Engine marks the nodes as schedulable.
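For a manual upgrade, the gcloud equivalents look roughly like this (the cluster and node-pool names are made up):

```
# Upgrade the cluster master first
gcloud container clusters upgrade my-cluster --master

# Then upgrade a node pool to the master's version
gcloud container clusters upgrade my-cluster --node-pool my-pool
```

With auto-upgrade enabled on the node pool, GKE runs the same node-by-node process for you during the maintenance window.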
I am planning to set up a Solr server on an EC2 instance. As traffic grows I might have to move Solr from a smaller instance to a bigger one, and this change will need to happen live, while the old instance is serving traffic. I am concerned that during the switch some valuable indexed data could be lost. The data from the old server will also need to be moved to the new server, which would take a significant amount of time.
Also when the traffic cannot be handled by the largest server, SolrCloud will need to be deployed on multiple servers and the same data migration issue could occur.
Is there an efficient and a more robust way to do this?
You could probably:
Start using SolrCloud from the get-go, but with just a single node/one shard. At this point there's nothing "cloudy" about it, but no harm done either.
When traffic grows, create the new, bigger EC2 instance and add it to the cluster. Now you have a working SolrCloud cluster with a replica.
As needed, keep adding nodes and creating more shards/replicas.
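A sketch of that progression with the bundled solr script and the Collections API (the collection name and ports are made up):

```
# Start as a single-node SolrCloud "cluster": one shard, one replica
bin/solr create -c mycollection -shards 1 -replicationFactor 1

# Later, once the bigger node has joined the cluster,
# add a replica of the shard there via the Collections API
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1"
```

Once the new replica has caught up, you can remove the replica on the old node and retire the instance; indexing continues throughout because both replicas receive updates while they coexist.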