How do I "hibernate" an entire AWS account? - amazon-web-services

I have inherited a nasty project running on AWS. The architecture is very complex and, not being an AWS expert, I'm not entirely sure where to start unpicking the structure of the project.
I've been asked to "hibernate" the entire system - meaning we can't lose any data, but all running instances can be switched off. As long as it can eventually be resurrected, the data can be stored anywhere within AWS.
From what I can tell, I think everything is controlled by ECS. There are several ECS clusters with various tasks running, and most seem to have a volume attached. I know that if I set the desired instance count for any cluster to 0, it will shut down all associated instances. But the question is: if I later set that back to the previous count, will the data come back when the instances do? Or are the volumes deleted once the EC2 instances are terminated (as is usual with a "standalone" EC2 instance)?
I'd prefer not to have to set up the entire architecture again manually in the future.
I have tried to understand whether volumes currently in use by the instances for the ECS clusters will be deleted when I reduce the desired instance count to 0, but have been unable to come up with an answer.
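For reference, here is roughly how one could check the DeleteOnTermination flag on the volumes attached to a cluster's container instances. This is a minimal, untested boto3 sketch; "my-cluster" is a placeholder name and region/credentials are assumed to be configured elsewhere:

```python
# Untested sketch: list the EC2 instances backing one ECS cluster and report
# whether each attached EBS volume is flagged DeleteOnTermination.
# "my-cluster" is a placeholder cluster name.
import boto3

ecs = boto3.client("ecs")
ec2 = boto3.client("ec2")

arns = ecs.list_container_instances(cluster="my-cluster")["containerInstanceArns"]
details = ecs.describe_container_instances(cluster="my-cluster", containerInstances=arns)
instance_ids = [ci["ec2InstanceId"] for ci in details["containerInstances"]]

reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
for reservation in reservations:
    for instance in reservation["Instances"]:
        for mapping in instance.get("BlockDeviceMappings", []):
            ebs = mapping["Ebs"]
            print(instance["InstanceId"], mapping["DeviceName"], ebs["VolumeId"],
                  "DeleteOnTermination =", ebs["DeleteOnTermination"])
```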

Related

AWS EC2 multiple Instances with redundant data

So I am a noob and just started learning about cloud computing and how AWS works.
AWS provides EC2 as a service, where I can run a VM and put my data on it, or say run my web server on top of the newly created instance.
I have seen people creating multiple instances in the same AZ.
Doesn't that lead to redundant data? I mean, we are creating more EC2 instances in the same AZ and putting the same data on each instance, so that when one gets called off, the client can access the data from another instance.
My question is: is it industry practice to keep redundant data (the same data) across all the instances for better reachability, or do we put only a fraction of the data on the other instances rather than the whole thing?
Please don't mind my stupid question, I am just learning.
Usually, when you run several instances of the same application, you run them in an Auto Scaling group (ASG). For this, your applications should be stateless, as instances in an ASG can be launched and terminated automatically at any time. To protect against data loss and ensure that new instances have access to existing data files, you don't store any user data (e.g. user-uploaded images) on the instances.
Instead, you store the data files outside of your instances. Common choices for that are S3 and EFS. This solves the data redundancy issue, as you only have one copy of your files, which can be accessed from all the instances. It also protects your data from being lost if your instances get terminated, as S3 and EFS are highly available, fault-tolerant data stores managed by AWS.
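To make that concrete, here is a minimal sketch of the stateless pattern, assuming a placeholder bucket called user-uploads-example: the instance writes uploads straight to S3, and any other instance can read them back.

```python
# Minimal sketch: keep user uploads in S3 rather than on the instance's own disk.
# "user-uploads-example" is a placeholder bucket name.
import boto3

s3 = boto3.client("s3")

def save_upload(local_path: str, key: str) -> str:
    """Push a file to S3; the returned key can be read from any instance."""
    s3.upload_file(local_path, "user-uploads-example", key)
    return key

def load_upload(key: str, local_path: str) -> None:
    """Fetch the same object from any other instance in the ASG."""
    s3.download_file("user-uploads-example", key, local_path)
```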

Automated setup for multi-server RethinkDB cluster via an ECS service

I'm attempting to set up a RethinkDB cluster with 3 servers total, spread evenly across 3 private subnets, each in a different AZ, in a single region.
Ideally, I'd like to deploy the DB software via ECS and provision the EC2 instances with auto scaling, but I'm having trouble trying to figure out how to instruct the RethinkDB instances to join a RethinkDB cluster.
To create/join a cluster in RethinkDB, when you start up a new instance of RethinkDB, you specify the host:port combination of one of the other machines in the cluster. This is where I'm running into problems. The Auto Scaling service creates new primary ENIs for my EC2 instances and uses a random IP from my subnet's range, so I can't know the IP of the EC2 instance ahead of time. On top of that, I'm using awsvpc task networking, so ECS creates a new secondary ENI dedicated to each Docker container and attaches it to the instance when it deploys the container, and those also get new IPs which I don't know ahead of time.
So far I've worked out one possible solution, which is to not use an Auto Scaling group, but instead to manually deploy 3 EC2 instances across the private subnets, which would let me assign my own predetermined private IPs. As I understand it, this still doesn't help if I'm using awsvpc task networking, though, because each container running on my instances will get its own dedicated secondary ENI and I won't know the IP of that secondary ENI ahead of time. I think I can switch my task networking to bridge mode to get around this. That way I can use the predetermined IP of the EC2 instance (the primary ENI) in the RethinkDB join command.
So, in conclusion, the only way I can figure out to achieve this is to not use Auto Scaling or awsvpc task networking, both of which would otherwise be very desirable features. Can anyone think of a better way to do this?
As mentioned in the comments, this is more of an issue around the fact you need to start a single RethinkDB instance one time to bootstrap the cluster and then handle discovery of the existing cluster members when joining new members to the cluster.
I would have thought RethinkDB would have published a good pattern for this in their docs, because it's going to be pretty common when setting up clusters, but I couldn't see anything useful there. If someone does know of an official recommendation, you should definitely use that rather than what I'm about to propose, especially as I have no experience with running RethinkDB.
This is more just spit-balling and is completely untested (at least for now), but the principle is: start a single, one-off instance of RethinkDB to bootstrap the cluster, have more cluster members join it, and then ditch the special-case bootstrap member that didn't attempt to join a cluster, leaving the remaining cluster members to carry on.
The bootstrap instance is easy enough: you just need a RethinkDB container image and an ECS task that runs it in stand-alone mode, with the ECS service running only one instance of the task. To let the second set of cluster members easily discover cluster members (including this bootstrap instance), it's probably easiest to use a service discovery mechanism such as the one offered by ECS, which uses Route 53 records under the covers. The ECS service should register itself in a RethinkDB namespace.
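As an illustration of that registration step, here is an untested boto3 sketch; the namespace, VPC, cluster and service names are all placeholders, and in real use you would wait for the namespace creation operation to complete and look up its ID rather than hard-coding one:

```python
# Untested sketch: create a private DNS namespace and register the bootstrap
# ECS service with it, so other tasks can discover it via a Route 53 name.
import boto3

sd = boto3.client("servicediscovery")
ecs = boto3.client("ecs")

# Asynchronous call; returns an OperationId to poll before the namespace exists.
sd.create_private_dns_namespace(Name="rethinkdb.local", Vpc="vpc-0123456789abcdef0")

discovery = sd.create_service(
    Name="rethinkdb-bootstrap",
    NamespaceId="ns-example",  # placeholder: the ID of the namespace created above
    DnsConfig={"DnsRecords": [{"Type": "A", "TTL": 10}]},
)

ecs.create_service(
    cluster="rethinkdb-cluster",             # placeholder cluster name
    serviceName="rethinkdb-bootstrap",
    taskDefinition="rethinkdb-bootstrap:1",  # placeholder task definition
    desiredCount=1,
    launchType="EC2",
    networkConfiguration={"awsvpcConfiguration": {"subnets": ["subnet-aaa", "subnet-bbb"]}},
    serviceRegistries=[{"registryArn": discovery["Service"]["Arn"]}],
)
```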
Then you should create another ECS service that's basically the same as the first, but whose entrypoint script lists the services in the RethinkDB namespace, resolves them, discards the container's own IP address, and then passes a discovered host to --join when starting RethinkDB in the container (see the sketch at the end of this answer).
I'd then set the non-bootstrap ECS service to just 1 task at first, to allow it to discover the bootstrap instance, and then you should be able to keep adding tasks to the service one at a time until you're happy with the size of the cluster, leaving you with n + 1 members including the original bootstrap instance.
After that I'd remove the bootstrap ECS service entirely.
If an ECS task in the non-bootstrap ECS service dies for whatever reason, it should be able to rejoin automatically without any issue, as it will just find a running RethinkDB task and join that.
You could probably expand the check for which cluster member to join by verifying that the RethinkDB port is open and responding before using that member to join. That would handle multiple tasks being started at the same time (with my original suggestion, a new task could find another task that is also still trying to join the cluster and attempt to join that first, with all of them potentially deadlocking if they all happened to miss the existing cluster members by chance).
As mentioned, this answer comes with a big caveat: I haven't got any experience running RethinkDB and I've only played with the service discovery mechanism that was recently released for ECS, so I might be missing something here, but the general principles should hold fine.
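To show what the discovery entrypoint mentioned above might look like, here is an untested Python sketch; the DNS name is a placeholder for whatever name ECS service discovery creates, and 29015 is RethinkDB's default intracluster port:

```python
#!/usr/bin/env python3
# Untested sketch of the entrypoint idea: resolve the service discovery name,
# drop our own IP, and join whichever existing member we find.
import os
import socket

SERVICE_DNS = "rethinkdb.rethinkdb.local"   # placeholder discovery name
CLUSTER_PORT = 29015                        # RethinkDB's default intracluster port

own_ip = socket.gethostbyname(socket.gethostname())
try:
    _, _, peer_ips = socket.gethostbyname_ex(SERVICE_DNS)
except socket.gaierror:
    peer_ips = []  # nothing registered yet

peers = [ip for ip in peer_ips if ip != own_ip]

cmd = ["rethinkdb", "--bind", "all"]
for ip in peers:
    cmd += ["--join", f"{ip}:{CLUSTER_PORT}"]

# With no peers found this starts a stand-alone node, i.e. the bootstrap case.
os.execvp(cmd[0], cmd)
```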

ECS stop instance

I have an ECS cluster running one task for my backend instance. I would like to be able to stop/start the EC2 instance whenever I want. Is that possible? I tried stopping the instance directly, but it terminates a few seconds after being stopped and a new instance is then created automatically. I tried changing the Auto Scaling group to desired = min = 0 capacity, but when I do that the instance gets terminated automatically. I just want to turn the EC2 instance off when it isn't needed, but at the same time I want data to persist between turning it off and on. I have been fighting with this for a few days now and haven't been able to achieve my goal.
Also, how do I link an EBS volume with VOLUME /root/.local/share/XYZ from the Docker image, to persist the data from the XYZ folder?
I would suggest you do this by modifying the Auto Scaling group: when you want to turn the instance off, set the desired capacity to 0, and when you want to turn it back on, set the value back again.
You can do that with the AWS CLI, and you can also schedule it by putting the AWS CLI command in a cron job.
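Here is the same idea as a boto3 sketch, in case scripting it is easier than remembering the CLI flags; "backend-asg" is a placeholder Auto Scaling group name, and keep in mind that the replacement instances start fresh, so any data has to live somewhere external.

```python
# Minimal sketch: scale the Auto Scaling group to zero to "turn off" the backend,
# and restore the capacity to turn it back on. "backend-asg" is a placeholder name.
import boto3

asg = boto3.client("autoscaling")

def turn_off():
    # With no instances, the ECS tasks stop because there is nowhere to place them.
    asg.update_auto_scaling_group(AutoScalingGroupName="backend-asg",
                                  MinSize=0, DesiredCapacity=0)

def turn_on(count: int = 1):
    # ECS reschedules the tasks once the new instances register with the cluster.
    asg.update_auto_scaling_group(AutoScalingGroupName="backend-asg",
                                  MinSize=count, DesiredCapacity=count)
```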
I would suggest using EFS. Here is an article from AWS on how to persist data from ECS containers using EFS.
Using Amazon EFS to Persist Data from Amazon ECS Containers
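To give a sense of what that looks like for the VOLUME /root/.local/share/XYZ question above, here is an untested boto3 sketch of a task definition that mounts an EFS file system at that path; the file system ID, family, and image names are placeholders:

```python
# Untested sketch: a task definition whose container mounts an EFS file system at
# /root/.local/share/XYZ, so the data survives instances being replaced.
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="xyz-backend",                            # placeholder family name
    volumes=[{
        "name": "xyz-data",
        "efsVolumeConfiguration": {
            "fileSystemId": "fs-0123456789abcdef0",  # placeholder EFS ID
            "rootDirectory": "/",
        },
    }],
    containerDefinitions=[{
        "name": "xyz",
        "image": "my-registry/xyz:latest",           # placeholder image
        "memory": 512,
        "essential": True,
        "mountPoints": [{
            "sourceVolume": "xyz-data",
            "containerPath": "/root/.local/share/XYZ",
        }],
    }],
)
```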
Stopping/starting instances and auto scaling don't really fit together.
Auto Scaling is specifically designed to solve scale-in/scale-out.
One way to address this could be using a customized termination policy (but I have never tried this in an ECS setup).
One note though: if your customized termination policy never terminates instances and you keep adding instances, you might end up with a sizeable EC2 bill.

Do EC2 instances randomly start/stop?

I am trying to wrap my head around EC2 instances, and I am having a bit of an issue. I heard from a friend of mine that Amazon will kill EC2 instances, and then they restart the image (thus losing all state). Unless it uses EBS as a backing store, you get no persistence.
But I have been looking into Xen and it seems like instances should easily migrate instead of being killed/restarted.
So, do Amazon EC2 instances randomly stop/start an image with all state being managed by something external like EBS?
Amazon EC2 instances will not be stopped/started/restarted unless you issue a command to do so.
In some situations (e.g. hardware maintenance), you might receive a request from Amazon asking you to stop and start your instance (which moves it to a different host). Such requests are typically issued with two weeks' notice.
One AWS customer told me that their instance had been running continuously for over three years.
Yes it is quite possible that an EC2 instance dies and is replaced. Depending upon your data, you may need to use EBS, EFS or S3 to prevent data loss in such cases.

Google cloud instance group VM's keep getting reset back to original image

For some reason my instance group VMs keep getting reset back to the original image, i.e. after I've installed and configured software, everything gets wiped out. Additionally, on some occasions their IPs also change, so I have to go and edit my Cloud SQL instance to allow network connections. Has anyone seen this behavior before?
It sounds like you're using Managed Instance Groups, which are designed to work with stateless workloads. MIGs will scale their size up and down, if you have Autoscaler enabled, and scaling down will delete instances. The health checking feature can also destroy and recreate instances.
If you need extra software installed on MIG instances, you need to create a single VM the way you want, and then create a Snapshot of that VM's disk (and then an Image from the Snapshot). The Instance Template creates fresh instances from that Image file every time.
Even if you recreate your image the way you want with all software installed, MIGs will still create and destroy instances on the assumption that there is nothing of value on any of them. And yes, their IPs can change too, because new instances are being created.