Create an image from a volume snapshot using Terraform - google-cloud-platform

We're using actions-runner-controller to deploy self-hosted runners as pods in GKE for GitHub Actions. Everything is specified in Terraform, including the GKE cluster that backs the runners.
The pods are managed by StatefulSets so that we can mount a volume snapshot, which is a cache of things that jobs might need, e.g. common actions, CLI tools, etc.
We're hitting rate limits on snapshot creation (RESOURCE_OPERATION_RATE_EXCEEDED). It seems that we're attempting to create a snapshot for each CI job that runs, whereas we should be creating an image from a snapshot and creating a persistent volume from that.
However, it's not clear to me how this can be done in Terraform.
I'm not sure if it's relevant, but we're using Node Auto Provisioning to provision the nodes on which the pods run.
Any guidance would be appreciated...
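A rough Terraform sketch of the image-from-snapshot approach described above (the resource names, snapshot name, and zone below are placeholders, not taken from the actual setup):

resource "google_compute_image" "runner_cache" {
  name            = "runner-cache-image"    # placeholder
  source_snapshot = "runner-cache-snapshot" # name of the existing snapshot (placeholder)
}

resource "google_compute_disk" "runner_cache" {
  name  = "runner-cache-disk" # placeholder
  zone  = "europe-west1-b"    # placeholder zone
  type  = "pd-ssd"
  image = google_compute_image.runner_cache.self_link
}

Disks (and the persistent volumes backed by them) can then be created from the image rather than triggering a new snapshot operation per CI job; how such a disk gets wired into the StatefulSet's volumes will depend on the existing setup.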

Related

Restoring an AWS Aurora cluster from a snapshot using the same cluster identifier with CDK

Scenario:
An AWS Aurora (Postgres) cluster is currently running, but something is wrong with it.
To solve this issue, I want to restore a cluster snapshot created some days earlier and reset to that point in time.
I use CDK for my infrastructure and restore the cluster using the DatabaseClusterFromSnapshot class.
Problem: the old cluster and new cluster must carry the exact same cluster identifier. Because AWS deletes the old cluster only after the new cluster is created, this results in a naming conflict.
Is there any feasible way to reach my goal of having the new cluster carry the exact same name? In our production environment there is deletion protection, so I cannot create a custom resource to delete the cluster before the new one is created.
1. Manually rename my_cluster to my_cluster_old. Follow this AWS guide (see caveats about read replicas). The commands for clusters may be somewhat different (e.g., modify-db-cluster).
2. Restore the snapshot to my_cluster.
3. Delete my_cluster_old.

How to update AWS ECS cluster instances with Terraform?

I have an existing ECS cluster (EC2) created using Terraform. I want to install some software on those EC2 instances using Terraform. One of our business requirements is that we cannot destroy and re-create the instances; we have to do it on the existing instances.
How should I approach this?
It sounds like your organization is experimenting with running its services in Docker and ECS. I also assume you are using AWS ECR to host your Docker images (although technically it doesn't matter).
When you create an ECS cluster it is initially empty. If you were to re-run your Terraform template again it should show you that there are no updates to apply. In order to take the next step you will need to define an ecs-service and an ecs-task-definition. This can either be done in your existing Terraform template, in a brand new template, or you can do it manually (AWS web console or the AWS CLI). Since you are already using Terraform I would assume you would continue to use it. Personally I would keep everything in one template, but again it is up to you.
An ecs-service is essentially the runtime configuration for your ecs-tasks.
An ecs-task-definition is a set of Docker containers to run. In the simplest case it is a single Docker container. Here is where you will specify the Docker image(s) you will use, how much CPU and RAM the container gets, etc.
In order for your running ECS service(s) to be updated without your EC2 nodes ever going down, you would just need to update your Docker image within the ecs-task-definition portion of your Terraform template (and of course run terraform).
With all this background info, you can now add a Terraform ecs-service and ecs-task-definition to your Terraform template; a rough sketch is included after the example links below.
Since you did not provide your template I cannot say exactly how this should be set up, but an example Terraform template of a complete ECS cluster running nginx can be found below:
Complete Terraform ECS example
More examples can be found at
Official terraform ECS github examples
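For a rough idea of what that ecs-service / ecs-task-definition pair looks like in Terraform (the family, image, and cluster ARN below are placeholders, since your template wasn't provided):

resource "aws_ecs_task_definition" "app" {
  family = "my-app" # placeholder
  container_definitions = jsonencode([
    {
      name      = "my-app"
      image     = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest" # placeholder ECR image
      cpu       = 256
      memory    = 512
      essential = true
      portMappings = [{ containerPort = 80, hostPort = 80 }]
    }
  ])
}

resource "aws_ecs_service" "app" {
  name            = "my-app"
  cluster         = "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster" # your existing cluster (placeholder)
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
}

Updating the image tag in container_definitions and running terraform apply is then what rolls a new version of the application onto the existing EC2 nodes, as described above.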
You could run a provisioner attached to an always-triggered null_resource to run some process against the instances, but I'd strongly recommend you rethink your processes.
Your ECS cluster should be considered completely ephemeral, as should the containers running on it. When you want to update the ECS instances, destroying and replacing the instances (ideally in an autoscaling group) is what you want to do, as it greatly simplifies things. You can read more about the benefits of immutable infrastructure elsewhere.
If you absolutely couldn't do this then you'd most likely be best off using another tool, such as Ansible, entirely. You could choose to launch this via Terraform using a null_resource provisioner as mentioned above, which would look something like the following:
resource "null_resource" "on_demand_provisioning" {
triggers {
always = "${uuid()}"
}
provisioner "local-exec" {
command = "ansible-playbook -i inventory.yml playbook.yml --ssh-common-args='-o StrictHostKeyChecking=no'"
}
}

How to deploy code on multiple instances in an Amazon EC2 Auto Scaling group?

So we are launching an ecommerce store built on Magento. We are looking to deploy it on Amazon EC2 instances, using RDS as the database service, and using Amazon Auto Scaling and Elastic Load Balancing to scale the application when needed.
What I don't understand is this:
I have installed and configured my production Magento environment on an EC2 instance (the database is in RDS), and this much is working fine. But now, when I want to dynamically scale the number of instances:
how will I deploy the code on the dynamically generated instances each time?
Will AWS copy the whole instance, assign it a new IP, and spawn it as a new instance, or will I have to write some code to automate this process?
Plus, will it not be an overhead to pull code from git and deploy every time a new instance is spawned?
A detailed explanation or direction towards some resources on the topic will be greatly appreciated.
You do this in the AutoScalingGroup's LaunchConfiguration. There is a UserData section in the LaunchConfiguration in CloudFormation where you would write a script that is run whenever the ASG scales up and deploys a new instance.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-launchconfig.html#cfn-as-launchconfig-userdata
This is the same as the UserData section of an EC2 instance. You can use lifecycle hooks to tell the ASG not to put the EC2 instance into service until everything you want configured is set up.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-as-lifecyclehook.html
I linked the CloudFormation pages; you may be using some other CI/CD tool for deploying your infrastructure, but hopefully that gets you started.
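The same idea expressed in Terraform (rather than the CloudFormation resources linked above) would look roughly like the sketch below; the AMI, subnets, repository URL, and deploy commands are all placeholders:

resource "aws_launch_template" "app" {
  name_prefix   = "magento-"
  image_id      = "ami-0123456789abcdef0" # placeholder AMI
  instance_type = "t3.medium"

  # Runs at boot on every instance the ASG launches, so each new
  # instance deploys the application code itself.
  user_data = base64encode(<<-EOF
    #!/bin/bash
    set -e
    git clone https://github.com/example/magento-app.git /var/www/html # placeholder repo
    systemctl restart nginx php-fpm
  EOF
  )
}

resource "aws_autoscaling_group" "app" {
  desired_capacity    = 2
  min_size            = 2
  max_size            = 6
  vpc_zone_identifier = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholder subnets

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}

If you need the instance held out of service until that script finishes, the lifecycle-hook idea above has a Terraform counterpart as well (aws_autoscaling_lifecycle_hook).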
To start, do check AWS CloudFormation. You will be creating templates to design how the infrastructure of your application works ~ infrastructure as code. With these templates in place, you can roll out an update to your infrastructure by pushing changes to your templates and/or to your application code.
In my current project, we have a GitHub repository dedicated to these infrastructure templates and a separate repository for our application code. Create a pipeline for creating AWS resources that rolls out an update to AWS every time you push to a specific branch of the repository.
Create an infrastructure pipeline
Have the first stage of the pipeline trigger a build whenever there are code changes to your infrastructure templates. See AWS CodePipeline and also see AWS CodeBuild. These aren't the only AWS resources you'll need, but they are probably the main ones, of course aside from this being done in a CloudFormation template as mentioned earlier.
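If that first stage were expressed in Terraform instead of CloudFormation, a minimal sketch might look like the following; the role ARN, repository URL, and build image are placeholders:

resource "aws_codebuild_project" "infra" {
  name         = "infra-templates-build"
  service_role = "arn:aws:iam::123456789012:role/codebuild-service-role" # placeholder role

  artifacts {
    type = "NO_ARTIFACTS"
  }

  environment {
    compute_type = "BUILD_GENERAL1_SMALL"
    image        = "aws/codebuild/standard:7.0"
    type         = "LINUX_CONTAINER"
  }

  source {
    type     = "GITHUB"
    location = "https://github.com/example/infrastructure-templates.git" # placeholder repo
  }
}

# Triggers the build whenever the connected GitHub repository changes.
resource "aws_codebuild_webhook" "infra" {
  project_name = aws_codebuild_project.infra.name
}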
how will I deploy the code on the dynamically generated instances each time?
Check how containers work; it will greatly supplement your learning on how launching a new version of an application works. To begin, see Docker, but feel free to check any resources at your disposal.
Continuing with my current project: we do have a separate pipeline dedicated to our application, which also gets triggered after an infrastructure pipeline update. Our application pipeline is designed to build a new version of our application via AWS CodeBuild; this will create an image that will become a container ~ from the Docker documentation.
We have two triggers, or two sources, that will trigger an update rollout through our application pipeline: one is when there are changes to the infrastructure pipeline and it built successfully, and the second is when there are code changes in our GitHub repository connected via AWS CodeBuild.
Check AWS Auto Scaling; this area covers the dynamic launching of new instances, shutting down instances when needed, and replacing unhealthy instances when needed. See also AWS CloudWatch; you can design criteria with it to trigger scaling up/down and/or in/out.
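For the scaling criteria themselves, a minimal Terraform sketch (the Auto Scaling group name is a placeholder) would be a CloudWatch alarm driving a simple scaling policy:

resource "aws_autoscaling_policy" "scale_out" {
  name                   = "scale-out-on-high-cpu"
  autoscaling_group_name = "magento-asg" # placeholder ASG name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 300
}

resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "asg-high-cpu"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 70
  period              = 300
  evaluation_periods  = 2
  alarm_actions       = [aws_autoscaling_policy.scale_out.arn]

  dimensions = {
    AutoScalingGroupName = "magento-asg" # placeholder ASG name
  }
}

A mirror-image policy and alarm would handle scaling back in.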
Will AWS copy the whole instance, assign it a new IP, and spawn it as a new instance, or will I have to write some code to automate this process?
See AWS Elastic Load Balancing and also check out more on AWS Auto Scaling. As for the automation process, if you do push through with CloudFormation, instances and/or containers (depending on your design) will be managed gracefully.
Plus, will it not be an overhead to pull code from git and deploy every time a new instance is spawned?
As mentioned earlier, having a pipeline for rolling out new versions of your application via CodeBuild will create an image with the new code changes, and when everything is ready it will be deployed ~ it becomes a container. The old EC2 instance or the old container (depending on how you want your application deployed) will be gracefully shut down after a new version of your application is up and running. This will give you zero downtime.

Autoscaling a running Hadoop cluster setup on AWS EC2

My goal is to understand how I can auto-scale a Hadoop cluster on AWS EC2.
I am exploring AWS offerings from an elastic scaling perspective for Hadoop as a service (EMR) and Hadoop on EC2.
For EMR, I gathered that using CloudWatch, performance metrics can be monitored and the user can be alerted once they reach a set threshold; thereafter the cluster can be scaled up or down depending on its utilization state.
This approach would require some custom implementation to automate the steps (correct me if I am missing anything here).
For Hadoop on EC2, I came across the Auto Scaling option, which can add or remove instances as per configured scaling policies.
But I am not clear how a newly added node would get bootstrapped into the cluster automatically. How would YARN know that it can spawn a new container on this newly added node?
Does auto-scaling work for a master-slave kind of setup as well, or is it limited to web applications?
There is also Qubole, which offers services to manage Hadoop on AWS... should that be used to automatically manage scaling of the cluster?

Kubernetes multi-master cluster on AWS

We have created a single-master, three-worker-node cluster on AWS using Terraform, user-data YAML files, and CoreOS AMIs. The cluster works as expected, but we now need to scale the masters up from one to three for redundancy purposes. My question is: other than using etcd clustering and/or the information provided at http://kubernetes.io/docs/admin/high-availability/, do we have any options to deploy a new cluster or scale up the existing cluster with multi-master nodes? Let me know if more details are required to answer this question.
The kops project can set up a high-availability master for you when creating a cluster.
Pass the following when you create the cluster (replacing the zones with whatever is relevant to you):
--master-zones=us-east-1b,us-east-1c,us-east-1d
Additionally, it can export Terraform files if you want to continue to use Terraform.