rebuilding a terrafrom aws instnace with persisted data - amazon-web-services

I have a terraform script where I launch a cluster of ec2 instances and combine them together (specifically for Influx Db). Here is the relevant part of the script:
resource "aws_instance" "influxdata" {
count = "${var.ec2-count-influx-data}"
ami = "${module.amis.rhel73_id}"
instance_type = "${var.ec2-type-influx-data}"
vpc_security_group_ids = ["${var.sg-ids}"]
subnet_id = "${element(module.infra.subnet,count.index)}"
key_name = "${var.KeyName}"
tags {
Name = "influx-data-node"
System = "${module.infra.System}"
Environment = "${module.infra.Environment}"
OwnerContact = "${module.infra.OwnerContact}"
Owner = "${var.Owner}"
}
ebs_block_device {
device_name = "/dev/sdg"
volume_size = 750
volume_type = "io1"
iops = 3000
encrypted = true
delete_on_termination = false
}
user_data = "${file("terraform/attach_ebs.sh")}"
connection {
//private_key = "${file("/Users/key_CD.pem")}" #dev env
//private_key = "${file("/Users/influx.pem")}" #qa env west
private_key = "${file("/Users/influx_east.pem")}" #qa env east
user = "ec2-user"
}
provisioner "remote-exec" {
inline = ["echo just checking for ssh. ttyl. bye."]
}
}
What I'm now trying to do...is taint one instance and then have terraform rebuild it but...what I want it to do is to unmount ebs, detach ebs, rebuild instance, attach ebs, mount ebs.
When I do a terraform taint module=instance it does taint it and then when I go to apply the change, it creates a whole new instance and new ebs volume instead of reattaching the previous one back on the new instance.
I'm also ok with some data loss as this is part of a cluster so when the node gets rebuilt...it should just sync up with the other nodes....
How can one do this with Terraform?

Create a snapshot of the instance you want to taint. Then change the ec2 resource's ebs volume to have the snapshot ID param of the previous snapshot. https://www.terraform.io/docs/providers/aws/r/instance.html#snapshot_id

Since you are not concerned about data lose, I would want to think that having an ec2 instance that is in same state as before the rebuild but without the data is OK.
if this is the case, I will use user-data to automatically mount the newly attached volume after the rebuild. in that case after the rebuild, the ec2 instance will be in same state (with volume formatted and attached) and ready to sync with other nodes in the cluster for data without any additional effort. The script should look as below:
#!/bin/bash
DEVICE_FS=`blkid -o value -s TYPE /dev/xvdh`
if [ "`echo -n $DEVICE_FS`" == "" ] ; then
mkfs.ext4 /dev/xvdh
fi
mkdir -p /data
echo '/dev/xvdh /data ext4 defaults 0 0' >> /etc/fstab
mount /data

Related

How can I have a Terraform output become a permanent value in a userdata script?

I'm not sure what the best way to do this is - but I want to deploy EFS and an ASG + Launch Template with Terraform. I'd like my userdata script (in my launch template) to run commands to mount to EFS
For example:
sudo mount -t efs -o tls fs-0b28edbb9efe91c25:/ efs
My issue is: I need my userdata script to receive my EFS ID, however, this can't just happen on my initial deploy, I also need this to happen whenever I perform a rolling update. I want to be able to change the AMI ID in my launch template, which will perform a rolling update when I run terraform apply and need my EFS ID to be in my userdata script to run the command to mount EFS.
Is there a way to have a terraform output get permanently added to my Userdata script? What are other alternatives for making this happen? Would it involve Cloudformation or other AWS services?
main.tf
resource "aws_vpc" "mtc_vpc" {
cidr_block = "10.123.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "dev"
}
}
resource "aws_launch_template" "foobar" {
name_prefix = "LTTest"
image_id = "ami-017c001a88dd93847"
instance_type = "t2.micro"
update_default_version = true
key_name = "lttest"
user_data = base64encode(templatefile("${path.module}/userdata.sh", {efs_id = aws_efs_file_system.foo.id}))
iam_instance_profile {
name = aws_iam_instance_profile.test_profile.name
}
vpc_security_group_ids = [aws_security_group.mtc_sg.id]
}
resource "aws_autoscaling_group" "bar" {
desired_capacity = 2
max_size = 2
min_size = 2
vpc_zone_identifier = [
aws_subnet.mtc_public_subnet1.id
]
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 50
}
}
launch_template {
id = "${aws_launch_template.foobar.id}"
version = aws_launch_template.foobar.latest_version
}
}
resource "aws_efs_file_system" "foo" {
creation_token = "jira-efs"
}
resource "aws_efs_mount_target" "alpha" {
file_system_id = aws_efs_file_system.foo.id
subnet_id = aws_subnet.mtc_public_subnet1.id
security_groups = [aws_security_group.mtc_sg.id]
}
Update:
User-data Script:
#!/usr/bin/env bash
sudo yum install -y amazon-efs-utils
sudo yum install -y git
cd /home/ec2-user
mkdir efs
sudo mount -t efs -o tls ${efs_id}:/ efs
There are a few ways to do this. A couple that come to mind are:
Provide the EFS ID to the user data script using the templatefile() function.
Give your EC2 instance permissions (via IAM) to use the EFS API to search for the ID.
The first option is probably the most practical.
First, define your EFS filesystem (and associated aws_efs_mount_target and aws_efs_access_point resources, but I'll omit those here):
resource "aws_efs_file_system" "efs" {}
Now you can define the user data with the templatefile() function:
resource "aws_launch_template" "foo" {
# ... all the attributes ...
user_data = base64encode(templatefile("${path.module}/user-data.sh.tpl", {
efs_id = aws_efs_file_system.efs.id # Use dns_name or id here
}))
}
The contents of user-data.sh.tpl can have all your set up steps, including the filesystem mount:
sudo mount -t efs -o tls ${efs_id}:/ efs
When Terraform renders the user data in the launch template, it will substitute the variable.

Rebuild existing EC2 instance from snapshot?

I have an existing linux EC2 instance with a corrupted root volume. I have a snapshot of the root that is not corrupted. Is it possible with terraform to rebuild the instance based on the snapshot ID of the snapshot ?
Of course it is possible, this simple configuration should do the job:
resource "aws_ami" "aws_ami_name" {
name = "aws_ami_name"
virtualization_type = "hvm"
root_device_name = "/dev/sda1"
ebs_block_device {
snapshot_id = "snapshot_ID”
device_name = "/dev/sda1"
volume_type = "gp2"
}
}
resource "aws_instance" "ec2_name" {
ami = "${aws_ami.aws_ami_name.id}"
instance_type = "t3.large"
}
It's not really a Terraform-type task, since you're not deploying new infrastructure.
Instead, do it manually:
Create a new EBS Volume from the Snapshot
Stop the instance
Detach the existing root volume (make a note of the device identifier such as /dev/sda1)
Attach the new Volume with the same identifier
Start the instance

How to create a temporary instance for a custom AMI creation in AWS with terraform?

Im trying to create a Custom AMI for my AWS Deployment with terraform. Its working quite good also its possible to run a bash script. Problem is it's not possible to create the instance temporary and then to terminate the ec2 instance with terraform and all the depending resources.
First im building an "aws_instance" than I provide a bash script in my /tmp folder and let this be done via ssh connection in the terraform script. Looking like the following:
Fist the aws_instance is created based on a standard AWS Amazon Machine Image (AMI). This is used to later create an image from it.
resource "aws_instance" "custom_ami_image" {
tags = { Name = "custom_ami_image" }
ami = var.ami_id //base custom ami id
subnet_id = var.subnet_id
vpc_security_group_ids = [var.security_group_id]
iam_instance_profile = "ec2-instance-profile"
instance_type = "t2.micro"
ebs_block_device {
//...further configurations
}
Now a bash script is provided. The source is the location of the bash script on the local linux box you are executing terraform from. The destination is on the new AWS instance. In the file I install further stuff like python3, oracle drivers and so on...
provisioner "file" {
source = "../bash_file"
destination = "/tmp/bash_file"
}
Then I'll change the permissions on the bash script and execute it with a ssh-user:
provisioner "remote-exec" {
inline = [
"chmod +x /tmp/bash_file",
"sudo /tmp/bash_file",
]
}
No you can login to the ssh-user with the previous created key.
connection {
type = "ssh"
user = "ssh-user"
password = ""
private_key = file("${var.key_name}.pem")
host = self.private_ip
}
}
With the aws_ami_from_instance the ami can be modelled with the current created EC2 instance. And now is callable for further deployments, its also possible to share it in to further aws accounts.
resource "aws_ami_from_instance" "custom_ami_image {
name = "acustom_ami_image"
source_instance_id = aws_instance.custom_ami_image.id
}
Its working fine, but what bothers me is the resulting ec2 instance! Its running and its not possible to terminate it with terraform? Does anyone have an idea how I can handle this? Sure, the running costs are manageable, but I don't like creating datagarbage....
The best way to create AMI images i think is using Packer, also from Hashicorp like Terraform.
What is Packer?
Provision Infrastructure with Packer Packer is HashiCorp's open-source tool for creating machine images from source
configuration. You can configure Packer images with an operating
system and software for your specific use-case.
Packer creates an temporary instance with temporary keypair, security_group and IAM roles. In the provisioner "shell" are custom inline commands possible. Afterwards you can use this ami with your terraform code.
A sample script could look like this:
packer {
required_plugins {
amazon = {
version = ">= 0.0.2"
source = "github.com/hashicorp/amazon"
}
}
}
source "amazon-ebs" "linux" {
# AMI Settings
ami_name = "ami-oracle-python3"
instance_type = "t2.micro"
source_ami = "ami-xxxxxxxx"
ssh_username = "ec2-user"
associate_public_ip_address = false
ami_virtualization_type = "hvm"
subnet_id = "subnet-xxxxxx"
launch_block_device_mappings {
device_name = "/dev/xvda"
volume_size = 8
volume_type = "gp2"
delete_on_termination = true
encrypted = false
}
# Profile Settings
profile = "xxxxxx"
region = "eu-central-1"
}
build {
sources = [
"source.amazon-ebs.linux"
]
provisioner "shell" {
inline = [
"export no_proxy=localhost"
]
}
}
You can find documentation about packer here.

Terraform: wait till the instance is "reachable"

I have some Terraform code with an aws_instance and a null_resource:
resource "aws_instance" "example" {
ami = data.aws_ami.server.id
instance_type = "t2.medium"
key_name = aws_key_pair.deployer.key_name
tags = {
name = "example"
}
vpc_security_group_ids = [aws_security_group.main.id]
}
resource "null_resource" "example" {
provisioner "local-exec" {
command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
}
}
It kinda works, but sometimes there is a bug (probably when the instance in a pending state). When I rerun Terraform - it works as expected.
Question: How can I run local-exec only when the instance is running and accepting an SSH connection?
The null_resource is currently only going to wait until the aws_instance resource has completed which in turn only waits until the AWS API returns that it is in the Running state. There's a long gap from there to the instance starting the OS and then being able to accept SSH connections before your local-exec provisioner can connect.
One way to handle this is to use the remote-exec provisioner on the instance first as that has the ability to wait for the instance to be ready. Changing your existing code to handle this would look like this:
resource "aws_instance" "example" {
ami = data.aws_ami.server.id
instance_type = "t2.medium"
key_name = aws_key_pair.deployer.key_name
tags = {
name = "example"
}
vpc_security_group_ids = [aws_security_group.main.id]
}
resource "null_resource" "example" {
provisioner "remote-exec" {
connection {
host = aws_instance.example.public_dns
user = "centos"
file = file("files/id_rsa")
}
inline = ["echo 'connected!'"]
}
provisioner "local-exec" {
command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
}
}
This will first attempt to connect to the instance's public DNS address as the centos user with the files/id_rsa private key. Once it is connected it will then run echo 'connected!' as a simple command before moving on to your existing local-exec provisioner that runs Ansible against the instance.
Note that just being able to connect over SSH may not actually be enough for you to then provision the instance. If your Ansible script tries to interact with your package manager then you may find that it is locked from the instance's user data script running. If this is the case you will need to remotely execute a script that waits for cloud-init to be complete first. An example script looks like this:
#!/bin/bash
while [ ! -f /var/lib/cloud/instance/boot-finished ]; do
echo -e "\033[1;36mWaiting for cloud-init..."
sleep 1
done
There is an ansible specific solution for this problem. Add this code to you playbook(there is all so pre_task clause if you use roles)
- name: will wait till reachable
hosts: all
gather_facts: no # important
tasks:
- name: Wait for system to become reachable
wait_for_connection:
- name: Gather facts for the first time
setup:
For cases where instances are not externally exposed (About 90% of the time in most of my projects), and SSM agent is installed on the target instance (newer AWS AMIs come pre-loaded with it), you can leverage SSM to probe the instance. Here's some sample code:
instanceId=$1
echo "Waiting for instance to bootstrap ..."
tries=0
responseCode=1
while [[ $responseCode != 0 && $tries -le 10 ]]
do
echo "Try # $tries"
cmdId=$(aws ssm send-command --document-name AWS-RunShellScript --instance-ids $instanceId --parameters commands="cat /tmp/job-done.txt # or some other validation logic" --query Command.CommandId --output text)
sleep 5
responseCode=$(aws ssm get-command-invocation --command-id $cmdId --instance-id $instanceId --query ResponseCode --output text)
echo "ResponseCode: $responseCode"
if [ $responseCode != 0 ]; then
echo "Sleeping ..."
sleep 60
fi
(( tries++ ))
done
echo "Wait time over. ResponseCode: $responseCode"
Assuming you have AWS CLI installed locally, you can have this null_resource required before you act on the instance. In my case, I was building an AMI.
resource "null_resource" "wait_for_instance" {
depends_on = [
aws_instance.my_instance
]
triggers = {
always_run = "${timestamp()}"
}
provisioner "local-exec" {
command = "${path.module}/scripts/check-instance-state.sh ${aws_instance.my_instance.id}"
}
}

How to remove an AWS Instance volume using Terraform

I deploy a CentOS 7 using an AMI that automatically creates a volume on AWS, so when I remove the platform using the next Terraform commands:
terraform plan -destroy -var-file terraform.tfvars -out terraform.tfplan
terraform apply terraform.tfplan
The volume doesn't remove because it was created automatically with the AMI and terraform doesn't create it. Is it possible to remove with terraform?
My AWS instance is created with the next terraform code:
resource "aws_instance" "DCOS-master1" {
ami = "${var.aws_centos_ami}"
availability_zone = "eu-west-1b"
instance_type = "t2.medium"
key_name = "${var.aws_key_name}"
security_groups = ["${aws_security_group.bastion.id}"]
associate_public_ip_address = true
private_ip = "10.0.0.11"
source_dest_check = false
subnet_id = "${aws_subnet.eu-west-1b-public.id}"
tags {
Name = "master1"
}
}
I add the next code to get information about the EBS volume and to take its ID:
data "aws_ebs_volume" "ebs_volume" {
most_recent = true
filter {
name = "attachment.instance-id"
values = ["${aws_instance.DCOS-master1.id}"]
}
}
output "ebs_volume_id" {
value = "${data.aws_ebs_volume.ebs_volume.id}"
}
Then having the EBS volume ID I import to the terraform plan using:
terraform import aws_ebs_volume.data volume-ID
Finally when I run terraform destroy all the instances and volumes are destroyed.
if the EBS is protected you need to manually remove the termination protection first on the console then you can destroy it