How to format and mount an ephemeral disk with Terraform?

I'm in the process of writing Packer and Terraform code to create immutable infrastructure on AWS. However, it does not seem very straightforward to format a disk as ext4 and mount it.
The steps seem simple:
Create the AMI with Packer on a t2.micro instance; it contains all the software and will be used first on test and afterwards on production.
Launch an r3.4xlarge instance from this AMI; it has a 300 GB ephemeral disk. Format this disk as ext4, mount it, and redirect /var/lib/docker to the new filesystem for performance reasons.
Complete the rest of the application launch.
First of all:
Is it best practice to create the AMI with the same instance type you will run it on, or to have one 'generic' image and launch multiple instance types from it?
Which philosophy is best?
packer(software versions) -> terraform(instance + mount disk) -> deploy?
packer(software versions) -> packer(instancetype specific mounts) -> terraform(instance) -> deploy?
packer(software versions, instance specific mounts) -> terraform -> deploy?
The latter is starting to look better and better, but it requires an AMI per instance type.
What I have tried so far:
According to this answer it is better to use the user_data approach instead of provisioners, so I'm going down that road.
This answer seemed promising but is so old it no longer works. I could update it, but there might be a different, better way.
This answer also seemed promising but complained about ${DEVICE}. I am wondering where that variable comes from, as there are no vars specified in the template_file. If I set my own DEVICE variable to xvdb it runs, but produces no result, because xvdb is visible in lsblk but not in blkid (blkid only lists devices that already carry a filesystem signature, and the ephemeral disk has not been formatted yet).
Here is my code. The format_disk.sh file is the same as the one mentioned above. Any help is greatly appreciated.
# Create a new instance of the latest Ubuntu 16.04 on an
# t2.micro node with an AWS Tag naming it "test1"
provider "aws" {
  region = "us-east-1"
}

data "template_file" "format-disks" {
  template = "${file("format_disk.sh")}"

  vars {
    DEVICE = "xvdb"
  }
}

resource "aws_instance" "test1" {
  ami                         = "ami-98181234"
  instance_type               = "r3.4xlarge"
  key_name                    = "keypair-1"       # This needs to be changed so multiple users can use this
  subnet_id                   = "subnet-a0aeb123" # maps to the vpc for the us production
  associate_public_ip_address = "true"
  vpc_security_group_ids      = ["sg-f3e91234"]   # backendservers
  user_data                   = "${data.template_file.format-disks.rendered}"

  tags {
    Name = "test1"
  }

  ephemeral_block_device {
    device_name  = "xvdb"
    virtual_name = "ephemeral0"
  }
}

Let me give you my thoughts on this topic.
I think cloud-init is the key on AWS, because it lets you build the machine you want dynamically.
First, write a generic bootstrap script that runs when your machine starts, then pass that script in as user data. I also suggest you play with EC2 Auto Scaling at the same time: if you change the cloud-init script, you can simply terminate the instance and another one will be created automatically.
My directory structure:
.
|____main.tf
|____templates
| |____cloud-init.tpl
main.tf
provider "aws" {
  region = "us-east-1"
}

data "template_file" "cloud_init" {
  template = file("${path.module}/templates/cloud-init.tpl")
}

data "aws_ami" "linux_ami" {
  most_recent = "true"
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-2.0.????????.?-x86_64-gp2"]
  }
}

resource "aws_instance" "test1" {
  ami                         = data.aws_ami.linux_ami.image_id
  instance_type               = "r3.4xlarge"
  key_name                    = "keypair-1"
  subnet_id                   = "subnet-xxxxxx"
  associate_public_ip_address = true
  vpc_security_group_ids      = ["sg-xxxxxxx"]
  user_data                   = data.template_file.cloud_init.rendered

  root_block_device {
    delete_on_termination = true
    encrypted             = true
    volume_size           = 10
    volume_type           = "gp2"
  }

  ebs_block_device {
    device_name           = "ebs-block-device-name"
    delete_on_termination = true
    encrypted             = true
    volume_size           = 10
    volume_type           = "gp2"
  }

  # Note: the network_interface block conflicts with the subnet_id /
  # associate_public_ip_address style of networking above; configure
  # the networking one way or the other.
  network_interface {
    device_index          = 0
    network_interface_id  = var.network_interface_id
    delete_on_termination = true
  }

  tags = {
    Name       = "test1"
    costCenter = "xxxxx"
    owner      = "xxxxx"
  }
}
templates/cloud-init.tpl
#!/bin/bash -x
yum update -y
yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
systemctl enable amazon-ssm-agent
systemctl start amazon-ssm-agent
pip install aws-ssm-tunnel-agent
echo "[INFO] SSM agent has been installed!"
# More scripts here.
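To cover the 300 GB ephemeral disk from the original question, a minimal sketch of the formatting and mount steps could be appended to a script like this one. The device name /dev/xvdb is an assumption taken from the question's ephemeral_block_device mapping; verify it with lsblk on the running instance, and stop Docker before switching the mount if it is already running:
DEVICE=/dev/xvdb
# blkid prints nothing for a device without a filesystem signature,
# so only format the disk when no filesystem is present yet.
if [ -z "$(blkid -o value -s TYPE "$DEVICE")" ]; then
  mkfs.ext4 "$DEVICE"
fi
mkdir -p /var/lib/docker
echo "$DEVICE /var/lib/docker ext4 defaults,nofail 0 2" >> /etc/fstab
mount /var/lib/docker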
Would you like to have a temporary disk attached? Have you tried adding a root_block_device with delete_on_termination set to true? That way, when the aws_instance resource is destroyed, the disk is deleted as well. It's a good way to save costs on AWS, but be careful: only use it if the data stored on the disk isn't important or has been backed up.
If you need to attach an external EBS disk to this instance, you can use the AWS API; just make sure the machine is in the same Availability Zone as the disk before you use it.
Let me know if you need a bash script, but this is straightforward to do.
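For example, with the AWS CLI (the IDs below are placeholders; both resources must be in the same AZ):
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --device /dev/sdf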

Related

using Terraform to pass a file to newly created ec2 instance without sharing the private key in "connection" section

My setup is:
Terraform --> AWS ec2
using Terraform to create the EC2 instance with SSH access.
The resource:
resource "aws_instance" "inst1" {
  instance_type = "t2.micro"
  ami           = data.aws_ami.ubuntu.id
  key_name      = "aws_key"
  subnet_id     = ...
  user_data     = file("./deploy/templates/user-data.sh")

  vpc_security_group_ids = [
    ... ,
  ]

  provisioner "file" {
    source      = "./deploy/templates/ec2-caller.sh"
    destination = "/home/ubuntu/ec2-caller.sh"
  }

  provisioner "remote-exec" {
    inline = [
      "chmod +x /home/ubuntu/ec2-caller.sh",
    ]
  }

  connection {
    type        = "ssh"
    host        = self.public_ip
    user        = "ubuntu"
    private_key = file("./keys/aws_key_enc")
    timeout     = "4m"
  }
}
The above bit works, and I could see the provisioner copying and executing ec2-caller.sh.
I don't want to pass my private key in clear text to the Terraform provisioner. Is there any way we can copy files to the newly created EC2 instance without using a provisioner, or without passing the private key on to the provisioner?
Cheers.
The Terraform documentation section Provisioners are a Last Resort raises the need to provision and pass in credentials as one of the justifications for provisioners being a "last resort", and then goes on to suggest some other strategies for passing data into virtual machines and other compute resources.
You seem to already be using user_data to specify some other script to run, so to follow the advice in that document would require combining these all together into a single cloud-init configuration. (I'm assuming that your AMI has cloud-init installed because that's what's typically responsible for interpreting user_data as a shell script to execute.)
Cloud-init supports several different user_data formats, with the primary one being cloud-init's own YAML configuration file format, "Cloud Config". You can also use a multipart MIME message to pack together multiple different user_data payloads into a single user_data body, as long as the combined size of the payload fits within EC2's upper limit for user_data size, which is 16kiB.
From your configuration it seems like you have two steps you'd need cloud-init to deal with in order to fully solve this problem with cloud-init:
Run the ./deploy/templates/user-data.sh script.
Place the /home/ubuntu/ec2-caller.sh on disk with suitable permissions.
Assuming that these two steps are independent of one another, you can send cloud-init a multipart MIME message which includes both the user-data script you were originally using alone and a Cloud Config YAML configuration to place the ec2-caller.sh file on disk. The Terraform provider hashicorp/cloudinit has a data source cloudinit_config which knows how to construct multipart MIME messages for cloud-init, which you could use like this:
data "cloudinit_config" "example" {
  part {
    content_type = "text/x-shellscript"
    content      = file("${path.root}/deploy/templates/user-data.sh")
  }

  part {
    content_type = "text/cloud-config"
    content = yamlencode({
      write_files = [
        {
          encoding    = "b64"
          content     = filebase64("${path.root}/deploy/templates/ec2-caller.sh")
          path        = "/home/ubuntu/ec2-caller.sh"
          owner       = "ubuntu:ubuntu"
          permissions = "0755"
        },
      ]
    })
  }
}

resource "aws_instance" "inst1" {
  instance_type = "t2.micro"
  ami           = data.aws_ami.ubuntu.id
  key_name      = "aws_key"
  subnet_id     = ...
  user_data     = data.cloudinit_config.example.rendered

  vpc_security_group_ids = [
    ... ,
  ]
}
The second part block above includes YAML based on the cloud-init example Writing out arbitrary files, which you could refer to in order to learn what other settings are possible. Terraform's yamlencode function doesn't have a way to generate the special !!binary tag used in some of the files in that example, but setting encoding: b64 allows passing the base64-encoded text as just a normal string.

Dynamically add resources in Terraform

I set up a Jenkins pipeline that launches Terraform to create a new EC2 instance in our VPC and register it in our private hosted zone on Route 53 (which is created at the same time) on every run.
I also managed to save the state in S3 so it doesn't fail when the hosted zone is re-created.
The main issue I have is that on every run Terraform keeps replacing the previous instance with the new one instead of adding it to the pool of instances.
How can I avoid this?
Here's a snippet of my code:
terraform {
  backend "s3" {
    bucket = "<redacted>"
    key    = "<redacted>/terraform.tfstate"
    region = "eu-west-1"
  }
}

provider "aws" {
  region = "${var.region}"
}

data "aws_ami" "image" {
  # limit search criteria for performance
  most_recent = "${var.ami_filter_most_recent}"
  name_regex  = "${var.ami_filter_name_regex}"
  owners      = ["${var.ami_filter_name_owners}"]

  # filter on tag purpose
  filter {
    name   = "tag:purpose"
    values = ["${var.ami_filter_purpose}"]
  }

  # filter on tag os
  filter {
    name   = "tag:os"
    values = ["${var.ami_filter_os}"]
  }
}

resource "aws_instance" "server" {
  # use extracted ami from image data source
  ami                    = data.aws_ami.image.id
  availability_zone      = data.aws_subnet.most_available.availability_zone
  subnet_id              = data.aws_subnet.most_available.id
  instance_type          = "${var.instance_type}"
  vpc_security_group_ids = ["${var.security_group}"]
  user_data              = "${var.user_data}"
  iam_instance_profile   = "${var.iam_instance_profile}"

  root_block_device {
    volume_size = "${var.root_disk_size}"
  }

  ebs_block_device {
    device_name = "${var.extra_disk_device_name}"
    volume_size = "${var.extra_disk_size}"
  }

  tags = {
    Name = "${local.available_name}"
  }
}

resource "aws_route53_zone" "private" {
  name = var.hosted_zone_name

  vpc {
    vpc_id = var.vpc_id
  }
}

resource "aws_route53_record" "record" {
  zone_id = aws_route53_zone.private.zone_id
  name    = "${local.available_name}.${var.hosted_zone_name}"
  type    = "A"
  ttl     = "300"
  records = [aws_instance.server.private_ip]

  depends_on = [
    aws_route53_zone.private
  ]
}
The outcome is that my previously created instance is destroyed and a new one is created; what I want is to keep adding instances with this code.
Thank you.
Your code declares only one instance, aws_instance.server, so any change to its properties will modify (or replace) that single instance. Because your backend is in S3, it acts as one global state shared by every pipeline run. The same goes for aws_route53_record.record and everything else in your configuration.
If you want different pipelines to reuse the same exact code, you should either use different workspaces or create a separate Terraform state per pipeline. The other alternative is to redefine your configuration to take a map of instances as an input variable and use for_each to create a distinct instance per entry.
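A minimal sketch of the for_each approach, reusing the data sources and hosted zone from your snippet (the variable name and its keys are illustrative, not from your code):
variable "servers" {
  type = map(object({ instance_type = string }))
  default = {
    "worker-1" = { instance_type = "t3.micro" }
    "worker-2" = { instance_type = "t3.micro" }
  }
}

resource "aws_instance" "server" {
  for_each = var.servers

  ami           = data.aws_ami.image.id
  subnet_id     = data.aws_subnet.most_available.id
  instance_type = each.value.instance_type

  tags = {
    Name = each.key
  }
}

resource "aws_route53_record" "record" {
  for_each = var.servers

  zone_id = aws_route53_zone.private.zone_id
  name    = "${each.key}.${var.hosted_zone_name}"
  type    = "A"
  ttl     = "300"
  records = [aws_instance.server[each.key].private_ip]
}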
If those instances should all be identical, you should instead manage their count with an aws_autoscaling_group and its desired capacity.
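A hedged sketch of that alternative (the launch template and capacity values are illustrative, not taken from your code):
resource "aws_launch_template" "server" {
  name_prefix   = "server-"
  image_id      = data.aws_ami.image.id
  instance_type = var.instance_type
  user_data     = base64encode(var.user_data) # launch templates expect base64
}

resource "aws_autoscaling_group" "server" {
  desired_capacity    = 3
  min_size            = 1
  max_size            = 5
  vpc_zone_identifier = [data.aws_subnet.most_available.id]

  launch_template {
    id      = aws_launch_template.server.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = local.available_name
    propagate_at_launch = true
  }
}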

Terraform detach and re-attach ebs volume

I am using Terraform to spin up EC2 instances. After an instance is created, I write some data to /myapp. How do I detach /myapp and re-attach it every time the instance gets destroyed and recreated? I did some research and found that the following code may be an option:
resource "aws_instance" "my_ec2" {
  ami           = "${var.ami_id}"
  instance_type = "${var.instance_type}"
  count         = "${var.node_count}"
  subnet_id     = "${var.subnet_id}"
  key_name      = "${var.key_pair}"

  root_block_device = {
    volume_type           = "gp2"
    volume_size           = 20
    delete_on_termination = false
  }

  vpc_security_group_ids = ["${var.security_group_ids}"]
}

resource "aws_ebs_volume" "my_vol" {
  size  = 120
  count = "${var.node_count}"
  type  = "gp2"
}

resource "aws_volume_attachment" "my_vol_att" {
  device_name = "/dev/xvdf"
  volume_id   = "${element(aws_ebs_volume.my_vol.*.id, count.index)}"
  instance_id = "${element(aws_instance.my_ec2.*.id, count.index)}"
  count       = "${var.node_count}"
}
My questions are:
If my_ec2 gets destroyed:
the EC2 instance is gone
is my_vol gone too?
does my_vol_att stay? If so, where can I see it?
When I run the above Terraform code again to re-create the instance after it has been destroyed:
will it create a new my_vol id?
what will happen to my_vol_att?
What exactly is my_vol_att? A pointer or a copy of my_vol that is attached to the device and never gets destroyed unless it is manually deleted?
Sorry, my questions might sound silly as I am very new to both Terraform and AWS.
I believe this happens when the user_data changes. For example, say you want to install Jenkins on an EBS volume that is persistent:
resource "aws_ebs_volume" "jenkins-data" {
  availability_zone = local.jenkins-az-location
  size              = 20
  type              = "gp2"

  tags = {
    Name = "jenkins-data"
  }
}

resource "aws_volume_attachment" "jenkins-data-attachment" {
  device_name  = var.INSTANCE_DEVICE_NAME
  volume_id    = aws_ebs_volume.jenkins-data.id
  instance_id  = aws_instance.jenkins-instance.id
  skip_destroy = true
}
Then your user_data:
#!/bin/bash
# volume setup
vgchange -ay

DEVICE_FS=`blkid -o value -s TYPE ${DEVICE}`
if [ "`echo -n $DEVICE_FS`" == "" ] ; then
  # wait for the device to be attached
  DEVICENAME=`echo "${DEVICE}" | awk -F '/' '{print $3}'`
  DEVICEEXISTS=''
  while [[ -z $DEVICEEXISTS ]]; do
    echo "checking $DEVICENAME"
    DEVICEEXISTS=`lsblk | grep "$DEVICENAME" | wc -l`
    if [[ $DEVICEEXISTS != "1" ]]; then
      sleep 15
    fi
  done
  pvcreate ${DEVICE}
  vgcreate data ${DEVICE}
  lvcreate --name volume1 -l 100%FREE data
  mkfs.ext4 /dev/data/volume1
fi

mkdir -p /var/lib/jenkins
echo '/dev/data/volume1 /var/lib/jenkins ext4 defaults 0 0' >> /etc/fstab
mount /var/lib/jenkins
Now, down the track, you want to change the user data for whatever reason, you run Terraform again, and it destroys the instance and builds another one, taking your drive along with it.
To be honest I never looked into this; I just take a snapshot of the volume every time I need to do some invasive surgery, and if the volume gets destroyed I re-create it from the snapshot and push the changes back into the Terraform state file. It can be a pain, but in my case Jenkins is not completely blown away very often.
Once I get a chance to look into this more I'll probably update this, but at the moment I'm fairly sure there is an option in TF to not destroy assets via the API or something.
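The option being alluded to is most likely the lifecycle meta-argument; a minimal sketch on the volume from the example above (an ignore_changes = [user_data] entry in the instance's own lifecycle block is the companion knob that stops a user_data edit from forcing the replacement in the first place):
resource "aws_ebs_volume" "jenkins-data" {
  availability_zone = local.jenkins-az-location
  size              = 20
  type              = "gp2"

  tags = {
    Name = "jenkins-data"
  }

  lifecycle {
    # Terraform will refuse to plan a destroy of this volume.
    prevent_destroy = true
  }
}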
If I capture the sentiment of your questions correctly, you would like to be able to destroy the EC2 instance independently of the EBS volume.
This is possible, but requires some dynamic resolution via data resources.
What I would do here is create a Terraform module that encapsulates the code you have written above, but replaces the aws_ebs_volume resource with a data source like so:
my_module/main.tf
resource "aws_instance" "my_ec2" {
  ami           = "${var.ami_id}"
  instance_type = "${var.instance_type}"
  count         = "${var.node_count}"
  subnet_id     = "${var.subnet_id}"
  key_name      = "${var.key_pair}"

  root_block_device = {
    volume_type           = "gp2"
    volume_size           = 20
    delete_on_termination = false
  }

  vpc_security_group_ids = ["${var.security_group_ids}"]
}

data "aws_ebs_volume" "my_vol" {
  most_recent = true

  filter {
    name   = "volume-type"
    values = ["gp2"]
  }

  filter {
    name   = "tag:Name"
    values = [var.volume_name]
  }
}

resource "aws_volume_attachment" "my_vol_att" {
  device_name = "/dev/xvdf"
  volume_id   = data.aws_ebs_volume.my_vol.id # reference the data source, not the removed resource
  instance_id = "${element(aws_instance.my_ec2.*.id, count.index)}"
  count       = "${var.node_count}"
}

variable "volume_name" {
  type    = string
  default = "my_volume"
}
main.tf
module "my_module" {
  source      = "./my_module"
  volume_name = "MyManuallyCreatedVolume"
}
Note that the data source I use here tries to match a volume via its Name tag, which you didn't set in your code but which is set in this example.
Architecting your code like this will allow you to create the volume manually, or via some higher-level Terraform code that calls this module. As long as your state files are separate, you will be able to destroy your instance(s) and volumes independently of one another.
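For completeness, the volume itself could live in its own configuration (and therefore its own state), tagged so the data source above can find it; the Availability Zone shown is an assumption and must match the instance's:
resource "aws_ebs_volume" "my_vol" {
  availability_zone = "us-east-1a" # assumption: must match the instance's AZ
  size              = 120
  type              = "gp2"

  tags = {
    Name = "MyManuallyCreatedVolume"
  }
}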

Correct design pattern for single server in AWS

I have customized cluster software that runs in a single AZ (subnet). One of the servers is the "controller". There can only be one of these running at a time. I need to be able to have it in the local DNS. It needs to automatically rebuild itself if it fails for any reason. I do not believe I need an elb/alb/nlb setup for this. However, when I set the system up with an autoscaling group, I am not able to get to the private IP address to update the route53 record. Is there a correct design pattern for this in AWS?
Here is the stub code, which does work in rebuilding the server from scratch if it is stopped or becomes unhealthy.
resource "aws_launch_configuration" "example" {
  image_id        = "${lookup(var.AmiLinux, var.region)}"
  instance_type   = "t2.micro"
  security_groups = [aws_security_group.ingress-all-test.id]
  key_name        = "akeyname"

  lifecycle {
    create_before_destroy = true
  }
}

data "aws_availability_zones" "all" {}

resource "aws_autoscaling_group" "example" {
  launch_configuration      = aws_launch_configuration.example.id
  min_size                  = 1
  max_size                  = 1
  health_check_grace_period = 60
  vpc_zone_identifier       = ["${aws_subnet.subnetTest.id}"]

  tag {
    key                 = "Name"
    value               = "tf-asg-example"
    propagate_at_launch = true
  }
}
I do like the above as it does maintain a single server in an AZ. However, ASG makes it rather hard to get to the IP. I am not looking for the user-data to "hack" the change on boot. Since it can only run in a single subnet (AZ), I cannot use an elb. Thanks in advance for any design pattern for this type of setup.

Creating RDS Instances from Snapshot Using Terraform

Working on a Terraform project in which I am creating an RDS cluster by grabbing and using the most recent production db snapshot:
# Get latest snapshot from production DB
data "aws_db_snapshot" "db_snapshot" {
  most_recent            = true
  db_instance_identifier = "${var.db_instance_to_clone}"
}

# Create RDS instance from snapshot
resource "aws_db_instance" "primary" {
  identifier                = "${var.app_name}-primary"
  snapshot_identifier       = "${data.aws_db_snapshot.db_snapshot.id}"
  instance_class            = "${var.instance_class}"
  vpc_security_group_ids    = ["${var.security_group_id}"]
  skip_final_snapshot       = true
  final_snapshot_identifier = "snapshot"
  parameter_group_name      = "${var.parameter_group_name}"
  publicly_accessible       = true

  timeouts {
    create = "2h"
  }
}
The issue with this approach is that subsequent runs of the Terraform code (once another snapshot has been taken) want to re-create the primary RDS instance (and subsequently the read replicas) with the latest snapshot of the DB. I was thinking of something along the lines of a boolean count parameter that flags the first run, but setting count = 0 on the snapshot data source causes issues with the snapshot_identifier parameter of the DB resource. Likewise, setting count = 0 on the DB resource would destroy the database.
Use case for this is to be able to make changes to other aspects of the production infrastructure that this terraform plan manages without having to re-create the entire RDS cluster, which is a very time consuming resource to destroy/create.
Try placing an ignore_changes lifecycle block within your aws_db_instance definition:
lifecycle {
  ignore_changes = [
    snapshot_identifier,
  ]
}
This will cause Terraform to only look for changes to the database's snapshot_identifier upon initial creation.
If the database already exists, Terraform will ignore any changes to the existing database's snapshot_identifier field -- even if a new snapshot has been created since then.
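Put together with the resource from the question, the placement is roughly as follows (all other arguments kept as in the original):
resource "aws_db_instance" "primary" {
  identifier                = "${var.app_name}-primary"
  snapshot_identifier       = "${data.aws_db_snapshot.db_snapshot.id}"
  instance_class            = "${var.instance_class}"
  vpc_security_group_ids    = ["${var.security_group_id}"]
  skip_final_snapshot       = true
  final_snapshot_identifier = "snapshot"
  parameter_group_name      = "${var.parameter_group_name}"
  publicly_accessible       = true

  lifecycle {
    ignore_changes = [
      snapshot_identifier,
    ]
  }

  timeouts {
    create = "2h"
  }
}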