Why does terraform + apt-get fail intermittently?

I'm using Terraform to create multiple EC2 nodes on AWS:
resource "aws_instance" "myapp" {
  count                  = "${var.count}"
  ami                    = "${data.aws_ami.ubuntu.id}"
  instance_type          = "m4.large"
  vpc_security_group_ids = ["${aws_security_group.myapp-security-group.id}"]
  subnet_id              = "${var.subnet_id}"
  key_name               = "${var.key_name}"
  iam_instance_profile   = "${aws_iam_instance_profile.myapp_instance_profile.id}"

  connection {
    user        = "ubuntu"
    private_key = "${file("${var.key_file_path}")}"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get upgrade -y",
      "sudo apt-get install -f -y openjdk-7-jre-headless git awscli"
    ]
  }
}
When I run this with, say, count=4, some nodes intermittently fail with apt-get errors like:
aws_instance.myapp.1 (remote-exec): E: Unable to locate package awscli
while the other 3 nodes found awscli just fine. All nodes are created from the same AMI and run the exact same provisioning commands, so why would only some of them fail? The variation could potentially come from:
Multiple copies of AMIs on amazon, which aren't identical
Multiple apt-get mirrors which aren't identical
Which is more likely? Any other possibilities I'm missing?
Is there an apt-get "force" type flag I can use that will make the provisioning more repeatable?
The whole point of automating provisioning through scripts is to avoid this kind of variation between nodes :/

The remote-exec provisioner in Terraform just generates a shell script that is uploaded to the new instance, where the commands you specify are run. Most likely you're actually running into a timing conflict with cloud-init, which is configured to run on standard Ubuntu AMIs: the provisioner attempts to run while cloud-init is still setting up the instance (including the apt package lists), so apt-get can't yet locate some packages.
You can make your script wait until cloud-init has finished provisioning. cloud-init creates the file /var/lib/cloud/instance/boot-finished when it is done, so you can put this inline in your provisioner:
until [[ -f /var/lib/cloud/instance/boot-finished ]]; do
  sleep 1
done
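For instance, the wait can go at the top of the existing remote-exec inline list, ahead of the apt-get commands from the question (using the POSIX [ test rather than [[ in case the remote shell isn't bash):

```hcl
provisioner "remote-exec" {
  inline = [
    "until [ -f /var/lib/cloud/instance/boot-finished ]; do sleep 1; done",
    "sudo apt-get update",
    "sudo apt-get upgrade -y",
    "sudo apt-get install -f -y openjdk-7-jre-headless git awscli"
  ]
}
```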
Alternatively, you can take advantage of cloud-init itself and have it install the packages for you. You can specify user_data for your instance like so in Terraform (modified from your snippet above):
resource "aws_instance" "myapp" {
  count                  = "${var.count}"
  ami                    = "${data.aws_ami.ubuntu.id}"
  instance_type          = "m4.large"
  vpc_security_group_ids = ["${aws_security_group.myapp-security-group.id}"]
  subnet_id              = "${var.subnet_id}"
  key_name               = "${var.key_name}"
  iam_instance_profile   = "${aws_iam_instance_profile.myapp_instance_profile.id}"
  user_data              = "${data.template_cloudinit_config.config.rendered}"
}
# Standard cloud-init stuff
data "template_cloudinit_config" "config" {
  gzip          = false
  base64_encode = false

  part {
    content_type = "text/cloud-config"
    content      = <<EOF
packages:
  - awscli
  - git
  - openjdk-7-jre-headless
EOF
  }
}

Related

How can I have a Terraform output become a permanent value in a userdata script?

I'm not sure what the best way to do this is, but I want to deploy EFS and an ASG + Launch Template with Terraform. I'd like my userdata script (in my launch template) to run the commands to mount EFS.
For example:
sudo mount -t efs -o tls fs-0b28edbb9efe91c25:/ efs
My issue is that my userdata script needs to receive my EFS ID, and not just on the initial deploy: it also has to happen whenever I perform a rolling update. I want to be able to change the AMI ID in my launch template, which triggers a rolling update when I run terraform apply, and my EFS ID needs to be in my userdata script so the EFS mount command still works.
Is there a way to have a terraform output get permanently added to my Userdata script? What are other alternatives for making this happen? Would it involve Cloudformation or other AWS services?
main.tf
resource "aws_vpc" "mtc_vpc" {
  cidr_block           = "10.123.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "dev"
  }
}

resource "aws_launch_template" "foobar" {
  name_prefix            = "LTTest"
  image_id               = "ami-017c001a88dd93847"
  instance_type          = "t2.micro"
  update_default_version = true
  key_name               = "lttest"
  user_data              = base64encode(templatefile("${path.module}/userdata.sh", { efs_id = aws_efs_file_system.foo.id }))

  iam_instance_profile {
    name = aws_iam_instance_profile.test_profile.name
  }

  vpc_security_group_ids = [aws_security_group.mtc_sg.id]
}

resource "aws_autoscaling_group" "bar" {
  desired_capacity = 2
  max_size         = 2
  min_size         = 2
  vpc_zone_identifier = [
    aws_subnet.mtc_public_subnet1.id
  ]

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }

  launch_template {
    id      = aws_launch_template.foobar.id
    version = aws_launch_template.foobar.latest_version
  }
}

resource "aws_efs_file_system" "foo" {
  creation_token = "jira-efs"
}

resource "aws_efs_mount_target" "alpha" {
  file_system_id  = aws_efs_file_system.foo.id
  subnet_id       = aws_subnet.mtc_public_subnet1.id
  security_groups = [aws_security_group.mtc_sg.id]
}
Update:
User-data Script:
#!/usr/bin/env bash
sudo yum install -y amazon-efs-utils
sudo yum install -y git
cd /home/ec2-user
mkdir efs
sudo mount -t efs -o tls ${efs_id}:/ efs
There are a few ways to do this. A couple that come to mind are:
Provide the EFS ID to the user data script using the templatefile() function.
Give your EC2 instance permissions (via IAM) to use the EFS API to search for the ID.
The first option is probably the most practical.
First, define your EFS filesystem (and associated aws_efs_mount_target and aws_efs_access_point resources, but I'll omit those here):
resource "aws_efs_file_system" "efs" {}
Now you can define the user data with the templatefile() function:
resource "aws_launch_template" "foo" {
  # ... all the attributes ...

  user_data = base64encode(templatefile("${path.module}/user-data.sh.tpl", {
    efs_id = aws_efs_file_system.efs.id # Use dns_name or id here
  }))
}
The contents of user-data.sh.tpl can have all your set up steps, including the filesystem mount:
sudo mount -t efs -o tls ${efs_id}:/ efs
When Terraform renders the user data in the launch template, it will substitute the variable.
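Adapted from the script in the question, user-data.sh.tpl might look like this (${efs_id} here is the templatefile() placeholder, not a shell variable; paths and the mkdir are carried over from the question's script):

```bash
#!/usr/bin/env bash
yum install -y amazon-efs-utils git
mkdir -p /home/ec2-user/efs
mount -t efs -o tls ${efs_id}:/ /home/ec2-user/efs
```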

Pass ProxyCommand to Terraform Provisioner 'local-exec' Successfully

I am setting up several servers in AWS, using Terraform to deploy them and Ansible to configure them (the configuration is quite complex). I would like to accomplish all of this from Terraform, but I can't seem to get the ProxyCommand to execute correctly (I believe due to the mixed quotes). I need the ProxyCommand because the commands must be proxied through a bastion host. First I provision the bastion:
resource "aws_instance" "bastion" {
  ami                         = var.ubuntu2004
  instance_type               = "t3.small"
  associate_public_ip_address = true
  subnet_id                   = aws_subnet.some_subnet.id
  vpc_security_group_ids      = [aws_security_group.temp.id]
  key_name                    = "key"

  tags = {
    Name = "bastion"
  }
}
and then I deploy another server, which I would like to configure with Ansible via Terraform's local-exec provisioner:
resource "aws_instance" "server1" {
  ami                    = var.ubuntu2004
  instance_type          = "t3.small"
  subnet_id              = aws_subnet.some_other_subnet.id
  vpc_security_group_ids = [aws_security_group.other_temp.id]
  key_name               = "key"

  tags = {
    Name = "server1"
  }

  provisioner "local-exec" {
    command = "sleep 120; ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -u ubuntu --private-key ~/.ssh/id_rsa --ssh-common-args='-o ProxyCommand='ssh -W %h:%p ubuntu@${aws_instance.bastion.public_ip}'' -i ${self.private_ip} main.yml"
  }
}
I have confirmed that all of this works if I have Terraform provision the infrastructure and then manually run Ansible with the ProxyCommand, but it fails when I use local-exec, seemingly because the multiple nested single quotes break the command. I'm not sure the bastion variable interpolation is done correctly either. It's probably a simple fix, but does anyone know how to fix this, or maybe an easier way to accomplish it? Thanks
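One way around the nested single quotes is to escape inner double quotes inside the HCL string instead. A sketch, untested against a live bastion (note the trailing comma after the IP, which ansible-playbook needs in order to treat -i as an inline inventory rather than a file):

```hcl
provisioner "local-exec" {
  command = "sleep 120; ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -u ubuntu --private-key ~/.ssh/id_rsa --ssh-common-args \"-o ProxyCommand='ssh -W %h:%p ubuntu@${aws_instance.bastion.public_ip}'\" -i '${self.private_ip},' main.yml"
}
```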

Terraform: accessing EC2 instance with Apache installed in the browser using public IP address just keeps loading, loading and loading

I am deploying an EC2 instance to AWS using Terraform. I am using the user data section of EC2 to install Apache.
This is my template.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 2.70"
    }
  }
}

provider "aws" {
  region     = "eu-west-1"
  access_key = "xxxxxxxxx"
  secret_key = "xxxxxxxxx"
}

resource "aws_instance" "myinstance" {
  ami           = "ami-047bb4163c506cd98"
  instance_type = "t2.micro"

  tags = {
    Name = "My first terraform instance"
  }

  vpc_security_group_ids = ["sg-0721b555cc402a3ad"]
  user_data              = "${file("install_apache.sh")}"
  key_name               = "MyKey"
}
As you can see, I am running a shell script to install apache in the user_data section. This is my install_apache.sh file.
#!/bin/bash -xe
cd /tmp
yum update -y
yum install -y httpd24
echo "Hello from the EC2 instance." > /var/www/html/index.html
sudo -u root service httpd start
As you can see, I have assigned an existing security group to the instance. My security group whitelists HTTP requests on port 80 in its inbound rules.
Then I deploy the template by running terraform apply.
Then I open the public IP address of the instance in the browser. It just keeps loading and loading, showing a blank screen. What is wrong with my code and how can I fix it?
Your script is for Amazon Linux 1. For Amazon Linux 2 it should be:
#!/bin/bash -xe
cd /tmp
yum update -y
yum install -y httpd
echo "Hello from the EC2 instance $(hostname -f)." > /var/www/html/index.html
systemctl start httpd
I added $(hostname -f) as an enhancement.
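If the instance may reboot, it could also be worth enabling the service at boot. This is an addition beyond the original script, assuming systemd on Amazon Linux 2:

```bash
# start httpd now and also enable it at boot (systemd on Amazon Linux 2)
systemctl enable --now httpd
```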

Terraform Cloud-Init AWS

I have a Terraform script that deploys Ubuntu.
resource "aws_instance" "runner" {
  instance_type = "${var.instance_type}"
  ami           = "${var.ami}"
  user_data     = "${data.template_file.deploy.rendered}"
}

data "template_file" "deploy" {
  template = "${file("cloudinit.tpl")}"
}
My cloudinit.tpl:
#cloud-config
runcmd:
- apt-get update
- sleep 30
- apt-get install -y awscli
I can't find any issue in cloud-init.log, and there is no user-data.log file in /var/log, so I can't understand why the user-data is not working.
Cloud-init has dedicated directives for system updates and package installation, which take care of running them at the right point in the boot process:
#cloud-config
package_update: true
package_upgrade: true
packages: ['awscli']
runcmd:
- aws --version
Then you can see the command output in the log file; for Ubuntu it is /var/log/cloud-init-output.log.

How to create AWS AMI from created instance using terraform?

I am setting up an AWS instance with a WordPress installation and want to create an AMI from the created instance. Below I attach my code.
provider "aws" {
  region     = "${var.region}"
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
}

resource "aws_instance" "test-wordpress" {
  ami           = "${var.image_id}"
  instance_type = "${var.instance_type}"
  key_name      = "test-web"
  #associate_public_ip_address = yes

  user_data = <<-EOF
    #!/bin/bash
    sudo yum update -y
    sudo amazon-linux-extras install -y lamp-mariadb10.2-php7.2 php7.2
    sudo yum install -y httpd mariadb-server
    cd /var/www/html
    sudo echo "healthy" > healthy.html
    sudo wget https://wordpress.org/latest.tar.gz
    sudo tar -xzf latest.tar.gz
    sudo cp -r wordpress/* /var/www/html/
    sudo rm -rf wordpress
    sudo rm -rf latest.tar.gz
    sudo chmod -R 755 wp-content
    sudo chown -R apache:apache wp-content
    sudo service httpd start
    sudo chkconfig httpd on
  EOF

  tags = {
    Name = "test-Wordpress-Server"
  }
}

resource "aws_ami_from_instance" "test-wordpress-ami" {
  name               = "test-wordpress-ami"
  source_instance_id = "${aws_instance.test-wordpress.id}"

  depends_on = [
    aws_instance.test-wordpress,
  ]

  tags = {
    Name = "test-wordpress-ami"
  }
}
The AMI gets created, but when I use that AMI to create another instance, the WordPress installation isn't there. How can I solve this issue?
The best way to create AMI images, I think, is using Packer, also from HashiCorp like Terraform.
What is Packer?
Packer is HashiCorp's open-source tool for creating machine images from source configuration. You can configure Packer images with an operating system and software for your specific use-case.
Packer creates an instance with a temporary key pair, security group, and IAM role. Custom inline commands are possible in the "shell" provisioner. Afterwards you can use this AMI in your Terraform code.
A sample script could look like this:
packer {
  required_plugins {
    amazon = {
      version = ">= 0.0.2"
      source  = "github.com/hashicorp/amazon"
    }
  }
}

source "amazon-ebs" "linux" {
  # AMI Settings
  ami_name                    = "ami-oracle-python3"
  instance_type               = "t2.micro"
  source_ami                  = "ami-xxxxxxxx"
  ssh_username                = "ec2-user"
  associate_public_ip_address = false
  ami_virtualization_type     = "hvm"
  subnet_id                   = "subnet-xxxxxx"

  launch_block_device_mappings {
    device_name           = "/dev/xvda"
    volume_size           = 8
    volume_type           = "gp2"
    delete_on_termination = true
    encrypted             = false
  }

  # Profile Settings
  profile = "xxxxxx"
  region  = "eu-central-1"
}

build {
  sources = [
    "source.amazon-ebs.linux"
  ]

  provisioner "shell" {
    inline = [
      "export no_proxy=localhost"
    ]
  }
}
You can find documentation here.
Alternatively, you can search for the AMI by its tag, as described in the documentation.
In your case:
data "aws_ami" "example" {
  executable_users = ["self"]
  most_recent      = true
  owners           = ["self"]

  filter {
    name   = "tag:Name"
    values = ["test-wordpress-ami"]
  }
}
and then reference its ID as ${data.aws_ami.example.image_id}.
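For example, the looked-up AMI can then feed a new instance (a sketch; the instance_type here is an assumption):

```hcl
resource "aws_instance" "from_ami" {
  ami           = data.aws_ami.example.image_id
  instance_type = "t2.micro"
}
```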