AWS EC2: why is cloud-init ignoring my script? - amazon-web-services

I have asked this before, and I thought that I had solved it (here), but now it no longer works.
I'm setting up an EC2 instance with Terraform:
resource "aws_instance" "bastion" {
ami = "${var.image}"
instance_type = "${var.inst_type}"
key_name = "Some Keys"
subnet_id = "${aws_subnet.jan_public_subnet.id}"
user_data = "${file("${path.module}/test")}"
vpc_security_group_ids = [
"${aws_security_group.jan_vpc_security_group.id}"
]
tags={
Name="${var.inst_name}"
}
}
And the test file (in user_data) is:
#!/bin/bash
cat >/var/lib/cloud/scripts/per-once/test <<!
runcmd:
- mkdir /run/test-per-once
!
chmod 755 /var/lib/cloud/scripts/per-once/test
I can see that the instructions in test are carried out: the file gets created and it has the specified permissions. But cloud-init ignores it. This is what I see in /var/log/cloud-init.log:
2019-09-27 15:25:01,110 - stages.py[DEBUG]: Running module scripts-per-once (<module 'cloudinit.config.cc_scripts_per_once' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_per_once.py'>) with frequency once
2019-09-27 15:25:01,110 - handlers.py[DEBUG]: start: modules-final/config-scripts-per-once: running config-scripts-per-once with frequency once
2019-09-27 15:25:01,110 - util.py[DEBUG]: Writing to /var/lib/cloud/sem/config_scripts_per_once.once - wb: [420] 24 bytes
2019-09-27 15:25:01,110 - helpers.py[DEBUG]: Running config-scripts-per-once using lock (<FileLock using file '/var/lib/cloud/sem/config_scripts_per_once.once'>)
2019-09-27 15:25:01,110 - handlers.py[DEBUG]: finish: modules-final/config-scripts-per-once: SUCCESS: config-scripts-per-once ran successfully
I can only assume that I am doing something wrong - but what?

Related

How can I run a command in aws_ecs_task_definition (Terraform AWS)?

I need to run a docker command in aws_ecs_task_definition. I can run it directly with Docker on my local machine, but I'm unable to run it from the task definition:
docker run -it --rm \
--name n8n \
-p 5678:5678 \
-e DB_TYPE=postgresdb \
-e DB_POSTGRESDB_DATABASE=<POSTGRES_DATABASE> \
-e DB_POSTGRESDB_HOST=<POSTGRES_HOST> \
-e DB_POSTGRESDB_PORT=<POSTGRES_PORT> \
-e DB_POSTGRESDB_USER=<POSTGRES_USER> \
-e DB_POSTGRESDB_SCHEMA=<POSTGRES_SCHEMA> \
-e DB_POSTGRESDB_PASSWORD=<POSTGRES_PASSWORD> \
-v ~/.n8n:/home/node/.n8n \
n8nio/n8n \
n8n start
That's the command I need to run. It works fine locally, but not from aws_ecs_task_definition. I tried to run it via command inside container_definitions, without success. Here is my Terraform configuration:
resource "aws_ecs_task_definition" "task-definition" {
family = "${var.PROJECT_NAME}-task-definition"
container_definitions = jsonencode([
{
name = "${var.PROJECT_NAME}-task-container"
image = "${var.IMAGE_PATH}"
cpu = 10
memory = 512
essential = true
environment = [
{name: "DB_TYPE", value: "postgresdb"},
{name: "DB_POSTGRESDB_DATABASE", value: "${var.DB_NAME}"},
{name: "DB_POSTGRESDB_HOST", value: "${var.DB_NAME}"},
{name: "DB_POSTGRESDB_DATABASE", value: "${aws_db_instance.rds.address}"},
{name: "DB_POSTGRESDB_PORT", value: "5432"},
{name: "DB_POSTGRESDB_USER", value: "${var.DB_USERNAME}"},
{name: "DB_POSTGRESDB_PASSWORD", value: "${var.DB_PASSWORD}"},
]
command = [
"docker", "run",
"-it", "--rm",
"--name", "${var.IMAGE_PATH}",
"-v", "~/.n8n:/home/node/.n8n",
"n8nio/n8n",
"n8n", "start",
"n8n", "restart"
]
portMappings = [
{
containerPort = 5678
hostPort = 5678
}
]
}
])
depends_on = [
aws_db_instance.rds
]
}
resource "aws_ecs_service" "service" {
name = "${var.PROJECT_NAME}-ecs-service"
cluster = aws_ecs_cluster.ecs-cluster.id
task_definition = aws_ecs_task_definition.task-definition.arn
desired_count = 1
iam_role = aws_iam_role.ecs-service-role.arn
depends_on = [aws_iam_policy_attachment.ecs-service-attach]
load_balancer {
elb_name = aws_elb.elb.name
container_name = "${var.PROJECT_NAME}-task-container"
container_port = 5678
}
}
The command in an ECS task definition doesn't take a docker command. It is the command that should be run inside the docker container that ECS is starting. ECS is a docker orchestration service. ECS runs the docker commands for you behind the scenes, you never give ECS a direct docker command to run.
Looking at the docker command you are running locally, the command part that is being executed inside the container is n8n start. So your command should be:
command = [
"n8n", "start"
]
All those other docker command arguments, like the container name, volume mapping, environment variables, and image ID, are arguments that you would configure elsewhere in the ECS task definition. It appears you have already specified all of those elsewhere in your task definition, except for the volume mapping.
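For the volume mapping, the -v ~/.n8n:/home/node/.n8n part of the docker run command would translate to a volume block on the task definition plus a mountPoints entry in the container definition. A minimal sketch (not from the original answer; the volume name and host path are illustrative, and this assumes the EC2 launch type):
resource "aws_ecs_task_definition" "task-definition" {
  # ... family, container_definitions, etc. as above ...

  # Named volume backed by a path on the container instance (illustrative path)
  volume {
    name      = "n8n-data"
    host_path = "/home/ec2-user/.n8n"
  }
}
and, inside the container definition object, a matching mount point:
      mountPoints = [
        {
          sourceVolume  = "n8n-data"
          containerPath = "/home/node/.n8n"
        }
      ]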

Terraform aws_eks_node_group creation error with launch_template "Unsupported - The requested configuration is currently not supported"

Objective of my effort: Create EKS node with Custom AMI(ubuntu)
Issue Statement: On creating aws_eks_node_group along with launch_template, I am getting an error:
Error: error waiting for EKS Node Group (qa-svr-centinela-eks-cluster01:qa-svr-centinela-nodegroup01) creation: AsgInstanceLaunchFailures: Could not launch On-Demand Instances. Unsupported - The requested configuration is currently not supported. Please check the documentation for supported configurations. Launching EC2 instance failed.. Resource IDs: [eks-82bb24f0-2d7e-ba9d-a80a-bb9653cde0c6]
Research so far: As per AWS we can start using custom AMIs for EKS.
The custom Ubuntu image I am using is built with Packer, and I was encrypting the boot volume using an AWS KMS external key. At first I thought the AMI encryption might be causing the problem, so I removed it from the Packer code, but that didn't resolve the issue. Maybe I am not thinking in the right direction?
Any help is much appreciated. Thanks.
The Packer and Terraform code used is below; I am attempting to create an EKS node group with a launch template, but keep hitting the error above.
Packer code:
source "amazon-ebs" "ubuntu18" {
ami_name = "pxx3"
ami_virtualization_type = "hvm"
tags = {
"cc" = "sxx1"
"Name" = "packerxx3"
}
region = "us-west-2"
instance_type = "t3.small"
# AWS Ubuntu AMI
source_ami = "ami-0ac73f33a1888c64a"
associate_public_ip_address = true
ebs_optimized = true
# public subnet
subnet_id = "subnet-xx"
vpc_id = "vpc-xx"
communicator = "ssh"
ssh_username = "ubuntu"
}
build {
  sources = [
    "source.amazon-ebs.ubuntu18"
  ]

  provisioner "ansible" {
    playbook_file = "./ubuntu.yml"
  }
}
ubuntu.yml - only used for installing a few libraries
---
- hosts: default
  gather_facts: no
  become: yes
  tasks:
    - name: create the license key for new relic agent
      shell: |
        curl -s https://download.newrelic.com/infrastructure_agent/gpg/newrelic-infra.gpg | apt-key add - && \
        printf "deb [arch=amd64] https://download.newrelic.com/infrastructure_agent/linux/apt bionic main" | tee -a /etc/apt/sources.list.d/newrelic-infra.list
    - name: check sources.list
      shell: |
        cat /etc/apt/sources.list.d/newrelic-infra.list
    - name: apt-get update
      apt: update_cache=yes force_apt_get=yes
    - name: install new relic agent
      package:
        name: newrelic-infra
        state: present
    - name: update apt-get repo and cache
      apt: update_cache=yes force_apt_get=yes
    - name: apt-get upgrade
      apt: upgrade=dist force_apt_get=yes
    - name: install essential softwares
      package:
        name: "{{ item }}"
        state: latest
      loop:
        - software-properties-common
        - vim
        - nano
        - glibc-source
        - groff
        - less
        - traceroute
        - whois
        - telnet
        - dnsutils
        - git
        - mlocate
        - htop
        - zip
        - unzip
        - curl
        - ruby-full
        - wget
      ignore_errors: yes
    - name: Add the ansible PPA to your system’s sources list
      apt_repository:
        repo: ppa:ansible/ansible
        state: present
        mode: 0666
    - name: Add the deadsnakes PPA to your system’s sources list
      apt_repository:
        repo: ppa:deadsnakes/ppa
        state: present
        mode: 0666
    - name: install softwares
      package:
        name: "{{ item }}"
        state: present
      loop:
        - ansible
        - python3.8
        - python3-winrm
      ignore_errors: yes
    - name: install AWS CLI
      shell: |
        curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
        unzip awscliv2.zip
        ./aws/install
aws_eks_node_group configuration.
resource "aws_eks_node_group" "nodegrp" {
cluster_name = aws_eks_cluster.eks.name
node_group_name = "xyz-nodegroup01"
node_role_arn = aws_iam_role.eksnode.arn
subnet_ids = [data.aws_subnet.tf_subnet_private01.id, data.aws_subnet.tf_subnet_private02.id]
scaling_config {
desired_size = 2
max_size = 2
min_size = 2
}
depends_on = [
aws_iam_role_policy_attachment.nodepolicy01,
aws_iam_role_policy_attachment.nodepolicy02,
aws_iam_role_policy_attachment.nodepolicy03
]
launch_template {
id = aws_launch_template.eks.id
version = aws_launch_template.eks.latest_version
}
}
aws_launch_template configuration.
resource "aws_launch_template" "eks" {
name = "${var.env}-launch-template"
update_default_version = true
block_device_mappings {
device_name = "/dev/sda1"
ebs {
volume_size = 50
}
}
credit_specification {
cpu_credits = "standard"
}
ebs_optimized = true
# AMI generated with packer (is private)
image_id = "ami-0ac71233a184566453"
instance_type = t3.micro
key_name = "xyz"
network_interfaces {
associate_public_ip_address = false
}
}

Terraform: wait till the instance is "reachable"

I have some Terraform code with an aws_instance and a null_resource:
resource "aws_instance" "example" {
ami = data.aws_ami.server.id
instance_type = "t2.medium"
key_name = aws_key_pair.deployer.key_name
tags = {
name = "example"
}
vpc_security_group_ids = [aws_security_group.main.id]
}
resource "null_resource" "example" {
provisioner "local-exec" {
command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
}
}
It kind of works, but sometimes there is a problem (probably when the instance is in a pending state). When I rerun Terraform, it works as expected.
Question: How can I run local-exec only when the instance is running and accepting an SSH connection?
The null_resource is currently only going to wait until the aws_instance resource has completed, which in turn only waits until the AWS API reports that the instance is in the running state. There's a long gap from there to the instance booting the OS and being able to accept SSH connections, which is when your local-exec provisioner can connect.
One way to handle this is to use the remote-exec provisioner on the instance first as that has the ability to wait for the instance to be ready. Changing your existing code to handle this would look like this:
resource "aws_instance" "example" {
ami = data.aws_ami.server.id
instance_type = "t2.medium"
key_name = aws_key_pair.deployer.key_name
tags = {
name = "example"
}
vpc_security_group_ids = [aws_security_group.main.id]
}
resource "null_resource" "example" {
provisioner "remote-exec" {
connection {
host = aws_instance.example.public_dns
user = "centos"
file = file("files/id_rsa")
}
inline = ["echo 'connected!'"]
}
provisioner "local-exec" {
command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
}
}
This will first attempt to connect to the instance's public DNS address as the centos user with the files/id_rsa private key. Once it is connected it will then run echo 'connected!' as a simple command before moving on to your existing local-exec provisioner that runs Ansible against the instance.
Note that just being able to connect over SSH may not actually be enough to then provision the instance. If your Ansible script tries to interact with the package manager, you may find that it is still locked by the instance's user data script. In that case you will need to remotely execute a script that waits for cloud-init to complete first. An example script looks like this:
#!/bin/bash
while [ ! -f /var/lib/cloud/instance/boot-finished ]; do
  echo -e "\033[1;36mWaiting for cloud-init..."
  sleep 1
done
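One way to wire this in (a sketch, not part of the original answer; the script file name is illustrative) is to save that loop to a file and point the remote-exec provisioner at it via its script argument, which uploads and runs the file before the local-exec step:
resource "null_resource" "example" {
  provisioner "remote-exec" {
    connection {
      host        = aws_instance.example.public_dns
      user        = "centos"
      private_key = file("files/id_rsa")
    }

    # Uploads and runs the wait-for-cloud-init loop shown above
    script = "${path.module}/wait-for-cloud-init.sh"
  }

  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
  }
}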
There is an Ansible-specific solution for this problem. Add this to your playbook (there is also a pre_tasks clause if you use roles):
- name: will wait till reachable
  hosts: all
  gather_facts: no  # important: fact gathering itself needs a connection
  tasks:
    - name: Wait for system to become reachable
      wait_for_connection:
    - name: Gather facts for the first time
      setup:
For cases where instances are not externally exposed (about 90% of the time in most of my projects) and the SSM agent is installed on the target instance (newer AWS AMIs come pre-loaded with it), you can leverage SSM to probe the instance. Here's some sample code:
instanceId=$1

echo "Waiting for instance to bootstrap ..."
tries=0
responseCode=1

while [[ $responseCode != 0 && $tries -le 10 ]]; do
  echo "Try # $tries"
  cmdId=$(aws ssm send-command --document-name AWS-RunShellScript --instance-ids $instanceId --parameters commands="cat /tmp/job-done.txt # or some other validation logic" --query Command.CommandId --output text)
  sleep 5
  responseCode=$(aws ssm get-command-invocation --command-id $cmdId --instance-id $instanceId --query ResponseCode --output text)
  echo "ResponseCode: $responseCode"

  if [ $responseCode != 0 ]; then
    echo "Sleeping ..."
    sleep 60
  fi

  (( tries++ ))
done

echo "Wait time over. ResponseCode: $responseCode"
Assuming you have the AWS CLI installed locally, you can make this null_resource a prerequisite for anything that acts on the instance. In my case, I was building an AMI.
resource "null_resource" "wait_for_instance" {
depends_on = [
aws_instance.my_instance
]
triggers = {
always_run = "${timestamp()}"
}
provisioner "local-exec" {
command = "${path.module}/scripts/check-instance-state.sh ${aws_instance.my_instance.id}"
}
}

Terraform Cloud-Init AWS

I have a Terraform script that deploys Ubuntu.
resource "aws_instance" "runner" {
instance_type = "${var.instance_type}"
ami = "${var.ami}"
user_data = "${data.template_file.deploy.rendered}"
}
data "template_file" "deploy" {
template = "${file("cloudinit.tpl")}"
}
My cloudinit.tpl:
#cloud-config
runcmd:
- apt-get update
- sleep 30
- apt-get install -y awscli
I can't find any issue in cloud-init.log, and there is no user-data.log file in /var/log, so I can't work out why the user data is not working.
Cloud-init has dedicated directives for system updates and package installation that take care of consistency:
#cloud-config
package_update: true
package_upgrade: true
packages: ['awscli']
runcmd:
- aws --version
Then you can see the command output in the log file; for Ubuntu it is /var/log/cloud-init-output.log.

Why does terraform + apt-get fail, intermittently?

I'm using Terraform to create multiple EC2 nodes on AWS:
resource "aws_instance" "myapp" {
count = "${var.count}"
ami = "${data.aws_ami.ubuntu.id}"
instance_type = "m4.large"
vpc_security_group_ids = ["${aws_security_group.myapp-security-group.id}"]
subnet_id = "${var.subnet_id}"
key_name = "${var.key_name}"
iam_instance_profile = "${aws_iam_instance_profile.myapp_instance_profile.id}"
connection {
user = "ubuntu"
private_key = "${file("${var.key_file_path}")}"
}
provisioner "remote-exec" {
inline = [
"sudo apt-get update",
"sudo apt-get upgrade -y",
"sudo apt-get install -f -y openjdk-7-jre-headless git awscli"
]
}
}
When I run this with say count=4, some nodes intermittently fail with apt-get errors like:
aws_instance.myapp.1 (remote-exec): E: Unable to locate package awscli
while the other three nodes found awscli just fine. All nodes are created from the same AMI and use the exact same provisioning commands, so why would only some of them fail? The variation could potentially come from:
Multiple copies of AMIs on amazon, which aren't identical
Multiple apt-get mirrors which aren't identical
Which is more likely? Any other possibilities I'm missing?
Is there an apt-get "force" type flag I can use that will make the provisioning more repeatable?
The whole point of automating provisioning through scripts is to avoid this kind of variation between nodes :/
The remote-exec provisioner in Terraform just generates a shell script that is uploaded to the new instance and runs the commands you specify. Most likely you're actually running into problems with cloud-init, which is configured to run on standard Ubuntu AMIs: the provisioner attempts to run while cloud-init is still running, so you hit a timing conflict.
You can make your script wait until cloud-init has finished provisioning. cloud-init creates the file /var/lib/cloud/instance/boot-finished when it is done, so you can put this inline with your provisioner:
until [[ -f /var/lib/cloud/instance/boot-finished ]]; do
  sleep 1
done
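Since Terraform joins the inline entries into a single script, the loop can simply be prepended to the existing commands. A sketch of the question's provisioner with the wait added (the loop collapsed to one line):
  provisioner "remote-exec" {
    inline = [
      "until [ -f /var/lib/cloud/instance/boot-finished ]; do sleep 1; done",
      "sudo apt-get update",
      "sudo apt-get upgrade -y",
      "sudo apt-get install -f -y openjdk-7-jre-headless git awscli"
    ]
  }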
Alternatively, you can take advantage of cloud-init and have it install arbitrary packages for you. You can specify user-data for your instance like so in Terraform (modified from your snippet above):
resource "aws_instance" "myapp" {
count = "${var.count}"
ami = "${data.aws_ami.ubuntu.id}"
instance_type = "m4.large"
vpc_security_group_ids = ["${aws_security_group.myapp-security-group.id}"]
subnet_id = "${var.subnet_id}"
key_name = "${var.key_name}"
iam_instance_profile = "${aws_iam_instance_profile.myapp_instance_profile.id}"
user_data = "${data.template_cloudinit_config.config.rendered}"
}
# Standard cloud-init stuff
data "template_cloudinit_config" "config" {
# I've
gzip = false
base64_encode = false
part {
content_type = "text/cloud-config"
content = <<EOF
packages:
- awscli
- git
- openjdk-7-headless
EOF
}
}