Seemingly Random Timeouts using Packer with Ansible on AWS

I am baking AWS AMIs using Packer and Ansible. It seems that after about 6 minutes or so of Packer running, the process will fail. Aside from the roughly six-minute mark, I can't find a logical explanation for what is happening. The Ansible playbook will fail at different points along the way, but always about 6 minutes after I launch Packer.
I always get an Ansible error when I hit this issue: either "Timeout (12s) waiting for privilege escalation prompt:" or "Connection to 127.0.0.1 closed".
Is there a way to extend the timeouts associated with a playbook or Packer builder?
Packer file contents:
{
  "provisioners": [{
    "type": "ansible",
    "playbook_file": "../ansible/nexus.yml"
  }],
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-2",
    "source_ami_filter": {
      "filters": {
        "virtualization-type": "hvm",
        "name": "amzn-ami-hvm*-gp2",
        "root-device-type": "ebs"
      },
      "owners": ["137112412989"],
      "most_recent": true
    },
    "instance_type": "t2.medium",
    "ami_virtualization_type": "hvm",
    "ssh_username": "ec2-user",
    "ami_name": "Nexus (Latest Amazon Linux Base) {{isotime \"2006-01-02T15-04-05-06Z\"| clean_ami_name}}",
    "ami_description": "Nexus AMI",
    "run_tags": {
      "AmiName": "Nexus",
      "AmiCreatedBy": "Packer"
    },
    "tags": {
      "Name": "Nexus",
      "CreatedBy": "Packer"
    }
  }]
}

I solved this issue by adding the following parameters to the Ansible configuration file (ansible.cfg):
[defaults]
forks = 1
[ssh_connection]
ssh_args = -o ControlMaster=no -o ControlPath=none -o ControlPersist=no
pipelining = false
Hope it helps.
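If you also need to extend the timeout itself, as asked, raising Ansible's connection timeout is worth trying as well; in my experience it also governs how long Ansible waits for the privilege escalation prompt (a sketch, not verified against this exact setup):
[defaults]
timeout = 60
The same value can be passed through Packer's ansible provisioner via extra_arguments, since ansible-playbook accepts -T/--timeout:
"provisioners": [{
  "type": "ansible",
  "playbook_file": "../ansible/nexus.yml",
  "extra_arguments": ["--timeout=60"]
}]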


AWS EC2 SSH key pair not added to authorized_keys after Packer AMI build

Context
We have an automated AMI building process, done with Packer, to set up our instance images after code changes and assign them to our load balancer's launch configuration for faster autoscaling.
Problem
Recently, instances launched from the development environment AMI build started refusing the corresponding private key. After launching an instance from that same AMI in a public subnet, I connected through EC2 Instance Connect and noticed that the public key was not present in the /home/ec2-user/.ssh/authorized_keys file, even though it was added at launch time through the launch configuration or even manually through the console.
The only key present was the temporary SSH key Packer created during the AMI packaging.
Additional info
Note that the key pair is mentioned in the instance details as though it were present, but it is NOT, which made this tricky to debug because we tend to trust what the console tells us.
What's even more puzzling is that the same AMI build for the QA environment (which is exactly the same apart from some application variables) does include the EC2 SSH key correctly, and we are using the same key pair for the DEV and QA environments.
We're using the same Packer version (1.5.1) as always to avoid inconsistencies, so the problem most likely doesn't come from that, but I suspect it does come from this build since it doesn't happen with other AMIs.
If someone has a clue about what's going on, I'd be glad to know.
Thanks!
Edit
Since it was requested in the comments, here is the "anonymized" Packer template; for confidentiality reasons I can't show the Ansible tasks or other details. However, note that this template is IDENTICAL to the one used for QA (same source code), which does not give the error.
"variables": {
"aws_region": "{{env `AWS_REGION`}}",
"inventory_file": "{{env `Environment`}}",
"instance_type": "{{env `INSTANCE_TYPE`}}"
},
"builders": [
{
"type": "amazon-ebs",
"access_key": "{{user `aws_access_key`}}",
"secret_key": "{{user `aws_secret_key`}}",
"region": "{{user `aws_region`}}",
"vpc_id": "vpc-xxxxxxxxxxxxxxx",
"subnet_id": "subnet-xxxxxxxxxxxxxxxxxx",
"security_group_id": "sg-xxxxxxxxxxxxxxxxxxxxx",
"source_ami_filter": {
"filters": {
"virtualization-type": "hvm",
"name": "amzn2-ami-hvm-2.0.????????.?-x86_64-gp2",
"root-device-type": "ebs",
"state": "available"
},
"owners": "amazon",
"most_recent": true
},
"instance_type": "t3a.{{ user `instance_type` }}",
"ssh_username": "ec2-user",
"ami_name": "APP {{user `inventory_file`}}",
"force_deregister": "true",
"force_delete_snapshot": "true",
"ami_users": [
"xxxxxxxxxxxxxxxxx",
"xxxxxxxxxxxxxxxxx",
"xxxxxxxxxxxxxxxxx"
]
}
],
"provisioners": [
{
"type": "shell",
"inline": [
"sudo amazon-linux-extras install ansible2",
"mkdir -p /tmp/libs"
]
},
{
"type": "file",
"source": "../frontend",
"destination": "/tmp"
},
{
"type": "file",
"source": "../backend/target/",
"destination": "/tmp"
},
{
"type": "ansible-local",
"playbook_file": "./app/instance-setup/playbook/setup-instance.yml",
"inventory_file": "./app/instance-setup/playbook/{{user `inventory_file`}}",
"playbook_dir": "./app/instance-setup/playbook",
"extra_arguments": [
"--extra-vars",
"ENV={{user `inventory_file`}}",
"--limit",
"{{user `inventory_file`}}",
"-v"
]
},
{
"type": "shell",
"inline": [
"sudo rm -rf /tmp/*"
]
}
]
}
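For anyone debugging something similar: the manual checks described above can be scripted on the affected instance. A rough sketch (paths assume Amazon Linux 2; the launch-time key is normally written by cloud-init, so comparing its logs between the DEV and QA images is a reasonable next step):
# What actually ended up on the box?
cat /home/ec2-user/.ssh/authorized_keys

# Which key does the metadata service say was assigned at launch?
curl -s http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key

# Did cloud-init run and handle the SSH key?
sudo grep -i ssh /var/log/cloud-init.log
sudo tail -n 50 /var/log/cloud-init-output.log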

How to get instance name, number of CPUs, cores and operating system info of an EC2 instance

I am writing Java code to retrieve the below info for an EC2 instance, but I am not sure about the right AWS API to use to get it:
instance name
number of cpu
number of virtual processor cores
operating system version, environment
You can get that from the instance metadata, as described at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
Note that this is an HTTP service hosted locally for each instance at 169.254.169.254; you can access it with Java HTTP clients or directly, for example:
instance name
$ curl http://169.254.169.254/latest/meta-data/instance-id
i-024a0de14f70ab64f
number of cpu
number of virtual processor cores
These are defined by the instance-type:
$ curl http://169.254.169.254/latest/meta-data/instance-type
t3.2xlarge
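Since the question mentions Java: these metadata lookups need nothing beyond the JDK. A minimal sketch using the Java 11 HttpClient (assumes IMDSv1 is allowed; with IMDSv2 enforced you would first request a session token and send it as a header):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MetadataExample {
    private static final String IMDS = "http://169.254.169.254/latest/meta-data/";
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Fetch a single metadata value, e.g. "instance-id" or "instance-type"
    static String metadata(String path) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(IMDS + path)).GET().build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("instance-id:   " + metadata("instance-id"));
        System.out.println("instance-type: " + metadata("instance-type"));
        // vCPU count as the OS sees it (determined by the instance type)
        System.out.println("vCPUs: " + Runtime.getRuntime().availableProcessors());
    }
}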
operating system version, environment
This is defined by the image, and the details can be fetched from the describe-images API:
$ aws ec2 describe-images \
--image-ids $(curl -s http://169.254.169.254/latest/meta-data/ami-id)
{
    "Images": [
        {
            "VirtualizationType": "hvm",
            "Description": "Cloud9 Cloud9Default AMI",
            "Hypervisor": "xen",
            "EnaSupport": true,
            "SriovNetSupport": "simple",
            "ImageId": "ami-07606bae9eee7051c",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/xvda",
                    "Ebs": {
                        "SnapshotId": "snap-0ee3e3de47cfb2ce4",
                        "DeleteOnTermination": true,
                        "VolumeType": "gp2",
                        "VolumeSize": 8,
                        "Encrypted": false
                    }
                }
            ],
            "Architecture": "x86_64",
            "ImageLocation": "751997845865/Cloud9Default-2019-02-18T10-14",
            "RootDeviceType": "ebs",
            "OwnerId": "751997845865",
            "RootDeviceName": "/dev/xvda",
            "CreationDate": "2019-02-18T11:02:13.000Z",
            "Public": true,
            "ImageType": "machine",
            "Name": "Cloud9Default-2019-02-18T10-14"
        }
    ]
}
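And the describe-images lookup from Java, for completeness. A sketch assuming the AWS SDK for Java v1 (com.amazonaws:aws-java-sdk-ec2) is on the classpath and credentials/region come from the default provider chain; the v2 SDK has an equivalent Ec2Client API:
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.DescribeImagesRequest;
import com.amazonaws.services.ec2.model.Image;

public class ImageInfoExample {
    public static void main(String[] args) {
        // AMI ID obtained from the metadata service (see the sketch above), e.g. "ami-07606bae9eee7051c"
        String amiId = args[0];

        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        Image image = ec2.describeImages(new DescribeImagesRequest().withImageIds(amiId))
                         .getImages()
                         .get(0);

        // For public images the name and description usually encode the OS and version
        System.out.println("Name:         " + image.getName());
        System.out.println("Description:  " + image.getDescription());
        System.out.println("Architecture: " + image.getArchitecture());
    }
}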

AWS EC2 ELB Docker Routing

I am running into issues running docker-compose behind an Elastic Load Balancer. The setup is: the ELB forwards 443 -> TCP 80, and Docker maps 0.0.0.0:80->4444/tcp.
However, the server doesn't seem to be hit and I get DNS_PROBE_FINISHED_NXDOMAIN.
I'm trying to verify whether this is a Docker setup issue. Docker version is 1.12.6 and docker-compose version is 1.12.0.
Is it normal for the bridge config to not have a Gateway defined?
```
[root@loom-server1 ec2-user]# docker network inspect 8f1b234bfb0b
[
    {
        "Name": "bridge",
        "Id": "8f1b234bfb0b6c41962265299871cd8053757ec145f8e3f6b63960b71ceb3690",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16"
                }
            ]
        },
        "Internal": false,
        "Containers": {
            "bbd2b84545a0e3519e37fb4015eea45637b75ccaa1dd362aff68ff41f3118055": {
                "Name": "dockercompose_loom_1",
                "EndpointID": "b7da2d31ff2503846d4f621bf355b8522afb8dabd1f02ca638c9ef032afefa76",
                "MacAddress": "02:42:ac:11:00:02",
                "IPv4Address": "172.17.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]
```
The weird part is that it's able to load assets:
ec2-52-53-84-186.us-west-1.compute.amazonaws.com/assets/js/homepage.js
The link may be up or down as I experiment with instances.
This is all running on OpsWorks.
Any insight or help would be appreciated.

Datapipeline task stuck in WAITING_FOR_RUNNER state

I have created a simple ShellCommandActivity which echoes some text. It runs on a plain EC2 (VPC) instance. I can see that the host has spun up, but it never executes the task, which remains in the WAITING_FOR_RUNNER status. After all the retries I get this error:
Resource is stalled. Associated tasks not able to make progress.
I followed this troubleshooting link but it didn't resolve my problem.
Here is the json description of the pipeline:
{
  "objects": [
    {
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "name": "ec2-compute",
      "id": "ResourceId_viWO9",
      "type": "Ec2Resource"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "s3://xyz-logs/",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "name": "EchoActivity",
      "id": "ShellCommandActivityId_kc8xz",
      "runsOn": {
        "ref": "ResourceId_viWO9"
      },
      "type": "ShellCommandActivity",
      "command": "echo HelloWorld"
    }
  ],
  "parameters": []
}
What could be the problem here?
Thanks in advance.
I figured this out. The routing table in the VPC subnets was not properly configured.
To be specific, in my case the routing table didn't have 0.0.0.0/0 mapped to an internet gateway. When I added this mapping, everything started working.
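For reference, that mapping boils down to one route on the subnet's route table; with the AWS CLI it would look roughly like this (placeholder IDs):
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id igw-0123456789abcdef0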

How to create an EC2 machine with Packer?

In AWS, to make a new AMI I usually run commands manually to verify that they're working, and then image that box to create an AMI. But there are alternatives like packer.io. What's a minimal working example of using this tool to make a simple customized AMI?
https://github.com/devopracy/devopracy-base/blob/master/packer/base.json There's a Packer file that looks very similar to what I use at work for a base image. It's not tested, but we can go into it a bit. The base image is my own base - all services are built using it as a source AMI. That way I control my dependencies and ensure there's a consistent OS under my services. You could simply add cookbooks from the Chef Supermarket to see how provisioning a service works with this file, or use this as a base. As a base, you would make a similar, less detailed build for the service and use this as the source AMI.
This first part declares the variables I use to pack. The variables are injected before the build from a bash file which I DON'T CHECK IN TO SOURCE CONTROL. I keep the bash script in my home directory and source it before calling packer build (a sketch of such a file follows the variables block below). Note there's a cookbook path for the Chef provisioner. I use the base_dir as the location on my dev box or the build server. I use a bootstrap key to build; Packer will make its own key to SSH with if you don't specify one, but it's nice to make a key on AWS and then launch your builds with it. That makes debugging Packer easier on the fly.
"variables": {
"aws_access_key_id": "{{env `AWS_ACCESS_KEY`}}",
"aws_secret_key": "{{env `AWS_SECRET_ACCESS_KEY`}}",
"ssh_private_key_file": "{{env `SSH_PRIVATE_KEY_FILE`}}",
"cookbook_path": "{{env `CLOUD_DIR`}}/ops/devopracy-base/cookbooks",
"base_dir": "{{env `CLOUD_DIR`}}"
},
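For illustration, the untracked bash file mentioned above just exports those values before the build; something like this (placeholder values, a sketch of the workflow rather than the actual file):
# ~/packer-env.sh - kept out of source control, sourced before building
export AWS_ACCESS_KEY=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your secret key>
export SSH_PRIVATE_KEY_FILE=$HOME/.ssh/bootstrap.pem
export CLOUD_DIR=$HOME/src/cloud

# usage: source ~/packer-env.sh && packer build base.json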
The next part of the file has the builder. I use amazon-ebs at work and off work too; it's simpler to create one file, and often the larger instance types are only available as EBS. In this file, I resize the volume so we have a bit more room to install things. Note the source AMI isn't specified here; I look up the newest version here or there (see the source_ami_filter sketch after the builder block below). Ubuntu has a handy site if you're using it, just Google "EC2 Ubuntu locator". You need to put in a source image to build on.
"builders": [{
"type": "amazon-ebs",
"access_key": "{{user `aws_access_key_id`}}",
"secret_key": "{{user `aws_secret_key`}}",
"region": "us-west-2",
"source_ami": "",
"instance_type": "t2.small",
"ssh_username": "fedora",
"ami_name": "fedora-base-{{isotime \"2006-01-02\"}}",
"ami_description": "fedora 21 devopracy base",
"security_group_ids": [ "" ],
"force_deregister": "true",
"ssh_keypair_name": "bootstrap",
"ssh_private_key_file": "{{user `ssh_private_key_file`}}",
"subnet_id": "",
"ami_users": [""],
"ami_block_device_mappings": [{
"delete_on_termination": "true",
"device_name": "/dev/sda1",
"volume_size": 30
}],
"launch_block_device_mappings": [{
"delete_on_termination": "true",
"device_name": "/dev/sda1",
"volume_size": 30
}],
"tags": {
"stage": "dev",
"os": "fedora 21",
"release": "latest",
"role": "base",
"version": "0.0.1",
"lock": "none"
}
}],
It's very useful to tag your images when you start doing automations on the cloud. These tags are how you'll handle your deploys and such. fedora is the default user for Fedora, ubuntu for Ubuntu, ec2-user for Amazon Linux, etc. You can look those up in the docs for your distro.
Likewise, you need to add a security group to this file, and a subnet to launch in. Packer will use the defaults in AWS if you don't specify those, but if you're building on a build server or in a non-default VPC, you must specify them. force_deregister will get rid of an AMI with the same name on a successful build - I name by date, so I can iterate on the builds daily and not pile up a bunch of images.
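On the empty "source_ami" above: instead of pasting an AMI ID in by hand, newer Packer versions can resolve the latest image with a source_ami_filter, as in the first template at the top of this page. A sketch for an Ubuntu 16.04 base (099720109477 is Canonical's owner ID; adjust the name pattern for your release):
"source_ami_filter": {
  "filters": {
    "virtualization-type": "hvm",
    "name": "ubuntu/images/*ubuntu-xenial-16.04-amd64-server-*",
    "root-device-type": "ebs"
  },
  "owners": ["099720109477"],
  "most_recent": true
},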
Finally, I use the Chef provisioner. I have the cookbook in another repo, and the path to it on the build server is a variable at the top. Here we're looking at chef-zero for provisioning, which is technically not supported but works fine with the chef-client provisioner and a custom command. Besides the Chef run, I do some scripts of my own, and follow up by running Serverspec tests to make sure everything is hunky-dory.
"provisioners": [
{
"type": "shell",
"inline": [
]
},
{
"type": "shell",
"script": "{{user `base_dir`}}/ops/devopracy-base/files/ext_disk.sh"
},
{
"type": "shell",
"inline": [
"sudo reboot",
"sleep 30",
"sudo resize2fs /dev/xvda1"
]
},
{
"type": "shell",
"inline": [
"sudo mkdir -p /etc/chef && sudo chmod 777 /etc/chef",
"sudo mkdir -p /tmp/packer-chef-client && sudo chmod 777 /tmp/packer-chef-client"
]
},
{
"type": "file",
"source": "{{user `cookbook_path`}}",
"destination": "/etc/chef/cookbooks"
},
{
"type": "chef-client",
"execute_command": "cd /etc/chef && sudo chef-client --local-mode -c /tmp/packer-chef-client/client.rb -j /tmp/packer-chef-client/first-boot.json",
"server_url": "http://localhost:8889",
"skip_clean_node": "true",
"skip_clean_client": "true",
"run_list": ["recipe[devopracy-base::default]"]
},
{
"type": "file",
"source": "{{user `base_dir`}}/ops/devopracy-base/test/spec",
"destination": "/home/fedora/spec/"
},
{
"type": "file",
"source": "{{user `base_dir`}}/ops/devopracy-base/test/Rakefile",
"destination": "/home/fedora/Rakefile"
},
{
"type": "shell",
"inline": ["/opt/chef/embedded/bin/rake spec"]
},
{
"type": "shell",
"inline": ["sudo chmod 600 /etc/chef"]
}
]
}
Naturally there's some goofy business in here around the chmoding of the Chef dir, and it's not obviously secure - I run my builds in a private subnet. I hope this helps you get off the ground with Packer, which is actually an amazing piece of software, and good fun! Ping me in the comments with any questions, or hit me up on GitHub. All the devopracy stuff is a WIP, but those files will probably mature when I get more time to work on it :P
Good luck!