I am sure many who work with Terraform and Ansible or just Ansible on a daily basis must have come across this question.
Some background:
I create my infrastructure on AWS using Terraform and configure my machines using Ansible. my inventory file contains hardcoded public ip addresses with some variables. As the business demands, I create and destroy my machines very often.
My question:
I want don't want to update my inventory file with new public IP addresses every time I destroy and create my instances. So my fundamental requirement is - every time I destroy my machine I should be able run my Terraform script to recreate the machines and when I run my Ansible Playbook, Ansible should be able to pick up the right target machines and run the playbook. I need to know what I need to describe in my inventory file to achieve this automation. Domain name (www.fooexample.com) and static public IP addresses in the inventory file is not an option in my case? I have seen scripts that do it with, what it looks like a hostname (webserver1)
There are forums that talk about using the ec2.py option but ec2.py is getting all the public ip addresses associated with the account but i only want to target some of the machines as you can imagine and not all of them with my playbook.
Any help regarding this would be appreciated.
Thanks in Advance
I do something similar in GCP but the concept should apply to AWS.
Starting with Ansible 2.7 there is a new inventory plugin architecture and some inventory plugins to replace the dynamic inventory scripts (such as ec2.py and gcp.py). The AWS plugin documentation is at https://docs.ansible.com/ansible/2.9/plugins/inventory/aws_ec2.html.
First, you need to tag the groups of hosts you want to target in AWS. You should be able to handle this with Terraform (such as Service = Web).
Next, enable the aws_ec2 plugin in ansible.cfg by adding:
[inventory]
enable_plugins = aws_ec2
Now, convert over to using the new plugin instead of ec2.py. This means creating a aws_ec2.yaml file based on the documentation. An example might look like:
plugin: aws_ec2
regions:
- us-east-1
keyed_groups:
- prefix: tag
key: tags
# Set individual variables with compose
compose:
ansible_host: public_ip_address
The key parts here are the keyed_groups and compose section. This will give you the public IP addresses as the host to connect to in inventory and groups you can limit to with -l or --limit.
Considering you had some instances in us-east-1 tagged with Service = Web you could target them like:
ansible -i aws_ec2.yaml -m ping -l tag_Service_Web
This would target just those tagged hosts on their public IP address. Any dynamic scaling you do (such as increasing the count in Terraform for that resource) will be picked up by the inventory plugin on next run.
You can also use the tag in playbooks. If you had a playbook that you always targeted at these hosts you can set hosts: tag_Service_Web in the playbook.
Bonus:
I've been experimenting with an Ansible Pull model that automates some of this bootstrapping. The idea is to combine cloud-init with a special script to bootstrap the playbook for that host automatically.
Example script that cloud-init kicks off:
#!/bin/bash
set -euo pipefail
lock_files=(
/var/lib/dpkg/lock
/var/lib/apt/lists/lock
/var/lib/dpkg/lock-frontend
/var/cache/apt/archives/lock
/var/lib/apt/daily_lock
)
export ANSIBLE_HOST_PATTERN_MISMATCH="ignore"
export PATH="/tmp/ansible-venv/bin:$PATH"
for file in "${lock_files[#]}"; do
while fuser "$file" >/dev/null 2>&1; do
echo "Waiting for lock $file to be available..."
sleep 5
done
done
apt-get update -qy
apt-get install --no-install-recommends -qy virtualenv python-virtualenv python-nacl python-wheel python-bcrypt
virtualenv -p /usr/bin/python --system-site-packages /tmp/ansible-venv
pip install ansible==2.7.10 apache-libcloud==2.3.0 jmespath==0.9.3
ansible-pull myplaybook.yaml \
-U git#github.com:myorg/infrastructure.git \
-i gcp_compute.yaml \
--private-key /tmp/ansible-keys/infrastructure_ssh_deploy_key \
--vault-password-file /tmp/ansible-keys/vault \
-d /tmp/ansible-infrastructure \
--accept-host-key
This script is a bit simplified from my actual one (leaving out some domain specific authentication and key providing stuff). But you can adapt it to AWS by doing something like bootstrapping keys from S3 or KMS or another boot time configuration service. I find that ansible-pull works well when the playbook only takes a minute or two to run and doesn't have any dependencies on external inventory (like references to other groups such as to gather IP addresses).
Related
I’m trying to connect to a GCP compute instance through IAP. I have a service account with permissions.
I have tried the following
Basic ansible ping,ansible -vvvv GCP -m ping, which errors because the host name is not found bc I do not have an external ip
I have set ssh_executeable=wrapper.sh like here
Number 2 is almost working but regexing commands are hacky.
Is there a native ansible solution?
Edit: The gcp_compute dynamic inventory does work for pinging instances but it does not work for managing the instances.
Ansible does NOT support package or system management while tunneling through IAP.
For those who are still looking for a solution to use IAP SSH with Ansible on an internal IP. I've made some changes to the scripts listed here
My main problem was the fact that I had to add --zone as an option, as gcloud wouldn't automatically detect this when run through Ansible.
As I didn't want to call the CLI, adding more waittime, I've opted for using group_vars to set my ssh options. This also allows me to specify other options to the gcloud compute ssh command.
Here are the contents of the files needed for setup:
ansible.cfg
[inventory]
enable_plugins = gcp_compute
[defaults]
inventory = misc/inventory.gcp.yml
interpreter_python = /usr/bin/python
[ssh_connection]
# Enabling pipelining reduces the number of SSH operations required
# to execute a module on the remote server.
# This can result in a significant performance improvement
# when enabled.
pipelining = True
scp_if_ssh = False
ssh_executable = misc/gcp-ssh-wrapper.sh
ssh_args = None
misc/gcp-ssh-wrapper.sh
#!/bin/bash
# This is a wrapper script allowing to use GCP's IAP SSH option to connect
# to our servers.
# Ansible passes a large number of SSH parameters along with the hostname as the
# second to last argument and the command as the last. We will pop the last two
# arguments off of the list and then pass all of the other SSH flags through
# without modification:
host="${#: -2: 1}"
cmd="${#: -1: 1}"
# Unfortunately ansible has hardcoded ssh options, so we need to filter these out
# It's an ugly hack, but for now we'll only accept the options starting with '--'
declare -a opts
for ssh_arg in "${#: 1: $# -3}" ; do
if [[ "${ssh_arg}" == --* ]] ; then
opts+="${ssh_arg} "
fi
done
exec gcloud compute ssh $opts "${host}" -- -C "${cmd}"
group_vars/all.yml
---
ansible_ssh_args: --tunnel-through-iap --zone={{ zone }} --no-user-output-enabled --quiet
As you can see, by using the ansible_ssh_args from the group_vars, we can now pass the zone as it's already known through the inventory.
If you also want to be able to copy files through gcloud commands, you can use the following configuration:
ansible.cfg
[ssh_connection]
# Enabling pipelining reduces the number of SSH operations required to
# execute a module on the remote server. This can result in a significant
# performance improvement when enabled.
pipelining = True
ssh_executable = misc/gcp-ssh-wrapper.sh
ssh_args = None
# Tell ansible to use SCP for file transfers when connection is set to SSH
scp_if_ssh = True
scp_executable = misc/gcp-scp-wrapper.sh
misc/gcp-scp-wrapper.sh
#!/bin/bash
# This is a wrapper script allowing to use GCP's IAP option to connect
# to our servers.
# Ansible passes a large number of SSH parameters along with the hostname as the
# second to last argument and the command as the last. We will pop the last two
# arguments off of the list and then pass all of the other SSH flags through
# without modification:
host="${#: -2: 1}"
cmd="${#: -1: 1}"
# Unfortunately ansible has hardcoded scp options, so we need to filter these out
# It's an ugly hack, but for now we'll only accept the options starting with '--'
declare -a opts
for scp_arg in "${#: 1: $# -3}" ; do
if [[ "${scp_arg}" == --* ]] ; then
opts+="${scp_arg} "
fi
done
# Remove [] around our host, as gcloud scp doesn't understand this syntax
cmd=`echo "${cmd}" | tr -d []`
exec gcloud compute scp $opts "${host}" "${cmd}"
group_vars/all.yml
---
ansible_ssh_args: --tunnel-through-iap --zone={{ zone }} --no-user-output-enabled --quiet
ansible_scp_extra_args: --tunnel-through-iap --zone={{ zone }} --quiet
gce dynamic inventory does not work unless all the inventory are publicly accessible. For private ip, the tunnel is not invoked when ansible commands are executed. The gce dynamic inventory will return inventory, but you can't actually send commands if behind a tunnel and private IP only. The only work around i could find is to have the ssh binary point at a custom script which calls the gcloud wrapper.
(Converting my comment as an answer as requested by OP)
Ansible has a native gce dynamic inventory plugin that you should use to connect to your instances.
To make lotjuh's answer work I had to also update my inventory.gcp.yml file to have the following
plugin: gcp_compute
projects:
- myproject
auth_kind: application
hostnames:
- name
Without the hostnames: - name I was getting gcloud ssh errors since it tried to ssh into the instances using their host IP.
This approach also requires that the project be set in the gcloud config with gcloud config set project myproject
not a direct answer to the OP, but after having crushed my head on how to keep my project safe (via IAP) and let ansible work at reasonable speed, I've ended up with a mix of IAP and OS Login. This continues to use the dynamic inventory if needed.
I use IAP and no public IPs on my VMs, then I've enabled OS Login project wide and I've created a small "ansible-server" VM internal to the project (well this is a WIP as in the end a VPC paired project should CI/CD ansible but this is another story).
Inside the VM I've setup the identity of a dedicated service account via
gcloud auth activate-service-account name#project.iam.gserviceaccount.com --key-file=/path/to/sa/json/key
then I've created a pair of ssh keys
I've enabled the S.A. to login by exporting the public key via
gcloud compute os-login ssh-keys add --key-file ~/.ssh/my-sa-public-key
I run all my playbooks from within the VM passing the -u switch to ansible. This is blazing fast and let me revoke any permission via IAM avoiding floating ssh keys abandoned into project or VM metadata.
So the flow now is:
I use IAP to login from my workstation into the ansible VM inside the project
I clone the ansible repo inside the VM
I run ansible impersonating the S.A.
Caveats:
to get the correct username to be passed to ansible (via -u) record the username provided by the previous os-login command (it appears in the output of the added key, in my case was somethins like sa_[0-9]*)
be sure the S.A. has both Service Account User and OS Admin Login IAM roles or the ssh will fail
of course this means you have to keep a VM inside the project dedicated to ansible and also that you need to clone the ansible code into the VM. In my case, I mitigate the "issue", just switch the VM on/off on demand and I use the same public key to grant read-only access to the ansible repo (in my case on bitbucket)
I've setup Amazon's dynamic inventory for Ansible according to https://aws.amazon.com/blogs/apn/getting-started-with-ansible-and-dynamic-amazon-ec2-inventory-management/. I'm able to get an inventory of every EC2 instance on this account but I'd like to filter that down using tags. I've set instance_filters in my ec2.ini but the script still returns the entire inventory.
instance_filters = tag:environment=qa
ansible all -i ec2.py -m ping
I also made sure the environment variable to point to ec2.ini was set.
export EC2_INI_PATH=/path/to/ec2.ini/its/different/on/my/machine/I/swear
What steps/configs am I missing that actually filters EC2 instances?
The instance_filters config was working as expected. The problem was that the extra "hosts" I was picking up were actually ElasiCache clusters. In order to exclude those from inventory I had to add the config below to ec2.ini.
elasticache = False
I am using AWS code deploy agent and deploying my project to the server through bitbucket plugin.
The code deployment agent first executes the script files which has the command to execute my spring-boot project.
Since I have two environments one development and another production. I want the script to do things differently based on the environment i.e two different instances.
My plan is to fetch the aws static ip-address which is mapped and from that determine the environment
(production or stage).
How to fetch the elastic ip address through sh commands.
edited
Static IP will work.
Here is a more nature CodeDeploy way to solve this is - set up 2 CodeDeploy deployment groups, one for your development env and the other for your production env. Then in your script, you can use environment variables that CodeDeploy will set during the deployment to understand which env you are deploying to.
Here is a blog post about how to use CodeDeploy environment variables: https://aws.amazon.com/blogs/devops/using-codedeploy-environment-variables/
You could do the following:
id=$( curl http://169.254.169.254/latest/meta-data/instance-id )
eip=$( aws ec2 describe-addresses --filter Name=instance-id,Values=${id} | aws ec2 describe-addresses | jq .Addresses[].PublicIp --raw-output )
The above gets the instance-id from metadata, then uses the aws cli to look for elastic IPs filtered by the id from metadata. Using jq this output can then be parsed down to the IP you are looking for.
Query the metadata server
eip=`curl -s 169.254.169.254/latest/meta-data/public-ipv4`
echo $eip
The solution is completely off tangent to what I originally asked but it was enough for my requirement.
I just needed to know the environment I am in to do certain actions. So what I did was to set an environment variable in an independent script file where the environment variable is set and the value is that of the environment.
ex: let's say in a file env-variables.sh
export profile= stage
In the script file where the commands have to be executed based on the environment I access it this way
source /test/env-variables.sh
echo current profile is $profile
if [ $profile = stage ]
then
echo stage
elif [ $profile = production ]
then
echo production
else
echo failure
fi
Hope some one finds it useful
we're using Ansible to provision ec2 instances, deploy our application on them and then create an AMI based on that instance, update the launch config so the autoscaling group always launch a new version of the app.
we re-use much of the ansible's playbooks for different Apps (diff instance sizes, families, etc).
one of the tasks that this runs is to check the instance meta-data to find out if there are any ephemeral devices present, and if so, mount them and persist them in the /etc/fstab and in the cloud-config.
we run into an issue with the instance meta-data similar to what was described 3 years ago in this thread:
https://forums.aws.amazon.com/thread.jspa?messageID=489889
right now we're doing some testing with t2.large instance, as you know, this instances are ebs only, but when we curl the instance metadata we get that there are 2 ephemeral disks presents:
ubuntu#ip-xxx-xxx-xxx-xxx:~$ curl http://169.254.169.254/latest/meta-data/block-device-mapping/
ami
ebs1
ebs2
ephemeral0
ephemeral1
the thing is that those ephemeral devices don't actually exists, so when our script tries to persist them in cloud-config, cloud-config ends up as a broken yaml blob....this is a major issue, because it causes the user_data to fail when an instance is launching.
any ideas anyone?
If this is a known issue that amazon team is not going to fix, you can workaround it.
Quick script to check actual existence:
#!/bin/bash
for MAPID in $(curl -s http://169.254.169.254/latest/meta-data/block-device-mapping/); do
BLKDEV=$(curl -s http://169.254.169.254/latest/meta-data/block-device-mapping/$MAPID/ | sed 's#^/dev/##' | sed 's/^sd/xvd/')
if blkid | grep -q $BLKDEV; then
echo $BLKDEV present
else
echo $BLKDEV not present
fi
done
Checked it with c1.medium and t2.medium. t2 output:
$ curl -s http://169.254.169.254/latest/meta-data/block-device-mapping/
ami
ebs1
ephemeral0
ephemeral1
root
$ ./test.sh
xvda1 present
xvdd present
xvdb not present
xvdc not present
xvda1 present
What I am trying to do:
I have setup kubernete cluster using documentation available on Kubernetes website (http_kubernetes.io/v1.1/docs/getting-started-guides/aws.html). Using kube-up.sh, i was able to bring kubernete cluster up with 1 master and 3 minions (as highlighted in blue rectangle in the diagram below). From the documentation as far as i know we can add minions as and when required, So from my point of view k8s master instance is single point of failure when it comes to high availability.
Kubernetes Master HA on AWS
So I am trying to setup HA k8s master layer with the three master nodes as shown above in the diagram. For accomplishing this I am following kubernetes high availability cluster guide, http_kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
What I have done:
Setup k8s cluster using kube-up.sh and provider aws (master1 and minion1, minion2, and minion3)
Setup two fresh master instance’s (master2 and master3)
I then started configuring etcd cluster on master1, master 2 and master 3 by following below mentioned link:
http_kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
So in short i have copied etcd.yaml from the kubernetes website (http_kubernetes.io/v1.1/docs/admin/high-availability/etcd.yaml) and updated Node_IP, Node_Name and Discovery Token on all the three nodes as shown below.
NODE_NAME NODE_IP DISCOVERY_TOKEN
Master1
172.20.3.150 https_discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master2
172.20.3.200 https_discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master3
172.20.3.250 https_discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
And on running etcdctl member list on all the three nodes, I am getting:
$ docker exec <container-id> etcdctl member list
ce2a822cea30bfca: name=default peerURLs=http_localhost:2380,http_localhost:7001 clientURLs=http_127.0.0.1:4001
As per documentation we need to keep etcd.yaml in /etc/kubernete/manifest, this directory already contains etcd.manifest and etcd-event.manifest files. For testing I modified etcd.manifest file with etcd parameters.
After making above changes I forcefully terminated docker container, container was existing after few seconds and I was getting below mentioned error on running kubectl get nodes:
error: couldn't read version from server: Get httplocalhost:8080/api: dial tcp 127.0.0.1:8080: connection refused
So please kindly suggest how can I setup k8s master highly available setup on AWS.
To configure an HA master, you should follow the High Availability Kubernetes Cluster document, in particular making sure you have replicated storage across failure domains and a load balancer in front of your replicated apiservers.
Setting up HA controllers for kubernetes is not trivial and I can't provide all the details here but I'll outline what was successful for me.
Use kube-aws to set up a single-controller cluster: https://coreos.com/kubernetes/docs/latest/kubernetes-on-aws.html. This will create CloudFormation stack templates and cloud-config templates that you can use as a starting point.
Go the AWS CloudFormation Management Console, click the "Template" tab and copy out the complete stack configuration. Alternatively, use $ kube-aws up --export to generate the cloudformation stack file.
User the userdata cloud-config templates generated by kube-aws and replace the variables with actual values. This guide will help you determine what those values should be: https://coreos.com/kubernetes/docs/latest/getting-started.html. In my case I ended up with four cloud-configs:
cloud-config-controller-0
cloud-config-controller-1
cloud-config-controller-2
cloud-config-worker
Validate your new cloud-configs here: https://coreos.com/validate/
Insert your cloud-configs into the CloudFormation stack config. First compress and encode your cloud config:
$ gzip -k cloud-config-controller-0
$ cat cloud-config-controller-0.gz | base64 > cloud-config-controller-0.enc
Now copy the content into your encoded cloud-config into the CloudFormation config. Look for the UserData key for the appropriate InstanceController. (I added additional InstanceController objects for the additional controllers.)
Update the stack at the AWS CloudFormation Management Console using your newly created CloudFormation config.
You will also need to generate TLS asssets: https://coreos.com/kubernetes/docs/latest/openssl.html. These assets will have to be compressed and encoded (same gzip and base64 as above), then inserted into your userdata cloud-configs.
When debugging on the server, journalctl is your friend:
$ journalctl -u oem-cloudinit # to debug problems with your cloud-config
$ journalctl -u etcd2
$ journalctl -u kubelet
Hope that helps.
There is also kops project
From the project README:
Operate HA Kubernetes the Kubernetes Way
also:
We like to think of it as kubectl for clusters
Download the latest release, e.g.:
cd ~/opt
wget https://github.com/kubernetes/kops/releases/download/v1.4.1/kops-linux-amd64
mv kops-linux-amd64 kops
chmod +x kops
ln -s ~/opt/kops ~/bin/kops
See kops usage, especially:
kops create cluster
kops update cluster
Assuming you already have s3://my-kops bucket and kops.example.com hosted zone.
Create configuration:
kops create cluster --state=s3://my-kops --cloud=aws \
--name=kops.example.com \
--dns-zone=kops.example.com \
--ssh-public-key=~/.ssh/my_rsa.pub \
--master-size=t2.medium \
--master-zones=eu-west-1a,eu-west-1b,eu-west-1c \
--network-cidr=10.0.0.0/22 \
--node-count=3 \
--node-size=t2.micro \
--zones=eu-west-1a,eu-west-1b,eu-west-1c
Edit configuration:
kops edit cluster --state=s3://my-kops
Export terraform scripts:
kops update cluster --state=s3://my-kops --name=kops.example.com --target=terraform
Apply changes directly:
kops update cluster --state=s3://my-kops --name=kops.example.com --yes
List cluster:
kops get cluster --state s3://my-kops
Delete cluster:
kops delete cluster --state s3://my-kops --name=kops.identityservice.co.uk --yes