Ansible RDS module only works once?

I'm using the RDS module twice in an Ansible play. The first time, it records the CNAME. The second time, it waits until the database status is "available".
The problem I'm running into is that it works as expected the first time, but fails the second time.
---
- name: Wait for RDS to be out of creating state and get its CNAME
  rds:
    command: facts
    instance_name: '{{ wp_db_instance }}'
    region: "{{ region }}"
  register: rds_facts
  until: rds_facts.instance.status != "creating"
  retries: 15
  delay: 30
  become: no

- name: Set Endpoint variable
  set_fact: rds_db_endpoint="{{ (rds_facts.stdout|from_json).DBInstances[0].Endpoint.Address }}"
The above code works as expected. It waits for the instance to leave "creating" status and records the name.
Later on in the play, it runs the following.
---
- name: Wait for RDS instance to be in available state
  rds:
    command: facts
    instance_name: '{{ wp_db_instance }}'
    region: "{{ region }}"
  register: rds_facts_available
  until: rds_facts_available.instance.status == "available"
  retries: 55
  delay: 30
  become: no
This one fails with "'dict object' has no attribute 'instance'", which tells me the module did not return any facts. Why wouldn't it, though? Is there a problem with calling the module twice?
Any help would be appreciated.
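One quick way to see what the module is actually returning at that point is to fetch the facts once, without the retry loop, and dump the registered variable. A minimal sketch, reusing the names from the question:

- name: Fetch RDS facts once for debugging
  rds:
    command: facts
    instance_name: '{{ wp_db_instance }}'
    region: "{{ region }}"
  register: rds_facts_debug

- name: Show what the module returned
  debug:
    var: rds_facts_debug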

I think I found the answer. It's kind of out there and I'm not sure if this is what fixed it, but here's what I did that works.
The first task (the one that worked) is part of a YAML file that's included from main.yml.
- name: Get RDS endpoint
  include: get_rds_endpoint.yml
  delegate_to: localhost
  become: no
The second task is part of a different YAML file, but the file name is a variable:
- name: Configure DB
  include: "{{ configureDBFile }}"
  delegate_to: localhost
  become: no
I changed "{{ configureDBFile }}" to the name of the actual file and the issue went away.
I am doing regression testing of our old playbooks against the latest release, so I have to wonder if an update at some point changed the way environment variables/credentials are passed down?

Change your second task as below:
---
- name: Wait for RDS instance to be in available state
  rds_instance_info:
    db_instance_identifier: '{{ wp_db_instance }}'
    region: "{{ region }}"
  register: rds_facts_available
  until: rds_facts_available.instances[0].db_instance_status == "available"
  retries: 55
  delay: 30
  become: no
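If you are on a version of the module that exposes the endpoint on the returned facts (it should appear as instances[0].endpoint.address), the same module can also replace the first task and the stdout|from_json parsing. A sketch under that assumption, reusing the question's variable names:

- name: Wait for RDS to be out of creating state
  rds_instance_info:
    db_instance_identifier: '{{ wp_db_instance }}'
    region: "{{ region }}"
  register: rds_facts
  until: rds_facts.instances[0].db_instance_status != "creating"
  retries: 15
  delay: 30
  become: no

- name: Set Endpoint variable
  set_fact:
    rds_db_endpoint: "{{ rds_facts.instances[0].endpoint.address }}"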

Related

Is it possible to use Ref function on option_settings in AWS?

I am using Elastic Beanstalk to deploy a worker tier environment using SQS.
In my .ebextensions I have the following file:
option_settings:
  aws:elasticbeanstalk:sqsd:
    WorkerQueueURL:
      Ref: WorkerQueue
    HttpPath: "/sqs/"
    InactivityTimeout: 1650
    VisibilityTimeout: 1680
    MaxRetries: 1

Resources:
  WorkerQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: "tpc-clients-aws-queue"
      VisibilityTimeout: 1680
However, this fails with the following error:
"option_settings" in one of the configuration files failed validation. More details to follow.
Invalid option value: 'Ref=WorkerQueue' (Namespace: 'aws:elasticbeanstalk:sqsd', OptionName: 'WorkerQueueURL'): Value does not satisfy regex: '^$|^http(s)?://.+$' [Valid non empty URL starting with http(s)]
It seems that the AWS CloudFormation Ref function cannot be used in the option_settings. Can someone confirm whether this is the case?
I have seen some code snippets here on Stack Overflow using intrinsic functions in the option_settings, such as in the mount-config.config of this answer and also in this question. So, are these examples using invalid syntax? Or are there some intrinsic functions or specific resources that can be used in the option_settings?
And lastly, if I cannot use the Ref function, how can I go about this?
Yes, you can use references in .ebextensions, but the syntax is a bit strange. It is shown in the docs here.
You can try something along these lines (note the various quotation marks):
option_settings:
  aws:elasticbeanstalk:sqsd:
    WorkerQueueURL: '`{"Ref" : "WorkerQueue"}`'
    HttpPath: "/sqs/"
    InactivityTimeout: 1650
    VisibilityTimeout: 1680
    MaxRetries: 1

Resources:
  WorkerQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: "tpc-clients-aws-queue"
      VisibilityTimeout: 1680
You can also use ImportValue, if you export the WorkerQueue in outputs.
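If the queue URL is exported by another stack, an Fn::ImportValue variant should work with the same backtick-wrapped JSON syntax; a hedged sketch, here applied to an environment variable (the export name worker-queue-url is hypothetical):

option_settings:
  aws:elasticbeanstalk:application:environment:
    QUEUE_URL: '`{"Fn::ImportValue" : "worker-queue-url"}`'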
Update
To check the value obtained, you can set it as an env variable and inspect it in the EB console:
option_settings:
  aws:elasticbeanstalk:application:environment:
    SQS_NAME: '`{"Ref" : "WorkerQueue"}`'
After digging further into this issue, I made some discoveries I would like to share with future readers.
Ref can be used on option_settings
As Marcin's answer states, the Ref intrinsic function can be used in the option_settings. The syntax is different, though:
'`{"Ref" : "ResourceName"}`'
Using Ref on aws:elasticbeanstalk:application:environment (environment variable)
A use case of the above is to store the queue URL in an environment variable, as follows:
option_settings:
  aws:elasticbeanstalk:application:environment:
    QUEUE_URL: '`{"Ref" : "WorkerQueue"}`'
This will let your .sh script access the URL of the queue. Note that if you check the Elastic Beanstalk console (Environment > Config > Software), you won't see the actual value.
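For example, a container command in another .ebextensions file could read the variable to confirm it resolved (a minimal sketch; the command and log file are illustrative):

container_commands:
  01_log_queue_url:
    command: 'echo "Worker queue URL: $QUEUE_URL" >> /var/log/queue-url.log'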
Using Ref on aws:elasticbeanstalk:sqsd:WorkerQueueURL
If you try to use the following setting:
option_settings:
  aws:elasticbeanstalk:sqsd:
    WorkerQueueURL: '`{"Ref" : "WorkerQueue"}`'
    HttpPath: "/sqs/"
It will fail:
Invalid option value: '`{"Ref" : "WorkerQueue"}`' (Namespace: 'aws:elasticbeanstalk:sqsd', OptionName: 'WorkerQueueURL'): Value does not satisfy regex: '^$|^http(s)?://.+$' [Valid non empty URL starting with http(s)]
It seems that this configuration option doesn't accept a reference; judging by the error, the value is validated against the URL regex before the reference is resolved.
Instead of creating a new queue and assigning it to the SQS daemon, you can just update the queue that Elastic Beanstalk creates:
option_settings:
  # The SQS daemon will use the default queue created by EB (AWSEBWorkerQueue)
  aws:elasticbeanstalk:sqsd:
    HttpPath: "/sqs/"

Resources:
  # Update the queue created by EB
  AWSEBWorkerQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: "tpc-clients-aws-queue"

Configuring Concourse CI to use AWS Secrets Manager

I have been trying to figure out how to configure the Docker version of Concourse (https://github.com/concourse/concourse-docker) to use AWS Secrets Manager. I added the following environment variables to the docker-compose file, but from the logs it doesn't look like it ever reaches out to AWS to fetch the creds. Am I missing something, or should this happen automatically when these environment variables are added under environment in the docker-compose file? Here are the docs I have been looking at: https://concourse-ci.org/aws-asm-credential-manager.html
version: '3'

services:
  concourse-db:
    image: postgres
    environment:
      POSTGRES_DB: concourse
      POSTGRES_PASSWORD: concourse_pass
      POSTGRES_USER: concourse_user
      PGDATA: /database

  concourse:
    image: concourse/concourse
    command: quickstart
    privileged: true
    depends_on: [concourse-db]
    ports: ["9090:8080"]
    environment:
      CONCOURSE_POSTGRES_HOST: concourse-db
      CONCOURSE_POSTGRES_USER: concourse_user
      CONCOURSE_POSTGRES_PASSWORD: concourse_pass
      CONCOURSE_POSTGRES_DATABASE: concourse
      CONCOURSE_EXTERNAL_URL: http://XXX.XXX.XXX.XXX:9090
      CONCOURSE_ADD_LOCAL_USER: test:test
      CONCOURSE_MAIN_TEAM_LOCAL_USER: test
      CONCOURSE_WORKER_BAGGAGECLAIM_DRIVER: overlay
      CONCOURSE_AWS_SECRETSMANAGER_REGION: us-east-1
      CONCOURSE_AWS_SECRETSMANAGER_ACCESS_KEY: <XXXX>
      CONCOURSE_AWS_SECRETSMANAGER_SECRET_KEY: <XXXX>
      CONCOURSE_AWS_SECRETSMANAGER_TEAM_SECRET_TEMPLATE: /concourse/{{.Secret}}
      CONCOURSE_AWS_SECRETSMANAGER_PIPELINE_SECRET_TEMPLATE: /concourse/{{.Secret}}
pipeline.yml example:
jobs:
- name: build-ui
  plan:
  - get: web-ui
    trigger: true
  - get: resource-ui
  - task: build-task
    file: web-ui/ci/build/task.yml
  - put: resource-ui
    params:
      repository: updated-ui
      force: true
  - task: e2e-task
    file: web-ui/ci/e2e/task.yml
    params:
      UI_USERNAME: ((ui-username))
      UI_PASSWORD: ((ui-password))

resources:
- name: cf
  type: cf-cli-resource
  source:
    api: https://api.run.pivotal.io
    username: ((cf-username))
    password: ((cf-password))
    org: Blah
- name: web-ui
  type: git
  source:
    uri: git@github.com:blah/blah.git
    branch: master
    private_key: ((git-private-key))
When storing parameters for Concourse pipelines in AWS Secrets Manager, they must follow this syntax:
/concourse/TEAM_NAME/PIPELINE_NAME/PARAMETER_NAME
If you have common parameters that are used across multiple pipelines in the team, use this syntax to avoid creating redundant parameters in Secrets Manager:
/concourse/TEAM_NAME/PARAMETER_NAME
The highest level that is supported is the Concourse team level. Global parameters are not possible, so setting both templates to the same un-scoped path in your compose environment, as below, is not supported:
  CONCOURSE_AWS_SECRETSMANAGER_TEAM_SECRET_TEMPLATE: /concourse/{{.Secret}}
  CONCOURSE_AWS_SECRETSMANAGER_PIPELINE_SECRET_TEMPLATE: /concourse/{{.Secret}}
Unless you want to change the /concourse prefix, these parameters should be left at their defaults.
When retrieving the parameters in the pipeline, no changes are required in the template. Just use the PARAMETER_NAME; Concourse will handle the lookup in Secrets Manager based on the team and pipeline name.
...
params:
  UI_USERNAME: ((ui-username))
  UI_PASSWORD: ((ui-password))
...
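For example, with the default templates and a team named main running a pipeline named build-ui (both names are assumptions for illustration), the lookup for each ((var)) would roughly go:

# ((ui-username)) is resolved by trying, in order:
#   /concourse/main/build-ui/ui-username   (pipeline-scoped secret)
#   /concourse/main/ui-username            (team-scoped fallback)
params:
  UI_USERNAME: ((ui-username))
  UI_PASSWORD: ((ui-password))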

Ansible AWX step fails when using DSL for Jenkins job

I'm running a Jenkins 2.140 server on Ubuntu 16.04, as well as an Ansible AWX 1.0.7.2 server using Ansible 2.6.2.
I'm creating a job in Jenkins which runs a template on my Ansible AWX server. I've got several other Jenkins jobs that run templates, all of which work, so I know the general configuration I'm using for this is OK.
However, when I create the Jenkins job using a seed job that uses the JobDSL, the job fails at the Ansible AWX step with this output:
11:50:42 [EnvInject] - Loading node environment variables.
11:50:42 Building remotely on windows-slave (excel Windows orqaheadless windows) in workspace C:\JenkinsSlave\workspace\create-ec2-instance-2
11:50:42 ERROR: Build step failed with exception
11:50:42 java.lang.NullPointerException
11:50:42 at org.jenkinsci.plugins.ansible_tower.AnsibleTower.perform(AnsibleTower.java:129)
11:50:42 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
11:50:42 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
11:50:42 at hudson.model.Build$BuildExecution.build(Build.java:206)
11:50:42 at hudson.model.Build$BuildExecution.doRun(Build.java:163)
11:50:42 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
11:50:42 at hudson.model.Run.execute(Run.java:1815)
11:50:42 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
11:50:42 at hudson.model.ResourceController.execute(ResourceController.java:97)
11:50:42 at hudson.model.Executor.run(Executor.java:429)
11:50:42 Build step 'Ansible Tower' marked build as failure
11:50:42 [BFA] Scanning build for known causes...
11:50:42 [BFA] No failure causes found
11:50:42 [BFA] Done. 0s
11:50:42 Started calculate disk usage of build
11:50:42 Finished Calculation of disk usage of build in 0 seconds
11:50:42 Started calculate disk usage of workspace
11:50:42 Finished Calculation of disk usage of workspace in 0 seconds
11:50:42 Finished: FAILURE
That output doesn't really give me anything to work with, especially as I'm no Java expert.
If I configure the Jenkins job manually, all is well. This is the config.xml for the working job (only the AWX part). Note that all of these extra variables are passed in earlier in the job as parameters:
<builders>
  <org.jenkinsci.plugins.ansible__tower.AnsibleTower plugin="ansible-tower@0.9.0">
    <towerServer>AWX Server</towerServer>
    <jobTemplate>create-ec2-instance</jobTemplate>
    <extraVars>
      key_name: ${key_name} ec2_termination_protection: ${ec2_termination_protection} vpc_subnet_id: ${vpc_subnet_id} security_groups: ${security_groups} instance_type: ${instance_type} instance_profile_name: ${instance_profile_name} assign_public_ip: ${assign_public_ip} region: ${region} image: ${image} instance_tags: ${instance_tags} ec2_wait_for_create: ${ec2_wait_for_create} ec2_wait_for_create_timeout: ${ec2_wait_for_create_timeout} exact_count: ${exact_count} delete_volume_on_termination: ${delete_volume_on_termination} data_disk_size: ${data_disk_size} private_domain: ${private_domain} route53_private_record_ttl: ${route53_private_record_ttl} dns_record: ${dns_record} elastic_ip: ${elastic_ip}
    </extraVars>
    <jobTags/>
    <skipJobTags/>
    <jobType>run</jobType>
    <limit/>
    <inventory/>
    <credential/>
    <verbose>true</verbose>
    <importTowerLogs>true</importTowerLogs>
    <removeColor>false</removeColor>
    <templateType>job</templateType>
    <importWorkflowChildLogs>false</importWorkflowChildLogs>
  </org.jenkinsci.plugins.ansible__tower.AnsibleTower>
</builders>
And the config.xml from the failing, JobDSL-generated job, which looks the same to me:
<builders>
  <org.jenkinsci.plugins.ansible__tower.AnsibleTower>
    <towerServer>AWX Server</towerServer>
    <jobTemplate>create-ec2-instance</jobTemplate>
    <jobType>run</jobType>
    <templateType>job</templateType>
    <extraVars>
      key_name: ${key_name} ec2_termination_protection: ${ec2_termination_protection} vpc_subnet_id: ${vpc_subnet_id} security_groups: ${security_groups} instance_type: ${instance_type} instance_profile_name: ${instance_profile_name} assign_public_ip: ${assign_public_ip} region: ${region} image: ${image} instance_tags: ${instance_tags} ec2_wait_for_create: ${ec2_wait_for_create} ec2_wait_for_create_timeout: ${ec2_wait_for_create_timeout} exact_count: ${exact_count} delete_volume_on_termination: ${delete_volume_on_termination} data_disk_size: ${data_disk_size} private_domain: ${private_domain} route53_private_record_ttl: ${route53_private_record_ttl} dns_record: ${dns_record} elastic_ip: ${elastic_ip}
    </extraVars>
    <verbose>true</verbose>
    <importTowerLogs>true</importTowerLogs>
  </org.jenkinsci.plugins.ansible__tower.AnsibleTower>
</builders>
So there are some expected differences you always get with JobDSL-generated jobs, such as the empty fields being missing, but this is the case with all of our other (successful) jobs that follow this process.
The JobDSL script is here:
configure { project ->
    project / 'builders ' << 'org.jenkinsci.plugins.ansible__tower.AnsibleTower' {
        towerServer 'AWX Server'
        jobTemplate('create-ec2-instance')
        templateType 'job'
        jobType 'run'
        extraVars('''key_name: ${key_name}
ec2_termination_protection: ${ec2_termination_protection}
vpc_subnet_id: ${vpc_subnet_id}
security_groups: ${security_groups}
instance_type: ${instance_type}
instance_profile_name: ${instance_profile_name}
assign_public_ip: ${assign_public_ip}
region: ${region}
image: ${image}
instance_tags: ${instance_tags}
ec2_wait_for_create: ${ec2_wait_for_create}
ec2_wait_for_create_timeout: ${ec2_wait_for_create_timeout}
exact_count: ${exact_count}
delete_volume_on_termination: ${delete_volume_on_termination}
data_disk_size: ${data_disk_size}
private_domain: ${private_domain}
route53_private_record_ttl: ${route53_private_record_ttl}
dns_record: ${dns_record}
elastic_ip: ${elastic_ip}''')
        verbose 'true'
        importTowerLogs 'true'
    }
}
The job that this generates looks identical in the UI (as well as in the XML) to my eye, and yet I keep getting that failure when I run it. Clearly I'm missing something, but I can't for the life of me see what.
Despite the fact that other AWX jobs build without them, I added the missing (empty) fields, and the job started succeeding.
So I changed my JobDSL script to this:
configure { project ->
    project / 'builders ' << 'org.jenkinsci.plugins.ansible__tower.AnsibleTower' {
        towerServer 'AWX Server'
        jobTemplate('create-ec2-instance')
        extraVars('''key_name: ${key_name}
ec2_termination_protection: ${ec2_termination_protection}
vpc_subnet_id: ${vpc_subnet_id}
security_groups: ${security_groups}
instance_type: ${instance_type}
instance_profile_name: ${instance_profile_name}
assign_public_ip: ${assign_public_ip}
region: ${region}
image: ${image}
instance_tags: ${instance_tags}
ec2_wait_for_create: ${ec2_wait_for_create}
ec2_wait_for_create_timeout: ${ec2_wait_for_create_timeout}
exact_count: ${exact_count}
delete_volume_on_termination: ${delete_volume_on_termination}
data_disk_size: ${data_disk_size}
private_domain: ${private_domain}
route53_private_record_ttl: ${route53_private_record_ttl}
dns_record: ${dns_record}
elastic_ip: ${elastic_ip}''')
        jobTags ''
        skipJobTags ''
        jobType 'run'
        limit ''
        inventory ''
        credential ''
        verbose 'true'
        importTowerLogs 'true'
        removeColor ''
        templateType 'job'
        importWorkflowChildLogs ''
    }
}
And now it works as expected.

Ansible docker_container 'no Host in request URL', docker pull works correctly

I'm trying to provision my infrastructure on AWS using Ansible playbooks. I have the instance and am able to provision docker-engine, docker-py, etc., and, I swear, yesterday this worked correctly and I haven't changed the code since.
The relevant portion of my playbook is:
- name: Ensure AWS CLI is available
  pip:
    name: awscli
    state: present
  when: aws_deploy

- block:
    - name: Add .boto file with AWS credentials.
      copy:
        content: "{{ boto_file }}"
        dest: ~/.boto
      when: aws_deploy

    - name: Log in to docker registry.
      shell: "$(aws ecr get-login --region us-east-1)"
      when: aws_deploy

    - name: Remove .boto file with AWS credentials.
      file:
        path: ~/.boto
        state: absent
      when: aws_deploy

- name: Create docker network
  docker_network:
    name: my-net

- name: Start Container
  docker_container:
    name: example
    image: "{{ docker_registry }}/example"
    pull: true
    restart: true
    network_mode: host
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone
My {{ docker_registry }} is set to my-acct-id.dkr.ecr.us-east-1.amazonaws.com and the result I'm getting is:
"msg": "Error pulling my-acct-id.dkr.ecr.us-east-1.amazonaws.com/example - code: None message: Get http://: http: no Host in request URL"
However, as mentioned, this worked correctly last night. Since then I've made some VPC/subnet changes, but I'm able to ssh to the instance and run docker pull my-acct-id.dkr.ecr.us-east-1.amazonaws.com/example with no issues.
Googling hasn't gotten me very far, as I can't seem to find other folks with the same error. I'm wondering what changed, and how I can fix it. Thanks!
EDIT: Versions:
ansible - 2.2.0.0
docker - 1.12.3 6b644ec
docker-py - 1.10.6
I had the same problem. Downgrading the docker-compose pip package on that host machine from 1.9.0 to 1.8.1 solved the problem.
- name: Install docker-compose
  pip: name=docker-compose version=1.8.1
Per this thread: https://github.com/ansible/ansible-modules-core/issues/5775, the real culprit is requests. This fixes it:
- name: fix requests
  pip: name=requests version=2.12.1 state=forcereinstall
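If you want to pin both client libraries from the play itself, before any docker_network or docker_container tasks run, tasks like these should do it (a minimal sketch; the versions simply mirror the ones mentioned above):

- name: Pin docker-py to a known-good version
  pip: name=docker-py version=1.10.6

- name: Pin requests to a version compatible with docker-py
  pip: name=requests version=2.12.1 state=forcereinstall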

Modify PYTHONPATH via Ansible for supervisorctl managed Python application

I am provisioning a server with a Django stack via Ansible and getting the app from Bitbucket. I am using https://github.com/jcalazan/ansible-django-stack, but I have had to tweak it a bit to make it work with a private Bitbucket repo.
Now it's authenticating correctly but giving me the following error:
failed: [default] => {"failed": true}
msg: youtubeadl: ERROR (not running)
youtubeadl: ERROR (abnormal termination)
When performing this task:
- name: Restart Supervisor
  supervisorctl: name={{ application_name }} state=restarted
Reading up on the gunicorn ERROR (abnormal termination), I would like to add the project to the PYTHONPATH. Any ideas how to approach this with an Ansible task, or am I missing something?
Thanks
PYTHONPATH is just another environment variable, so you can use the best practices explained in the FAQ. If it's only needed for the one task, it'd look something like:
- name: Restart Supervisor
  supervisorctl: name={{ application_name }} state=restarted
  environment:
    PYTHONPATH: "{{ ansible_env.PYTHONPATH }}:/my/path"
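Note that if PYTHONPATH is not already set in the remote environment, ansible_env.PYTHONPATH will be undefined and the task will fail; a defensive variant (a minimal sketch using a Jinja2 default) is:

- name: Restart Supervisor
  supervisorctl: name={{ application_name }} state=restarted
  environment:
    PYTHONPATH: "{{ ansible_env.PYTHONPATH | default('') }}:/my/path"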
Something has changed. I tried the answer above, but it doesn't work. After some digging and trying:
- name: Restart Supervisor
  supervisorctl: name={{ application_name }} state=restarted
  environment:
    PYTHONPATH: "{{ ansible_env.PATH }}:/my/path"
This should be the correct answer.