User data script fails without giving a reason - amazon-web-services

I am starting an Amazon Linux instance (ami-fb8e9292) using the web console, pasting data into the user data box to run a script on startup. If I use the example given by Amazon to start a web server, it works. But when I run my own script (also a #!/bin/bash script), it does not get run.
If I look in /var/log/cloud-init.log, it gives no useful information on the topic:
May 22 21:06:12 cloud-init[1286]: util.py[DEBUG]: Running command ['/var/lib/cloud/instance/scripts/part-001'] with allowed return codes [0] (shell=True, capture=False)
May 22 21:06:16 cloud-init[1286]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [2]
May 22 21:06:16 cloud-init[1286]: util.py[DEBUG]: Failed running /var/lib/cloud/instance/scripts/part-001 [2]
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/cloudinit/util.py", line 637, in runparts
subp([exe_path], capture=False, shell=True)
File "/usr/lib/python2.6/site-packages/cloudinit/util.py", line 1528, in subp
cmd=args)
ProcessExecutionError: Unexpected error while running command.
Command: ['/var/lib/cloud/instance/scripts/part-001']
Exit code: 2
Reason: -
Stdout: ''
Stderr: ''
If I ssh into the instance, sudo su, and execute the shell script directly:
/var/lib/cloud/instance/scripts/part-001
then it runs fine. Also, it works if I emulate the way cloud-init runs it:
python
>>> import cloudinit.util
>>> cloudinit.util.runparts("/var/lib/cloud/instance/scripts/")
Using either of those methods, if I intentionally introduce errors into the script then it produces error messages. How can I debug the selective absence of useful debugging output?

Instead of /var/log/cloud-init.log, consider searching for keywords like "Failed", "ERROR", "WARNING" or "/var/lib/cloud/instance/scripts/" inside /var/log/cloud-init-output.log, which in most cases contains very clear error messages.
For example - running a bad command will produce the following error in /var/log/cloud-init-output.log:
/var/lib/cloud/instance/scripts/part-001: line 10: vncpasswd: command not found
cp: cannot stat '/lib/systemd/system/vncserver#.service': No such file or directory
sed: can't read /etc/systemd/system/vncserver#.service: No such file or directory
Failed to execute operation: No such file or directory
Failed to start vncserver#:1.service: Unit not found.
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Cleaning repos: amzn2-core amzn2extra-docker amzn2extra-epel
And at the end of /var/log/cloud-init.log you'll only see a generic error message:
Aug 31 15:14:00 cloud-init[3532]: util.py[DEBUG]: Failed running /var/lib/cloud/instance/scripts/part-001 [1]
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 910, in runparts
subp(prefix + [exe_path], capture=False, shell=True)
File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 2105, in subp
cmd=args)
ProcessExecutionError: Unexpected error while running command.
Command: ['/var/lib/cloud/instance/scripts/part-001']
Exit code: 1
Reason: -
Stdout: -
Stderr: -
cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
(*) To grep just the relevant part of the error message, try:
grep -C 10 '<search-keyword>' cloud-init-output.log
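As a concrete starting point (a sketch; the exact commands depend on your cloud-init version), you can grep both logs at once and, on newer cloud-init releases, ask cloud-init itself whether the user scripts failed:
# search both cloud-init logs for anything suspicious
sudo grep -iE 'warn|error|fail' /var/log/cloud-init.log /var/log/cloud-init-output.log
# on newer cloud-init versions, this summarizes whether any boot stage failed
cloud-init status --long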

I'm not sure if this is going to be the case for everyone, but I was having this issue and was able to fix it by changing my first line from this:
#!/bin/bash -e -v
to just this:
#!/bin/bash
Of course, now my script is failing and I have no idea how far it's getting, but at least I got past it not running at all. :)
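If you would rather keep -e but still learn which command aborted the script, one option (my own sketch, not part of the original answer) is to add an ERR trap so the failing command shows up in /var/log/cloud-init-output.log:
#!/bin/bash
# keep "exit on error", but report the offending command before dying
set -eE
trap 'echo "user-data failed at line $LINENO: $BASH_COMMAND" >&2' ERR
# ... rest of the user-data script ...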

Hope it will reduce the debugging time for someone.
I didn't have any explicit error messages in my /var/log/cloud-init-output.log, just this:
2021-04-07 10:36:57,748 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2021-04-07 10:36:57,748 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
After some investigation, I've realized that the cause was a typo in the shebang string: #!?bin/bash instead of #!/bin/bash.
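A quick way to catch shebang problems like this on a running instance (using the script path from the question above; cat -A makes stray characters and whitespace visible) is:
sudo head -1 /var/lib/cloud/instance/scripts/part-001 | cat -A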

I had a similar issue and was able to get around it. I realized that the environment variable EC2_HOME would not be set up under sudo. I was doing a bunch of things in my configset that use the AWS CLI, and for those to work EC2_HOME needs to be set. So I went in and removed sudo everywhere in my configset and UserData.
Earlier when I was hitting the issue, my UserData looked like:
"UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
"#!/bin/bash\n",
"sudo yum update -y aws-cfn-bootstrap\n",
"# Install the files and packages and run the commands from the metadata\n",
"sudo /opt/aws/bin/cfn-init -v --access-key ", { "Ref" : "IAMUserAccessKey" }, " --secret-key ", { "Ref" : "SecretAccessKey" },
" --stack ", { "Ref" : "AWS::StackName" },
" --resource NAT2 ",
" --configsets config ",
" --region ", { "Ref" : "AWS::Region" }, "\n"
]]}}
My UserData after the changes looked like:
"UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
"#!/bin/bash -xe\n",
"yum update -y aws-cfn-bootstrap\n",
"# Install the files and packages and run the commands from the metadata\n",
"/opt/aws/bin/cfn-init -v --access-key ", { "Ref" : "IAMUserAccessKey" }, " --secret-key ", { "Ref" : "SecretAccessKey" },
" --stack ", { "Ref" : "AWS::StackName" },
" --resource NAT2 ",
" --configsets config ",
" --region ", { "Ref" : "AWS::Region" }, "\n"
]]}}
Similarly, I removed all the sudo calls I was making in my configsets.
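As a sanity check (my own sketch, not from the original setup): user data scripts are already executed as root, and sudo's default env_reset is exactly what strips variables like EC2_HOME. You can confirm both from inside the script:
# user data already runs as root, so sudo is redundant here
whoami >> /tmp/userdata-debug.log
# see which EC2_* variables cloud-init actually provides
env | grep '^EC2_' >> /tmp/userdata-debug.log || echo 'no EC2_* vars set' >> /tmp/userdata-debug.log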

In my case cloud-init could not start the script because the user data must start with
#!/bin/bash
with no whitespace in front of it!
A nice AWS bug; it cost a lot of troubleshooting time :)

I've been through this, and in my case it was also an issue with spaces before the shebang #!/bin/bash.
I spun up an instance through Python code, using boto3.
ec2 = boto3.resource('ec2', region_name='eu-south-1')
instance = ec2.create_instances(
    ImageId=AMI_IMAGE_ID,
    InstanceType=INSTANCE_TYPE,
    ...
    UserData=USER_DATA_SCRIPT
    ...
)
where the definition of USER_DATA_SCRIPT was:
USER_DATA_SCRIPT = """
#!/bin/bash
apt update -y
apt upgrade -y
...
"""
This contained whitespace up front, which caused the script to generate the error without any further details in /var/log/cloud-init-output.log.
Changing it into:
USER_DATA_SCRIPT = """#!/bin/bash
apt update -y
apt upgrade -y
...
"""
solved the issue.
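If you are not sure whether the user data you sent really starts with the shebang, you can inspect what cloud-init received on the instance (standard paths/endpoints; cat -A exposes any leading whitespace):
cat -A /var/lib/cloud/instance/user-data.txt | head -3
# or ask the instance metadata service directly (IMDSv1 shown; IMDSv2 needs a session token)
curl -s http://169.254.169.254/latest/user-data | head -3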

Related

Packaging PySpark with PEX environment on dataproc

I'm trying to package a PySpark job with PEX to be run on Google Cloud Dataproc, but I'm getting a Permission Denied error.
I've packaged my third-party and local dependencies into env.pex and an entrypoint that uses those dependencies into main.py. I then gsutil cp those two files up to gs://<PATH> and run the script below.
from google.cloud import dataproc_v1 as dataproc
from google.cloud import storage
def submit_job(project_id: str, region: str, cluster_name: str):
    job_client = dataproc.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = job_client.submit_job_as_operation(
        request={
            "project_id": project_id,
            "region": region,
            "job": {
                "placement": {"cluster_name": cluster_name},
                "pyspark_job": {
                    "main_python_file_uri": "gs://<PATH>/main.py",
                    "file_uris": ["gs://<PATH>/env.pex"],
                    "properties": {
                        "spark.pyspark.python": "./env.pex",
                        "spark.executorEnv.PEX_ROOT": "./.pex",
                    },
                },
            },
        }
    )
The error I get is
Exception in thread "main" java.io.IOException: Cannot run program "./env.pex": error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 14 more
Should I expect packaging my environment like this to work? I don't see a way to change the permissions of files included as file_uris in the pyspark job config, and I don't see any documentation on Google Cloud about packaging with PEX, but the official PySpark docs include this guide.
Any help is appreciated - thanks!
You can always run a PEX file with a compatible interpreter. So instead of specifying ./env.pex as the program, you could try python env.pex. That does not require env.pex to be executable.
I wasn't able to run the PEX directly in the end, but I did get a workaround working for now, which was suggested by a user in the Pants Slack community (thanks!).
The workaround is to unpack the pex as a venv in a cluster initialization script.
The initialization script, copied with gsutil to gs://<PATH TO INIT SCRIPT>:
#!/bin/bash
set -exo pipefail
readonly PEX_ENV_FILE_URI=$(/usr/share/google/get_metadata_value attributes/PEX_ENV_FILE_URI || true)
readonly PEX_FILES_DIR="/pexfiles"
readonly PEX_ENV_DIR="/pexenvs"
function err() {
  echo "[$(date +'%Y-%m-%dT%H:%M:%S%z')]: $*" >&2
  exit 1
}
function install_pex_into_venv() {
  local -r pex_name=${PEX_ENV_FILE_URI##*/}
  local -r pex_file="${PEX_FILES_DIR}/${pex_name}"
  local -r pex_venv="${PEX_ENV_DIR}/${pex_name}"
  echo "Installing pex from ${pex_file} into venv ${pex_venv}..."
  gsutil cp "${PEX_ENV_FILE_URI}" "${pex_file}"
  PEX_TOOLS=1 python "${pex_file}" venv --compile "${pex_venv}"
}
function main() {
  if [[ -z "${PEX_ENV_FILE_URI}" ]]; then
    err "ERROR: Must specify PEX_ENV_FILE_URI metadata key"
  fi
  install_pex_into_venv
}
main
To start the cluster and run the initialization script to unpack the pex into a venv on the cluster:
from google.cloud import dataproc_v1 as dataproc
def start_cluster(project_id: str, region: str, cluster_name: str):
    cluster_client = dataproc.ClusterControllerClient(...)
    operation = cluster_client.create_cluster(
        request={
            "project_id": project_id,
            "region": region,
            "cluster": {
                "project_id": project_id,
                "cluster_name": cluster_name,
                "config": {
                    "master_config": <CONFIG>,
                    "worker_config": <CONFIG>,
                    "initialization_actions": [
                        {
                            "executable_file": "gs://<PATH TO INIT SCRIPT>",
                        },
                    ],
                    "gce_cluster_config": {
                        "metadata": {"PEX_ENV_FILE_URI": "gs://<PATH>/env.pex"},
                    },
                },
            },
        }
    )
To start the job and use the unpacked pex venv to run the pyspark job:
def submit_job(project_id: str, region: str, cluster_name: str):
    job_client = dataproc.JobControllerClient(...)
    operation = job_client.submit_job_as_operation(
        request={
            "project_id": project_id,
            "region": region,
            "job": {
                "placement": {"cluster_name": cluster_name},
                "pyspark_job": {
                    "main_python_file_uri": "gs://<PATH>/main.py",
                    "properties": {
                        "spark.pyspark.python": "/pexenvs/env.pex/bin/python",
                    },
                },
            },
        }
    )
Following @megabits' answer, here is the bash-based workflow that works for me:
copy the init script (from the answer above) to GCS as gs://BUCKET/pkg/cluster-env-init.bash
build the PEX, providing the --include-tools argument that is required by the initialization script, e.g.
pex --include-tools -r requirements.txt -o env.pex
put PEX file on GCS
gsutil mv env.pex "gs://BUCKET/pkg/env.pex"
create the cluster, using the PEX file to set up the environment
gcloud dataproc clusters create your-cluster --region us-central1 \
--initialization-actions="gs://BUCKET/pkg/cluster-env-init.bash" \
--metadata "PEX_ENV_FILE_URI=gs://BUCKET/pkg/env.pex"
run job
gcloud dataproc jobs submit pyspark your-script.py \
--cluster=your-cluster --region us-central1 \
--properties spark.pyspark.python="/pexenvs/env.pex/bin/python"
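As an optional sanity check (an assumption on my part, not part of the workflow above: Dataproc names the master node <cluster>-m, and you will need the zone your cluster actually landed in), you can confirm the init action really unpacked the venv before submitting jobs:
gcloud compute ssh your-cluster-m --zone us-central1-a \
    --command 'ls /pexenvs/env.pex/bin/python'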

Running a shell script in CloudFormation cfn-init

I am trying to run a script via the cfn-init command but it keeps timing out.
What am I doing wrong when running startup_script.sh?
"WebServerInstance" : {
"Type" : "AWS::EC2::Instance",
"DependsOn" : "AttachGateway",
"Metadata" : {
"Comment" : "Install a simple application",
"AWS::CloudFormation::Init" : {
"config" : {
"files": {
"/home/ec2-user/startup_script.sh": {
"content": {
"Fn::Join": [
"",
[
"#!/bin/bash\n",
"aws s3 cp s3://server-assets/startserver.jar . --region=ap-northeast-1\n",
"aws s3 cp s3://server-assets/site-home-sprint2.jar . --region=ap-northeast-1\n",
"java -jar startserver.jar\n",
"java -jar site-home-sprint2.jar --spring.datasource.password=`< password.txt` --spring.datasource.username=`< username.txt` --spring.datasource.url=`<db_url.txt`\n"
]
]
},
"mode": "000755"
}
},
"commands": {
"start_server": {
"command": "./startup_script.sh",
"cwd": "~",
}
}
}
}
},
The files section works fine and creates the file, but it times out when running the command.
What is the correct way of executing a shell script?
You can tail the logs in /var/log/cfn-init.log to detect issues while the script runs.
The commands in CloudFormation Init are run as the root user by default. The issue may be that your script resides in /home/ec2-user/ while you are trying to run it from '~' (i.e. /root).
Give the absolute path (/home/ec2-user) in cwd; that should resolve the issue.
However, the exact cause can only be determined from the logs.
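For reference, a hedged sketch of where to look while the stack is creating (these are the usual Amazon Linux log locations; names can differ on other distributions):
# cfn-init's own log, the per-command output, and the user data console log
sudo tail -f /var/log/cfn-init.log /var/log/cfn-init-cmd.log /var/log/cloud-init-output.log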
Usually the init scripts are executed by root unless specified otherwise. Can you try giving the full path when running your startup script? You could also give cloudkast a try; it is an online CloudFormation template generator that makes it easier to create objects such as AWS::CloudFormation::Init.

Perform Root Volume Swap for EC2 via the AWS console

I've recently been using the swap-root-volume approach for creating a persistent Spot Instance, as described here (Approach 2). Typically it takes 2-5 minutes for my Spot Instance to be fulfilled and the swap to complete. However, some days the process never finishes (or at least I get impatient after waiting 20 minutes to an hour!).
To be clear, the instance is created, but the swap never happens: I can ssh into the server but my persistent files are not there. I can also see this in my AWS console, where "spotter" (my persistent storage) has no attachment information:
As the swapping script I'm using never gives me any errors, it's hard to see what's failing. So I'm wondering whether, based on my screenshot, I can just use the AWS EC2 Management Console to "manually" perform the swap, and if so, how I would accomplish this.
And, if it helps @Vorsprung,
I initiate the process by running the following script:
# The config file was created in ondemand_to_spot.sh
export config_file=my.conf
cd "$(dirname ${BASH_SOURCE[0]})"
. ../$config_file || exit -1
export request_id=`../ec2spotter-launch $config_file`
echo Spot request ID: $request_id
echo Waiting for spot request to be fulfilled...
aws ec2 wait spot-instance-request-fulfilled --spot-instance-request-ids $request_id
export instance_id=`aws ec2 describe-spot-instance-requests --spot-instance-request-ids $request_id --query="SpotInstanceRequests[*].InstanceId" --output="text"`
echo Waiting for spot instance to start up...
aws ec2 wait instance-running --instance-ids $instance_id
echo Spot instance ID: $instance_id
echo 'Please allow the root volume swap script a few minutes to finish.'
if [ "x$ec2spotter_elastic_ip" = "x" ]
then
# Non elastic IP
export ip=`aws ec2 describe-instances --instance-ids $instance_id --filter Name=instance-state-name,Values=running --query "Reservations[*].Instances[*].PublicIpAddress" --output=text`
else
# Elastic IP
export ip=`aws ec2 describe-addresses --allocation-ids $ec2spotter_elastic_ip --output text --query 'Addresses[0].PublicIp'`
fi
export name=fast-ai
if [ "$ec2spotter_key_name" = "aws-key-$name" ]
then function aws-ssh-spot {
ssh -i ~/.ssh/aws-key-$name.pem ubuntu@$ip
}
function aws-terminate-spot {
aws ec2 terminate-instances --instance-ids $instance_id
}
echo Jupyter Notebook -- $ip:8888
fi
where my.conf is:
# Name of root volume.
ec2spotter_volume_name=spotter
# Location (zone) of root volume. If not the same as ec2spotter_launch_zone,
# a copy will be created in ec2spotter_launch_zone.
# Can be left blank, if the same as ec2spotter_launch_zone
ec2spotter_volume_zone=us-west-2b
ec2spotter_launch_zone=us-west-2b
ec2spotter_key_name=aws-key-fast-ai
ec2spotter_instance_type=p2.xlarge
# Some instance types require a subnet to be specified:
ec2spotter_subnet=subnet-c9cba8af
ec2spotter_bid_price=0.55
# uncomment and update the value if you want an Elastic IP
# ec2spotter_elastic_ip=eipalloc-64d5890a
# Security group
ec2spotter_security_group=sg-2be79356
# The AMI to be used as the pre-boot environment. This is NOT your target system installation.
# Do Not Modify this unless you have a need for a different Kernel version from what's supplied.
# ami-6edd3078 is ubuntu-xenial-16.04-amd64-server-20170113
ec2spotter_preboot_image_id=ami-bc508adc
and the ec2spotter-launch script is:
#!/bin/bash
# "Phase 1": this is the user-facing script for launching a new spot instance
if [ "$1" = "" ]; then echo "USER ERROR: please specify a configuration file"; exit -1; fi
cd $(dirname $0)
. $1 || exit -1
# New instance:
# Desired launch zone
LAUNCH_ZONE=$ec2spotter_launch_zone
# Region is LAUNCH_ZONE minus the last character
LAUNCH_REGION=$(echo $LAUNCH_ZONE | sed -e 's/.$//')
PUB_KEY=$ec2spotter_key_name
# Existing Volume:
# If no volume zone
if [ "$ec2spotter_volume_zone" = "" ]
then # Use instance zone
ec2spotter_volume_zone=$LAUNCH_ZONE
fi
# Name of volume (find it by name later)
ROOT_VOL_NAME=$ec2spotter_volume_name
# zone of volume (needed if different than instance zone)
ROOT_ZONE=$ec2spotter_volume_zone
# Region is Zone minus the last character
ROOT_REGION=$(echo $ROOT_ZONE | sed -e 's/.$//')
#echo "ROOT_VOL_NAME=${ROOT_VOL_NAME}; ROOT_ZONE=${ROOT_ZONE}; ROOT_REGION=${ROOT_REGION}; "
#echo "LAUNCH_ZONE=${LAUNCH_ZONE}; LAUNCH_REGION=${LAUNCH_REGION}; PUB_KEY=${PUB_KEY}"
AWS_ACCESS_KEY=`aws configure get aws_access_key_id`
AWS_SECRET_KEY=`aws configure get aws_secret_access_key`
aws ec2 describe-volumes \
--filters Name=tag-key,Values="Name" Name=tag-value,Values="$ROOT_VOL_NAME" \
--region ${ROOT_REGION} --output=json > volumes.tmp || exit -1
ROOT_VOL=$(jq -r '.Volumes[0].VolumeId' volumes.tmp)
ROOT_TYPE=$(jq -r '.Volumes[0].VolumeType' volumes.tmp)
#echo "ROOT_TYPE=$ROOT_TYPE; ROOT_VOL=$ROOT_VOL";
if [ "$ROOT_VOL_NAME" = "" ]
then
echo "root volume lacks a Name tag";
exit -1;
fi
cat >user-data.tmp <<EOF
#!/bin/sh
echo AWSAccessKeyId=$AWS_ACCESS_KEY > /root/.aws.creds
echo AWSSecretKey=$AWS_SECRET_KEY >> /root/.aws.creds
apt-get update
apt-get install -y jq
apt-get install -y python-pip python-setuptools
apt-get install -y git
pip install awscli
cd /root
git clone --depth=1 https://github.com/slavivanov/ec2-spotter.git
echo Got spotter scripts from github.
cd ec2-spotter
echo Swapping root volume
./ec2spotter-remount-root --force 1 --vol_name ${ROOT_VOL_NAME} --vol_region ${ROOT_REGION} --elastic_ip $ec2spotter_elastic_ip
EOF
userData=$(base64 user-data.tmp | tr -d '\n');
cat >specs.tmp <<EOF
{
"ImageId" : "$ec2spotter_preboot_image_id",
"InstanceType": "$ec2spotter_instance_type",
"KeyName" : "$PUB_KEY",
"EbsOptimized": true,
"Placement": {
"AvailabilityZone": "$LAUNCH_ZONE"
},
"BlockDeviceMappings": [
{
"DeviceName": "/dev/sda1",
"Ebs": {
"DeleteOnTermination": true,
"VolumeType": "gp2",
"VolumeSize": 128
}
}
],
"NetworkInterfaces": [
{
"DeviceIndex": 0,
"SubnetId": "${ec2spotter_subnet}",
"Groups": [ "${ec2spotter_security_group}" ],
"AssociatePublicIpAddress": true
}
],
"UserData" : "${userData}"
}
EOF
SPOT_REQUEST_ID=$(aws ec2 request-spot-instances --launch-specification file://specs.tmp --spot-price $ec2spotter_bid_price --output="text" --query="SpotInstanceRequests[*].SpotInstanceRequestId" --region ${LAUNCH_REGION})
echo $SPOT_REQUEST_ID
# Clean up
rm user-data.tmp
rm specs.tmp
rm volumes.tmp
This is not an exact answer, but it may help you find a way to debug the issue.
As I understand it, this is the part of your setup, in the ec2spotter-launch script, that is responsible for the volume swap:
...
cat >specs.tmp <<EOF
{
"ImageId" : "$ec2spotter_preboot_image_id",
...
"UserData" : "${userData}"
}
EOF
SPOT_REQUEST_ID=$(aws ec2 request-spot-instances --launch-specification file://specs.tmp --spot-price $ec2spotter_bid_price --output="text" --query="SpotInstanceRequests[*].SpotInstanceRequestId" --region ${LAUNCH_REGION})
The specs.tmp file is used as the instance launch specification: --launch-specification file://specs.tmp.
And the "UserData" inside the launch specification is a script which is also generated in ec2spotter-launch:
cat >user-data.tmp <<EOF
#!/bin/sh
echo AWSAccessKeyId=$AWS_ACCESS_KEY > /root/.aws.creds
echo AWSSecretKey=$AWS_SECRET_KEY >> /root/.aws.creds
apt-get update
...
cd /root
git clone --depth=1 https://github.com/slavivanov/ec2-spotter.git
echo Got spotter scripts from github.
cd ec2-spotter
echo Swapping root volume
./ec2spotter-remount-root --force 1 --vol_name ${ROOT_VOL_NAME} --vol_region ${ROOT_REGION} --elastic_ip $ec2spotter_elastic_ip
EOF
The actual work to swap the root volume is performed by the ec2spotter-remount-root script which is downloaded from github.
There are many echo statements in that script, so I think if you find where the output goes, you'll be able to understand what was wrong.
So when you have the issue, you'll ssh to the instance and check the log file.
The question is what file to check (and if the script output is being logged into some file).
Here is what I suggest trying:
Check standard logs under /var/log generated when the instance is starting (cloud-init.log, syslog, etc) to see if you can find the ec2spotter-remount-root output
Try to enable logging by yourself, something similar is discussed here
I would try modifying the user-data.tmp part in ec2spotter-launch this way:
#!/bin/bash
set -x
exec > >(tee /var/log/user-data.log|logger -t user-data ) 2>&1
echo AWSAccessKeyId=$AWS_ACCESS_KEY > /root/.aws.creds
...
echo Swapping root volume
./ec2spotter-remount-root --force 1 --vol_name ${ROOT_VOL_NAME} --vol_region ${ROOT_REGION} --elastic_ip $ec2spotter_elastic_ip
EOF
Here I've changed the first three lines to enable logging into /var/log/user-data.log.
If 1 and 2 don't work, I would try asking the script author on GitHub. As there are lots of echo statements in the script, the author should know where to look for that output.
Hope that helps. Also, you don't need to wait for the issue to appear to try this out; look for the script output on successful runs too.
Or, if you are able to make a few test runs, do that and make sure you can find the log with the script output.
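For example, a grep along these lines (just a sketch; adjust the keywords) will show whether ec2spotter-remount-root left any trace in the standard logs of the new instance:
sudo grep -iE 'remount-root|swapping root volume|error|fail' \
    /var/log/cloud-init.log /var/log/cloud-init-output.log /var/log/syslog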

How to install the aws-cfn-bootstrap/cfn-init package on Red Hat using CloudFormation?

I am trying to launch an instance with a CloudFormation template. The instance starts, but the UserData section is not executed completely because the cfn-init/aws-cfn-bootstrap package is not installed in the Red Hat 7 AMI. I tried installing the aws-cfn-bootstrap package manually but could not, due to conflicts with the Python version.
Here is the UserData section of CloudFormation Template
"UserData": {
"Fn::Base64": {
"Fn::Join": [
"\n",
[
"#!/bin/bash",
"set -x",
"",
"INSTANCE_ID=`/opt/aws/bin/ec2-metadata --instance-id | cut -f2 -d' '`",
"REGION=`/opt/aws/bin/ec2-metadata --availability-zone| cut -f2 -d' ' | sed '$s/.$//'`",
{
"Fn::Join": [
"",
[
"AID='",
{
"Fn::GetAtt": [
"eip",
"AllocationId"
]
},
"'"
]
]
},
"aws ec2 associate-address --region $REGION --instance-id $INSTANCE_ID --allocation-id $AID"
]
]
}
}
cloud-init.log
Nov 12 03:55:27 localhost cloud-init: Cloud-init v. 0.7.6 running 'modules:config' at Thu, 12 Nov 2015 08:55:27 +0000. Up 19.01 seconds.
Nov 12 03:55:28 localhost cloud-init: Cloud-init v. 0.7.6 running 'modules:final' at Thu, 12 Nov 2015 08:55:27 +0000. Up 19.67 seconds.
Nov 12 03:55:28 localhost cloud-init: ++ /opt/aws/bin/ec2-metadata --instance-id
Nov 12 03:55:28 localhost cloud-init: /var/lib/cloud/instance/scripts/part-001: line 4: /opt/aws/bin/ec2-metadata: No such file or directory
Nov 12 03:55:28 localhost cloud-init: ++ cut -f2 '-d '
Nov 12 03:55:28 localhost cloud-init: + INSTANCE_ID=
Nov 12 03:55:28 localhost cloud-init: ++ cut -f2 '-d '
Nov 12 03:55:28 localhost cloud-init: ++ sed '$s/.$//'
Nov 12 03:55:28 localhost cloud-init: ++ /opt/aws/bin/ec2-metadata --availability-zone
Nov 12 03:55:28 localhost cloud-init: /var/lib/cloud/instance/scripts/part-001: line 5: /opt/aws/bin/ec2-metadata: No such file or directory
Nov 12 03:55:28 localhost cloud-init: + REGION=
Nov 12 03:55:28 localhost cloud-init: + AID=eipalloc-XXXXXX
Nov 12 03:55:28 localhost cloud-init: + aws ec2 associate-address --region --instance-id --allocation-id eipalloc-XXXXXX
Nov 12 03:55:28 localhost cloud-init: /var/lib/cloud/instance/scripts/part-001: line 7: aws: command not found
Nov 12 03:55:28 localhost cloud-init: 2015-11-12 03:55:28,078 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [127]
Nov 12 03:55:28 localhost cloud-init: 2015-11-12 03:55:28,089 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
Nov 12 03:55:28 localhost cloud-init: 2015-11-12 03:55:28,089 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
Here's what worked for me. I'm not taking full credit for this, as I found it on one of the AWS forums but now I can't find the source link... anyway, I had to modify it a bit to work for me on Red Hat 6, so hopefully it helps someone else as well.
This is the UserData section in the template to install the cfn-init scripts:
"UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
"#!/bin/bash -xe\n",
"### This is redhat 6. It supports cloud-init but Cfn packages need to be installed unlike AWS Linux. And they are installed in a different location\n",
"# First enable EPEL\n",
"rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm", "\n",
"# Now install Python Setuptools(easy_install) and Pip", "\n",
"yum -y install python-pip", "\n",
"# Now install cfn scripts", "\n",
"/usr/bin/easy_install --script-dir /opt/aws/bin https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz", "\n",
"# Now fix the cfn-hup script and copy to init.d location as AWS does not do it for you", "\n",
"cp -f `pip show aws-cfn-bootstrap 2> /dev/null | egrep \"^Location\" | awk -F \":\" ' { print $2 }'`/init/redhat/cfn-hup /etc/init.d/", "\n",
"chmod 755 /etc/init.d/cfn-hup", "\n",
"chkconfig --add cfn-hup", "\n",
"/opt/aws/bin/cfn-init -v ",
" --stack ", { "Ref" : "AWS::StackId" },
" --resource AppServer ",
" --configsets Install ",
" --region ", { "Ref" : "AWS::Region" }, "\n"
]]}}
You can install it using only pip (no need for easy_install):
Python 2:
pip install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz
Python 3:
pip3 install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-py3-latest.tar.gz
Full example with CloudFormation LaunchConfiguration:
LaunchConfig:
Type: AWS::AutoScaling::LaunchConfiguration
Properties:
UserData:
Fn::Base64:
!Sub |
#!/bin/bash
set -e # Exit on error
set -u # Treat unset variable as error
set -x # Trace commands
set -o pipefail
# Install latest security updates
yum update -y
# Install pip
yum -y install python3-pip
# Install AWS CLI and CFN Bootstrap
pip3 install awscli
pip3 install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-py3-latest.tar.gz
For RHEL 7.x, you have to add the EPEL repo; below is something that worked for me.
"UserData": {
"Fn::Base64": {
"Fn::Join": [
"",
[
"#!/bin/bash\n",
"## Install EPEL\n",
"rpm -Uvh http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-9.noarch.rpm\n",
"# Installing CFN-init \n",
"yum -y install python-pip",
"\n",
"cd /use/bin", "\n",
"/usr/bin/easy_install --script-dir /opt/aws/bin https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz",
"\n",
"cp -f `pip show aws-cfn-bootstrap 2> /dev/null | grep -i \"^Location\" | awk '{ print $2\"/init/redhat/cfn-hup\"}'` /etc/init.d/",
"\n",
"chmod 755 /etc/init.d/cfn-hup",
"\n",
"chkconfig --add cfn-hup",
"\n",
"## DONE \n",
"# Install the files and packages from the metadata\n",
"/opt/aws/bin/cfn-init -v ",
" --stack ",
{
"Ref": "AWS::StackName"
},
" --resource appInstance1 ",
" --region ",
{
"Ref": "AWS::Region"
},
"\n",
"# Start up the cfn-hup daemon to listen for changes to the Web Server metadata\n",
"/opt/aws/bin/cfn-hup || error_exit 'Failed to start cfn-hup'\n",
"# Signal the status from cfn-init\n",
"/opt/aws/bin/cfn-signal -e $? ",
" --stack ",
{
"Ref": "AWS::StackName"
},
" --resource appInstance1 ",
" --region ",
{
"Ref": "AWS::Region"
},
"\n"
]
]
}
}
Example for RedHat 8:
UserData:
Fn::Base64: !Sub |
#!/bin/bash -x
yum update -y
yum -y install python2-pip
pip2 install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz
python2 /usr/bin/cfn-signal -e $? \
--stack ${AWS::StackName} \
--resource MyInstance \
--region ${AWS::Region}
Where MyInstance is the name of the instance resource in your CFN.
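A small, assumption-based check you can append to the user data before relying on the helpers later in the template is to confirm they actually landed on the PATH after the pip install:
# verify the bootstrap helpers are installed; report any that are missing
for c in cfn-init cfn-signal cfn-hup; do
    command -v "$c" || echo "missing: $c"
done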

CloudFormation error output

I have this function in my template in CloudFormation:
"function error_exit\n",
"{\n",
" cfn-signal -e 1 -r \"$1\" '", { "Ref" : "WebServerWaitHandle" }, "'\n",
" exit 1\n",
"}\n",
The message shows up in the stack's resource status when an error occurs, because of this line:
sudo puppet apply --verbose --debug --environment=qa /puppet.pp > /tmp/puppet.log 2>&1 || error_exit 'failed to apply puppet manifests'\n",
But I want to show the actual error from /tmp/puppet.log, something like grep Error /tmp/puppet.log.
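One possible shape for this (a sketch only, not a tested snippet; it would still need to be embedded in the template's Fn::Join like the original, and '<WaitHandle URL>' stands in for the Ref to WebServerWaitHandle) is to grab a short excerpt from the log and pass it as the signal reason:
function error_exit
{
    # take the last matching Error line and truncate it so the reason stays short
    local detail
    detail=$(grep -i 'Error' /tmp/puppet.log | tail -n 1 | cut -c1-200)
    cfn-signal -e 1 -r "$1: ${detail}" '<WaitHandle URL>'
    exit 1
}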