How to attach an AWS EFS volume to an EC2 spot instance? - amazon-web-services

I have successfully attached an EFS volume to an EC2 instance, but I need to know how to attach the EFS volume to an EC2 spot instance.

If you start creating a non-spot instance and tick the checkboxes for attaching an EFS filesystem, an auto-generated script is placed in the user-data field.
Copy this script and cancel the non-spot instance creation.
Then start creating a spot instance and simply paste the script into the user-data field of the spot-instance-request form.
I tried it with Ubuntu Linux 20.04 and it works perfectly.
The script looks like this, for example (it may differ if you use another OS/distribution/version):
#cloud-config
package_update: true
package_upgrade: true
runcmd:
- yum install -y amazon-efs-utils
- apt-get -y install amazon-efs-utils
- yum install -y nfs-utils
- apt-get -y install nfs-common
- file_system_id_1=my-filesystem-id
- efs_mount_point_1=/mnt/efs/fs1
- mkdir -p "${efs_mount_point_1}"
- test -f "/sbin/mount.efs" && printf "\n${file_system_id_1}:/ ${efs_mount_point_1} efs tls,_netdev\n" >> /etc/fstab || printf "\n${file_system_id_1}.efs.eu-central-1.amazonaws.com:/ ${efs_mount_point_1} nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0\n" >> /etc/fstab
- test -f "/sbin/mount.efs" && grep -ozP 'client-info]\nsource' '/etc/amazon/efs/efs-utils.conf'; if [[ $? == 1 ]]; then printf "\n[client-info]\nsource=liw\n" >> /etc/amazon/efs/efs-utils.conf; fi;
- retryCnt=15; waitTime=30; while true; do mount -a -t efs,nfs4 defaults; if [ $? = 0 ] || [ $retryCnt -lt 1 ]; then echo File system mounted successfully; break; fi; echo File system not available, retrying to mount.; ((retryCnt--)); sleep $waitTime; done;
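If you would rather request the spot instance from the command line than through the console, the same user data can be passed with the AWS CLI. A rough sketch, assuming the script above is saved as user-data.yml; the AMI ID, instance type and key name are placeholders:
# Hypothetical example: request a spot instance and attach the EFS
# filesystem via the copied cloud-config user data.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3.micro \
  --key-name my-key \
  --instance-market-options '{"MarketType":"spot"}' \
  --user-data file://user-data.yml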

Related

CircleCI script to test against DynamoDB Local Fails

We have a CircleCI script that manages our deployment. I wanted to allow DynamoDB Local to run so that we could test our DynamoDB requests. I've tried following the answers here, here and here. I've also tried using the DynamoDB Local image from Docker Hub, here. This is the closest I've gotten.
version: 2
jobs:
  setup-dynamodb:
    docker:
      - image: openjdk:15-jdk
    steps:
      - setup_remote_docker:
          version: 18.06.0-ce
      - run:
          name: run-dynamodb-local
          background: true
          shell: /bin/bash
          command: |
            curl -k -L -o dynamodb-local.tgz http://dynamodb-local.s3-website-us-west-2.amazonaws.com/dynamodb_local_latest.tar.gz
            tar -xzf dynamodb-local.tgz
            java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -port 8000 -sharedDb
  check-failed:
    docker:
      - image: golang:1.14.3
    steps:
      - checkout
      - setup_remote_docker:
          version: 18.06.0-ce
      - attach_workspace:
          at: /tmp/app/workspace
      - run:
          name: Install dockerize
          shell: /bin/bash
          command: |
            yum -y update && \
            yum -y install wget && \
            yum install -y tar.x86_64 && \
            yum clean all
            wget https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz && \
            tar -C /usr/local/bin -xzvf dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz && \
            rm dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz
          environment:
            DOCKERIZE_VERSION: v0.3.0
      - run:
          name: Wait for Local DynamoDB
          command: dockerize -wait tcp://localhost:8000 -timeout 1m
      - run:
          name: checkerr
          shell: /bin/bash
          command: |
            ls -laF /tmp/app/workspace/
            for i in $(seq 1 2); do
              f=$(printf "failed%d.txt" $i)
              value=$(</tmp/app/workspace/$f)
              if [[ "$value" != "nil" ]]; then
                echo "$f = $value"
                exit 1
              fi
            done
The problem I'm having is that all my tests are failing with the error message dial tcp 127.0.0.1:8000: connect: connection refused. I'm not sure why this is happening. Do I need to expose the port from the container?
The reason is that the first job is completely separate from the second job.
In fact, you don't need the first one at all; just adjust the second one as below:
check-failed:
  docker:
    - image: golang:1.14.3
    - image: amazon/dynamodb-local
  steps:
    - setup_remote_docker:
    ...
    ...
By the way, you don't need to install DynamoDB every time; you can run it as a container as well.
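If you want to sanity-check that DynamoDB Local is actually reachable from the primary container, a minimal check (assuming the AWS CLI is installed in that image; the credentials are dummies, which DynamoDB Local accepts) could be:
# List tables against the local endpoint instead of the real AWS service.
export AWS_ACCESS_KEY_ID=dummy
export AWS_SECRET_ACCESS_KEY=dummy
export AWS_DEFAULT_REGION=us-west-2
aws dynamodb list-tables --endpoint-url http://localhost:8000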

Automate GCP persistent disk initialization

Are there any scripts that automate formatting persistent disks and attaching them to a Google Cloud VM instance, instead of doing the formatting and mounting steps manually?
The persistent disk is created with Terraform, which also creates a VM and attaches the disk to it with the attached_disk resource.
I am hoping to run a simple script on VM instance start that would:
check if the attached disk is formatted, and format it with ext4 if needed
check if the disk is mounted, and mount it if not
do nothing otherwise
Have you considered using a startup script on the instance (I presume you can also add a startup script with Terraform)? You could use an if statement to discover whether the disk is formatted and, if not, run the formatting/mounting commands from the documentation you linked (I realise you have suggested you do not want to follow the manual steps, but they can be integrated into the startup script to achieve the desired result).
Running the following outputs an empty string if the disk is not formatted:
sudo blkid /dev/sdb
You could therefore use this in a startup script to discover whether the disk is formatted, and perform the formatting/mounting if it is not. For example, you could use something like this (note: this could be dangerous if the disk is formatted but not mounted, and it should not be used if your use case could involve existing disks that may already have been formatted):
#!/bin/bash
if sudo blkid /dev/sdb; then
  exit
else
  sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
  sudo mkdir -p /mnt/disks/newdisk
  sudo mount -o discard,defaults /dev/sdb /mnt/disks/newdisk
fi
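If you are not wiring the startup script in through Terraform, it can also be attached as instance metadata with gcloud; a sketch, assuming the script above is saved as format-disk.sh (the instance name and zone are placeholders):
# The startup script runs on every boot, so the blkid check keeps it idempotent.
gcloud compute instances create my-vm \
  --zone us-central1-a \
  --metadata-from-file startup-script=format-disk.sh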
The marked answer did not work for me as the sudo blkid /dev/sdb part always returned a value (hence, true) and the script would exit.
I updated the script to check for the entry in fstab and added safety options to the script.
#!/bin/bash
set -uxo pipefail

MNT_DIR=/mnt/disks/persistent_storage
DISK_NAME=my-disk

# Check if entry exists in fstab
grep -q "$MNT_DIR" /etc/fstab
if [[ $? -eq 0 ]]; then  # Entry exists
  exit
else
  set -e  # The grep above returns non-zero for no matches & we don't want to exit then.
  # Find persistent disk's device path, prefixed by `google-`
  DEVICE_NAME="/dev/$(basename $(readlink /dev/disk/by-id/google-${DISK_NAME}))"
  sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard $DEVICE_NAME
  sudo mkdir -p $MNT_DIR
  sudo mount -o discard,defaults $DEVICE_NAME $MNT_DIR
  # Add fstab entry
  echo UUID=$(sudo blkid -s UUID -o value $DEVICE_NAME) $MNT_DIR ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab
fi
Here's the gist if you want to download it - https://gist.github.com/raj-saxena/3dcaa5c0ba0be88ed91ef3fb50d3ce85
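After the script has run, you can verify the outcome with the paths used above, for example:
# Confirm the fstab entry exists and the disk is actually mounted.
grep /mnt/disks/persistent_storage /etc/fstab
findmnt /mnt/disks/persistent_storage
df -h /mnt/disks/persistent_storage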
Formatting, mounting and adding an entry in /etc/fstab are necessary almost every time. Here is a solution I came up with that might help others; it can certainly be improved further. I added echo commands to explain what each block does.
Regarding the disk name, you can add device_name in your Terraform code when you attach your disks to the instance(s), as mentioned here: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_attached_disk
device_name - (Optional) Specifies a unique device name of your choice that is reflected into the /dev/disk/by-id/google- tree of a Linux operating system running within the instance. This name can be used to reference the device for mounting, resizing, and so on, from within the instance.
#!/bin/bash
DISKS_PATH=/dev/disk/by-id
DISKS=(disk1 disk2)

check_disks () {
  for disk in "${DISKS[@]}"; do
    MOUNT_DIR="/$disk"
    echo "$MOUNT_DIR"
    if sudo blkid $DISKS_PATH/google-${disk}; then
      echo "$disk is already formatted, nothing to do"
      echo "checking if $disk is present in fstab"
      UUID=$(sudo blkid -s UUID -o value $DISKS_PATH/google-${disk})
      grep -q "UUID=${UUID} $MOUNT_DIR" /etc/fstab
      if [[ $? -eq 0 ]]; then
        echo "$disk already present in fstab, continuing with checking mount"
        echo "Now checking if $disk is already mounted"
        grep -qs "$MOUNT_DIR" /proc/mounts
        if [[ $? -eq 0 ]]; then
          echo "$disk is already mounted, so doing nothing with mount"
        else
          echo "$disk is not mounted, so mounting it"
          sudo mkdir -p $MOUNT_DIR
          sudo mount -o discard,defaults $DISKS_PATH/google-${disk} $MOUNT_DIR
        fi
      else
        echo "$disk not present in fstab, so adding it"
        echo UUID="$UUID" $MOUNT_DIR ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab
        echo "Now checking if $disk is already mounted"
        grep -qs "$MOUNT_DIR" /proc/mounts
        if [[ $? -eq 0 ]]; then
          echo "$disk is already mounted, so doing nothing with mount"
        else
          echo "$disk is not mounted, so mounting it"
          sudo mkdir -p $MOUNT_DIR
          sudo mount -o discard,defaults $DISKS_PATH/google-${disk} $MOUNT_DIR
        fi
      fi
    else
      echo "Formatting ${disk}"
      sudo mkfs.ext4 $DISKS_PATH/google-${disk}
      echo "Creating directory for ${disk} on $MOUNT_DIR"
      sudo mkdir -p $MOUNT_DIR
      echo "adding $disk in fstab"
      UUID=$(sudo blkid -s UUID -o value $DISKS_PATH/google-${disk})
      echo UUID="$UUID" $MOUNT_DIR ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab
      echo "Mounting $disk"
      sudo mount -o discard,defaults $DISKS_PATH/google-${disk} $MOUNT_DIR
    fi
  done
}

check_disks

AWS ECS tasks keep starting and stopping

I am trying to use ECS for deployment with travis.
At one point everything was working but now it stopped.
I am following this tutorial https://testdriven.io/part-five-ec2-container-service/
There are 2 tasks that keep stopping and starting.
These are the messages I see in tasks:
STOPPED (CannotStartContainerError: API error (500): oci ru)
STOPPED (Essential container in task exited)
These are the messages I see in the logs:
FATAL: could not write to file "pg_wal/xlogtemp.28": No space left on device
container_linux.go:262: starting container process caused "exec: \"./entrypoint.sh\": permission denied"
Why is ECS stopping and starting so many new tasks? This was not happening before.
This is my docker_deploy.sh from my main microservice, which I am calling via Travis.
#!/bin/sh
if [ -z "$TRAVIS_PULL_REQUEST" ] || [ "$TRAVIS_PULL_REQUEST" == "false" ];
then
  if [ "$TRAVIS_BRANCH" == "staging" ];
  then
    JQ="jq --raw-output --exit-status"

    configure_aws_cli() {
      aws --version
      aws configure set default.region us-east-1
      aws configure set default.output json
      echo "AWS Configured!"
    }

    make_task_def() {
      task_template=$(cat ecs_taskdefinition.json)
      task_def=$(printf "$task_template" $AWS_ACCOUNT_ID $AWS_ACCOUNT_ID)
      echo "$task_def"
    }

    register_definition() {
      if revision=$(aws ecs register-task-definition --cli-input-json "$task_def" --family $family | $JQ '.taskDefinition.taskDefinitionArn');
      then
        echo "Revision: $revision"
      else
        echo "Failed to register task definition"
        return 1
      fi
    }

    deploy_cluster() {
      family="testdriven-staging"
      cluster="ezasdf-staging"
      service="ezasdf-staging"
      make_task_def
      register_definition
      if [[ $(aws ecs update-service --cluster $cluster --service $service --task-definition $revision | $JQ '.service.taskDefinition') != $revision ]];
      then
        echo "Error updating service."
        return 1
      fi
    }

    configure_aws_cli
    deploy_cluster
  fi
fi
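For reference, whether the new revision actually reaches a steady state (instead of tasks stopping and starting) can be checked right after update-service; a hedged sketch reusing the cluster and service variables above:
# Block until the service stabilizes; the waiter gives up after roughly ten minutes.
wait_for_stable() {
  if aws ecs wait services-stable --cluster $cluster --services $service;
  then
    echo "Service is stable."
  else
    echo "Service did not stabilize."
    return 1
  fi
}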
This is my Dockerfile from my users microservice:
FROM python:3.6.2
# install environment dependencies
RUN apt-get update -yqq \
&& apt-get install -yqq --no-install-recommends \
netcat \
&& apt-get -q clean
# set working directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
# add requirements (to leverage Docker cache)
ADD ./requirements.txt /usr/src/app/requirements.txt
# install requirements
RUN pip install -r requirements.txt
# add entrypoint.sh
ADD ./entrypoint.sh /usr/src/app/entrypoint.sh
RUN chmod +x /usr/src/app/entrypoint.sh
# add app
ADD . /usr/src/app
# run server
CMD ["./entrypoint.sh"]
entrypoint.sh:
#!/bin/sh
echo "Waiting for postgres..."
while ! nc -z users-db 5432;
do
sleep 0.1
done
echo "PostgreSQL started"
python manage.py recreate_db
python manage.py seed_db
gunicorn -b 0.0.0.0:5000 manage:app
I tried deleting my cluster and deregistering my tasks and restarting but ECS still continuously stops and starts new tasks now.
When it was working fine, the difference was that instead of CMD ["./entrypoint.sh"] in my Dockerfile, I had
RUN python manage.py recreate_db
RUN python manage.py seed_db
CMD gunicorn -b 0.0.0.0:5000 manage:app
travis is passing.
The errors are right there.
You don't have enough space on your host, and the entrypoint.sh file is being denied permission.
Ensure your host has enough disk space (shell in and run df -h to check, then expand the volume or bring up a new instance with more space). For entrypoint.sh, ensure that when building your image it is executable (chmod +x) and also readable by the user the container runs as.
Test your containers locally first; the second error should have been caught in development instantly.
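A quick way to check both problems, assuming the Dockerfile from the question ("app" is just a placeholder tag):
# Check free space on the ECS host and how much of it Docker is using.
df -h
docker system df
# Verify locally that entrypoint.sh ends up executable inside the built image.
docker build -t app .
docker run --rm --entrypoint ls app -l /usr/src/app/entrypoint.sh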
I realize this answer isn't 100% relevant to the question asked, but some googling brought me here because of the title, and I figure my solution might help someone later down the line.
I also had this issue, but the reason my containers kept restarting wasn't a lack of space or other resources; it was that I had enabled dynamic host port mapping and forgotten to update my security group accordingly. The health checks my load balancer sent to my containers inevitably failed, and ECS restarted the containers (whoops).
Dynamic Port Mapping in AWS Documentation:
https://aws.amazon.com/premiumsupport/knowledge-center/dynamic-port-mapping-ecs/
https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_PortMapping.html Contents --> hostPort
tl;dr - Make sure your load balancer can health check ports 32768 - 65535.
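If the missing piece is the security group rule, it can be added roughly like this (both security group IDs are placeholders; the source group is the load balancer's):
# Allow the load balancer to health check the ephemeral host ports
# used by dynamic port mapping.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 32768-65535 \
  --source-group sg-0fedcba9876543210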
If too many tasks are running and they have consumed the space, you will need to shell in to the host and do the following. Don't use -f on the docker rm, as that would remove the running ECS agent container:
docker rm $(docker ps -aq)
Run docker ps -a
which lists all the stopped containers that have exited; these also consume disk space. Use the command below to remove those zombies:
docker rm $(docker ps -a | grep Exited | awk '{print $1}')
Also remove old or unused images; these take up even more disk space than containers:
docker rmi -f image_name
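On newer Docker versions, roughly the same cleanup can be done with the prune subcommands (a sketch; the -a flag also removes images not used by any container, so use it deliberately):
# Remove all stopped containers; the running ECS agent is left alone.
docker container prune -f
# Remove dangling images; add -a to also remove unused ones.
docker image prune -f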

AWS apt install error = Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)

Using Amazon Linux AMI (2017.03.1) on a p2.xlarge instance, and attempting to sudo apt install {somepackage}, I get the following error:
Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
I have already tried
sudo rm /var/lib/apt/lists/lock
and
sudo rm /var/cache/apt/archives/lock
Solution:
sudo rm /var/lib/dpkg/lock
sudo dpkg --configure -a
sudo apt install {somepackage}
First, look for the process holding the lock:
sudo lsof /var/lib/dpkg/lock
Then make sure that process is not running:
ps cax | grep PID   # PID is the process id, e.g. 1111
If the PID is shown (it is running), kill it (otherwise skip straight to removing the lock file):
sudo kill -9 PID
Make sure the process is gone:
sudo ps cax | grep PID
Then remove the lock file(s):
sudo rm /var/lib/dpkg/lock
sudo rm /var/lib/dpkg/lock-frontend   # optional
Finally, let dpkg fix itself:
sudo dpkg --configure -a
Reference From
Depending on the image you are using, it may be installing some dependencies on first boot.
Try ps aux | grep -i apt. If it returns something like:
root 2531 0.0 0.0 4624 772 ? Ss 23:34 0:00 /bin/sh /usr/lib/apt/apt.systemd.daily install
root 2547 0.0 0.0 4624 1716 ? S 23:34 0:00 /bin/sh /usr/lib/apt/apt.systemd.daily lock_is_held install
or similar, you may want to wait until all necessary updates have been applied.
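If you would rather wait than kill anything, a small sketch that polls the lock before installing (the package name is a placeholder):
# Wait until no process holds the dpkg locks, then install.
while sudo fuser /var/lib/dpkg/lock /var/lib/dpkg/lock-frontend >/dev/null 2>&1; do
  echo "dpkg lock is held, waiting..."
  sleep 5
done
sudo apt-get install -y somepackage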
Note that instances can be configured with a "UserData" script that runs on provisioning. That can cause things to be installed and the lock to be held for a while.
I had a similar problem, and after some investigation found the following in the "UserData" section of my instance template:
#cloud-config
package_update: true
package_upgrade: true
runcmd:
- yum install -y amazon-efs-utils
- apt-get -y install amazon-efs-utils
- yum install -y nfs-utils
- apt-get -y install nfs-common
- file_system_id_1=fs-74aa550f
- efs_mount_point_1=/mnt/efs/fs1
- mkdir -p "${efs_mount_point_1}"
- test -f "/sbin/mount.efs" && printf "\n${file_system_id_1}:/ ${efs_mount_point_1} efs tls,_netdev\n" >> /etc/fstab || printf "\n${file_system_id_1}.efs.us-east-2.amazonaws.com:/ ${efs_mount_point_1} nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0\n" >> /etc/fstab
- test -f "/sbin/mount.efs" && grep -ozP 'client-info]\nsource' '/etc/amazon/efs/efs-utils.conf'; if [[ $? == 1 ]]; then printf "\n[client-info]\nsource=liw\n" >> /etc/amazon/efs/efs-utils.conf; fi;
- retryCnt=15; waitTime=30; while true; do mount -a -t efs,nfs4 defaults; if [ $? = 0 ] || [ $retryCnt -lt 1 ]; then echo File system mounted successfully; break; fi; echo File system not available, retrying to mount.; ((retryCnt--)); sleep $waitTime; done;
This is what was causing the lock for about 80-90 seconds on startup.
I removed this script from the template and I can now immediately use apt-get.
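A related trick, if you hit the same situation: you can wait for the user-data/cloud-init run to finish before touching apt, for example:
# Blocks until cloud-init (which runs the user data above) has completed.
cloud-init status --wait
sudo apt-get update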

Trivia: What is this symbol that AWS EC2 displays once you SSH into the machine?

Just really interested about this one, what the blazes is this ASCII art here?
$ ssh foo@$AWS_IP
Last login: Sat Mar 21 08:39:27 2015 from xx.xx.xx.xx
       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|
I need it for my sanity.
It's ASCII art that says EC2, which stands for Elastic Compute Cloud.
You can make custom ones using the figlet utility.
showfigfonts will show you which fonts are available; the one AWS uses is named "standard".
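Following that suggestion, a quick way to try it out (the install command depends on your distribution, and figlet may need the EPEL repository, as in the ebextensions example below):
# Render the same text with the "standard" figlet font.
sudo yum install -y figlet
figlet -f standard "EC2"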
We make custom banners like AWS's using ebextensions, i.e. create the file .ebextensions/000update-motd.config:
commands:
  setup_banner:
    command: |
      yum erase -y update-motd
      unlink /etc/motd
      amazon-linux-extras install epel -y
      yum-config-manager --enable epel
      yum install -y figlet
      # Add Motd as Beanstalk Environment Name
      echo `{"Ref": "AWSEBEnvironmentName" }` | figlet -f standard > /etc/motd
      # Add warning disclaimer from your code ( optional )
      # cat /var/app/current/.platform/banner >> /etc/motd
    test: rpm --quiet -q update-motd || [[ ! -f /etc/motd ]]
    ignoreErrors: true