Mounting an Elastic File System to an AWS Batch Compute Environment

I'm trying to get my Elastic File System (EFS) mounted in my Docker container so it can be used with AWS Batch. Here is what I did:
Created a new AMI that is optimized for the Elastic Container Service (ECS). I followed this guide here to make sure it had ECS on it. I also put the mount into the /etc/fstab file and verified that my EFS was being mounted at /mnt/efs after a reboot.
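For reference, an fstab-based mount along these lines persists across reboots (the file system ID is a placeholder; the efs mount type comes from the amazon-efs-utils package):
# Hypothetical file system ID
sudo yum install -y amazon-efs-utils
sudo mkdir -p /mnt/efs
echo "fs-xxxxxxxx:/ /mnt/efs efs _netdev,tls 0 0" | sudo tee -a /etc/fstab
sudo mount -a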
Tested an EC2 instance with my new AMI and verified I could pull the Docker container and pass it my mount point via
docker run --volume /mnt/efs:/home/efs -it mycontainer:latest
Running the Docker image interactively shows me my data inside EFS.
Set up a new compute environment with my new AMI that mounts EFS on boot.
Created a job definition file:
{
  "jobDefinitionName": "MyJobDEF",
  "jobDefinitionArn": "arn:aws:batch:us-west-2:#######:job-definition/Submit:8",
  "revision": 8,
  "status": "ACTIVE",
  "type": "container",
  "parameters": {},
  "retryStrategy": {
    "attempts": 1
  },
  "containerProperties": {
    "image": "########.ecr.us-west-2.amazonaws.com/mycontainer",
    "vcpus": 1,
    "memory": 100,
    "command": [
      "ls",
      "/home/efs"
    ],
    "volumes": [
      {
        "host": {
          "sourcePath": "/mnt/efs"
        },
        "name": "EFS"
      }
    ],
    "environment": [],
    "mountPoints": [
      {
        "containerPath": "/home/efs",
        "readOnly": false,
        "sourceVolume": "EFS"
      }
    ],
    "ulimits": []
  }
}
Ran the job and viewed the log.
Anyway, while it does not say "no file /home/efs found", it does not list anything in my EFS, which is populated; I'm interpreting this as the container mounting an empty EFS. What am I doing wrong? Is my AMI not mounting the EFS in the compute environment?

I covered this in a recent blog post
https://medium.com/arupcitymodelling/lab-note-002-efs-as-a-persistence-layer-for-aws-batch-fcc3d3aabe90
You need to set up a launch template for your batch instances, and you need to make sure that your subnets/security groups are configured properly.
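As a rough sketch, the launch template's user data can install the EFS mount helper and mount the file system at boot (the file system ID is a placeholder, and the mount point must match what the job definition expects); note that user data in launch templates used with AWS Batch has to be in MIME multi-part format:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Hypothetical file system ID; adjust the mount point to match the job definition's host sourcePath
yum install -y amazon-efs-utils
mkdir -p /mnt/efs
mount -t efs fs-xxxxxxxx:/ /mnt/efs

--==BOUNDARY==--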

Related

AWS Elastic Beanstalk EFS Mount Error: unknown filesystem type 'efs'

I am trying to mount my EFS to a multi-container Docker Elastic Beanstalk environment using a task definition in Dockerrun.aws.json. I have also configured the EFS security group to accept NFS traffic from the EC2 (EB environment) security group.
However, I am facing this error:
ECS task stopped due to: Error response from daemon: create
ecs-awseb-SeyahatciBlog-env-k3k5grsrma-2-wordpress-88eff0a5fc88f9ae7500:
VolumeDriver.Create: mounting volume failed: mount: unknown filesystem
type 'efs'.
I am uploading this Dockerrun.aws.json file using the AWS Management Console:
{
  "AWSEBDockerrunVersion": 2,
  "authentication": {
    "bucket": "seyahatci-docker",
    "key": "index.docker.io/.dockercfg"
  },
  "volumes": [
    {
      "name": "wordpress",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-d9689882",
        "rootDirectory": "/blog-web-app/wordpress",
        "transitEncryption": "ENABLED"
      }
    },
    {
      "name": "mysql-data",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-d9689882",
        "rootDirectory": "/blog-db/mysql-data",
        "transitEncryption": "ENABLED"
      }
    }
  ],
  "containerDefinitions": [
    {
      "name": "blog-web-app",
      "image": "bireysel/seyehatci-blog-web-app",
      "memory": 256,
      "essential": false,
      "portMappings": [
        {"hostPort": 80, "containerPort": 80}
      ],
      "links": ["blog-db"],
      "mountPoints": [
        {
          "sourceVolume": "wordpress",
          "containerPath": "/var/www/html"
        }
      ]
    },
    {
      "name": "blog-db",
      "image": "mysql:5.7",
      "hostname": "blog-db",
      "memory": 256,
      "essential": true,
      "mountPoints": [
        {
          "sourceVolume": "mysql-data",
          "containerPath": "/var/lib/mysql"
        }
      ]
    }
  ]
}
AWS Configuration Screenshots:
EC2 Security Group (Automatically created by EB)
EFS Security Group
EFS networking
My Scenario:
Set up some EC2 instances with Amazon Linux 2 AMIs.
Tried to set up EFS.
Had the same error when trying to mount the EFS drive.
It seems the package was NOT included in the Amazon Linux 2 AMI, even though the documentation specifies that it should be included:
The amzn-efs-utils package comes preinstalled on Amazon Linux and Amazon Linux 2 Amazon Machine Images (AMIs).
https://docs.aws.amazon.com/efs/latest/ug/overview-amazon-efs-utils.html
Running which amzn-efs-utils returns: no amzn-efs-utils installed.
$ which amzn-efs-utils
/usr/bin/which: no amzn-efs-utils in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin)
Fix
Install the amazon-efs-utils package:
sudo yum install amazon-efs-utils
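To verify, note that the package installs a mount.efs helper rather than a binary named amzn-efs-utils, which is why the which check above comes back empty:
rpm -q amazon-efs-utils   # confirm the package is installed
which mount.efs           # the helper that implements the "efs" filesystem type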
After searching the entire web, I didn't encounter any solution for this problem, so I contacted AWS Support. They told me that the issue is the missing "amazon-efs-utils" package on EC2 instances created by Elastic Beanstalk, and I then fixed the error by creating a file named efs.config inside the .ebextensions folder:
.ebextensions/efs.config
packages:
  yum:
    amazon-efs-utils: 1.2
Finally, I zipped the .ebextensions folder together with my Dockerrun.aws.json file before uploading, and the problem was resolved.
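For reference, a source bundle along these lines is what gets uploaded (the archive name is illustrative); Dockerrun.aws.json must sit at the root of the zip next to the .ebextensions folder:
zip -r deploy-bundle.zip Dockerrun.aws.json .ebextensions/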

AWS Batch job stuck in RUNNABLE when a launch template is configured

I have configured a Step Function with AWS Batch jobs. The whole configuration works well, but I need to customize the starting instances. For this purpose I use the Launch Template service and built a simple (empty) configuration based on the instance type used in the AWS Batch configuration. When the compute environment is built with the launch template, the Batch job is stuck in the RUNNABLE stage. When I run the AWS Batch job without the launch template, everything works OK. Launching an instance from the template also works OK. Could anyone give me any advice on what is wrong or missing? Below are the definitions of the whole stack's elements.
Launch Template definition
Compute environment details Overview
Compute environment name senet-cluster-r5ad-2xlarge-v3-4
Compute environment ARN arn:aws:batch:eu-central-1:xxxxxxxxxxx:compute-environment/senet-cluster-r5ad-2xlarge-v3-4
ECS Cluster name arn:aws:ecs:eu-central-1:xxxxxxxxxxxx:cluster/senet-cluster-r5ad-2xlarge-v3-4_Batch_3323aafe-d7a4-3cfe-91e5-c1079ee9d02e
Type MANAGED
Status VALID
State ENABLED
Service role arn:aws:iam::xxxxxxxxxxx:role/service-role/AWSBatchServiceRole
Compute resources
Minimum vCPUs 0
Desired vCPUs 0
Maximum vCPUs 25
Instance types r5ad.2xlarge
Allocation strategy BEST_FIT
Launch template lt-023ebdcd5df6073df
Launch template version $Default
Instance role arn:aws:iam::xxxxxxxxxxx:instance-profile/ecsInstanceRole
Spot fleet role
EC2 Keypair senet-test-keys
AMI id ami-0b418580298265d5c
vpcId vpc-0917ea63
Subnets subnet-49332034, subnet-8902a7e3, subnet-9de503d1
Security groups sg-cdbbd9af, sg-047ea19daf36aa269
AWS Batch Job Definition
{
  "jobDefinitionName": "senet-cluster-job-def-3",
  "jobDefinitionArn": "arn:aws:batch:eu-central-1:xxxxxxxxxxxxxx:job-definition/senet-cluster-job-def-3:9",
  "revision": 9,
  "status": "ACTIVE",
  "type": "container",
  "parameters": {},
  "containerProperties": {
    "image": "xxxxxxxxxxx.dkr.ecr.eu-central-1.amazonaws.com/senet/batch-process:latest",
    "vcpus": 4,
    "memory": 60000,
    "command": [],
    "jobRoleArn": "arn:aws:iam::xxxxxxxxxxxxx:role/AWSS3BatchFullAccess-senet",
    "volumes": [],
    "environment": [
      {
        "name": "BATCH_FILE_S3_URL",
        "value": "s3://senet-batch/senet_jobs.sh"
      },
      {
        "name": "AWS_DEFAULT_REGION",
        "value": "eu-central-1"
      },
      {
        "name": "BATCH_FILE_TYPE",
        "value": "script"
      }
    ],
    "mountPoints": [],
    "ulimits": [],
    "user": "root",
    "resourceRequirements": [],
    "linuxParameters": {
      "devices": []
    }
  }
}
For those of you who had the same problem: here is the solution that worked for me. It took me a few days to figure it out.
The default AWS AMI snapshots need at least 30 GB of storage. When you do not use a launch template, CloudFormation uses the correct storage size.
In my case, I defined only 8 GB of storage in my launch template, and when that launch template was used, the jobs got stuck in RUNNABLE.
Simply change the storage in your launch template to 30 GB or more and it should work.
Also, do not forget that IamInstanceProfile and SecurityGroupIds are required in the launch template for the job to start.
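A minimal sketch of such a launch template via the CLI (the template name, instance profile ARN, and security group ID are placeholders; /dev/xvda is the root device of the ECS-optimized Amazon Linux 2 AMI):
aws ec2 create-launch-template \
  --launch-template-name batch-30gb-root \
  --launch-template-data '{
    "BlockDeviceMappings": [
      {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 30, "VolumeType": "gp2", "DeleteOnTermination": true}}
    ],
    "IamInstanceProfile": {"Arn": "arn:aws:iam::111111111111:instance-profile/ecsInstanceRole"},
    "SecurityGroupIds": ["sg-0123456789abcdef0"]
  }'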

How to run an AWS EMR cluster on multiple subnets?

Currently we are creating instances using a config.json file from EMR to configure the cluster. This file specifies a subnet ("Ec2SubnetId").
All of my EMR instances end up using this subnet... how do I let the cluster use multiple subnets?
Here is the Terraform template I am pushing to S3.
{
  "Applications": [
    {"Name": "Spark"},
    {"Name": "Hadoop"}
  ],
  "BootstrapActions": [
    {
      "Name": "Step1-stuff",
      "ScriptBootstrapAction": {
        "Path": "s3://${artifact_s3_bucket_name}/artifacts/${build_commit_id}/install-stuff.sh",
        "Args": ["${stuff_args}"]
      }
    },
    {
      "Name": "setup-cloudWatch-agent",
      "ScriptBootstrapAction": {
        "Path": "s3://${artifact_s3_bucket_name}/artifacts/${build_commit_id}/setup-cwagent-emr.sh",
        "Args": ["${build_commit_id}"]
      }
    }
  ],
  "Configurations": [
    {
      "Classification": "spark",
      "Properties": {
        "maximizeResourceAllocation": "true"
      }
    }
  ],
  "Instances": {
    "AdditionalMasterSecurityGroups": [ "${additional_master_security_group}" ],
    "AdditionalSlaveSecurityGroups": [ "${additional_slave_security_group}" ],
    "Ec2KeyName": "privatekey-${env}",
    "Ec2SubnetId": "${data_subnet}",
    "InstanceGroups": [
You cannot currently achieve what you are trying to do: EMR clusters always end up with all of their nodes in the same subnet.
Using instance fleets, you can indeed configure a set of subnets, but at launch time AWS will choose the best one and put all of your instances there.
From the EMR Documentation, under "Use the Console to Configure Instance Fleets":
For Network, enter a value. If you choose a VPC for Network, choose a single EC2 Subnet or CTRL + click to choose multiple EC2 subnets. The subnets you select must be the same type (public or private). If you choose only one, your cluster launches in that subnet. If you choose a group, the subnet with the best fit is selected from the group when the cluster launches.
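For example, with instance fleets the create-cluster call accepts a list of subnets and EMR picks one of them at launch (the release label, subnet IDs, and fleet definition file below are illustrative):
# instance-fleets.json is a hypothetical fleet definition (master/core fleets)
aws emr create-cluster \
  --release-label emr-5.30.0 \
  --applications Name=Spark Name=Hadoop \
  --use-default-roles \
  --instance-fleets file://instance-fleets.json \
  --ec2-attributes '{"SubnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"]}'
In the config.json shown in the question, the equivalent change is switching from Ec2SubnetId (a single value, used with instance groups) to Ec2SubnetIds (a list, used with instance fleets) and from InstanceGroups to InstanceFleets.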

AWS ECS Service for Wordpress

I created a service for WordPress on AWS ECS with the following container definitions:
{
  "containerDefinitions": [
    {
      "name": "wordpress",
      "links": [
        "mysql"
      ],
      "image": "wordpress",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 0,
          "hostPort": 80
        }
      ],
      "memory": 250,
      "cpu": 10
    },
    {
      "environment": [
        {
          "name": "MYSQL_ROOT_PASSWORD",
          "value": "password"
        }
      ],
      "name": "mysql",
      "image": "mysql",
      "cpu": 10,
      "memory": 250,
      "essential": true
    }
  ],
  "family": "wordpress"
}
Then I went to the public IP and completed the WordPress installation. I also added a few posts.
But now, when I update the service to use an updated task definition (an updated MySQL container image)
"image": "mysql:latest"
I lose all the posts and data I created, and WordPress prompts me to install again.
What am I doing wrong?
I also tried to use host volumes, but to no avail; it creates a bind mount and a Docker-managed volume (I did a docker inspect on the container).
So every time I update the task, it resets WordPress.
If your container needs access to the original data each time it starts, you require a file system that your containers can connect to regardless of which instance they're running on. That's where EFS comes in.
EFS allows you to persist data onto a durable shared file system that all of the ECS container instances in the ECS cluster can use.
Step-by-step Instructions to Setup an AWS ECS Cluster
Using Data Volumes in Tasks
Using Amazon EFS to Persist Data from Amazon ECS Containers
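A minimal sketch of that approach (the file system ID and paths are placeholders): mount the same EFS file system on every ECS container instance, for example from instance user data, and point the task's MySQL volume at a host path on it.
# Run on each ECS container instance; hypothetical file system ID
sudo yum install -y amazon-efs-utils
sudo mkdir -p /mnt/efs
sudo mount -t efs fs-xxxxxxxx:/ /mnt/efs
sudo mkdir -p /mnt/efs/mysql-data
The task definition's mysql volume would then use "host": {"sourcePath": "/mnt/efs/mysql-data"} mounted at /var/lib/mysql, so the database files survive task updates.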

Where are the volumes located when using ECS and Fargate?

I have the following setup (I've stripped out the non-important fields):
{
  "ECSTask": {
    "Type": "AWS::ECS::TaskDefinition",
    "Properties": {
      "ContainerDefinitions": [
        {
          "Name": "mysql",
          "Image": "mysql",
          "MountPoints": [{"SourceVolume": "mysql", "ContainerPath": "/var/lib/mysql"}]
        }
      ],
      "RequiresCompatibilities": ["FARGATE"],
      "Volumes": [{"Name": "mysql"}]
    }
  }
}
It seems to work (the container does start properly), but I'm not quite sure where exactly this volume is being saved. I assumed it would be an EBS volume, but I don't see it there. I guess it's internal to my task, but in that case, how do I access it? How can I control its limits (min/max size, etc.)? How can I create a backup of this volume?
Thanks.
Fargate does not support persistent volumes. Any volumes created and attached to Fargate tasks are ephemeral and cannot be initialized from an external source or backed up, sadly.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_data_volumes.html