Schedule Docker image to be run periodically on AWS ECS? - amazon-web-services

How do I schedule a Docker image to be run periodically (hourly) using ECS, without having to use a continually running EC2 instance + cron? I have a Docker image containing third-party binaries and a Python project.
Keeping an EC2 instance running 24/7 with cron is not viable long-term, as it's expensive for the instance to run around the clock while only being used for a small fraction of the day: each invocation of the script lasts only ~3 minutes.

For an AWS ECS cluster, it is recommended to have at least one EC2 server running 24x7. Have you looked at whether AWS Fargate can run your Docker container? Or AWS Batch? If Fargate and AWS Batch are not options, then for your requirement I would recommend something like this, without ECS:
Build an EC2 AMI with Docker and the required software and libraries pre-installed.
Have AWS Instance Scheduler spin up an EC2 server every hour and, as part of the user data, start a Docker container from the image you mentioned.
https://aws.amazon.com/answers/infrastructure-management/instance-scheduler/
If you know your task execution time (say ~5 minutes), have the scheduler bring the server down after 8 or 10 minutes.
The above approach will blindly start an EC2 instance and stop it without knowing whether your Python work finished successfully. It could still be improved with a combination of Lambda and CloudFormation templates. Let me know your thoughts :)

Actually, it's possible to schedule the launch directly in CloudWatch by defining a rule, as explained in
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduled_tasks.html
This solution is cleaner because you will not need to worry about the execution time: once the task finishes, it will simply terminate, and a new one will be spawned on the next cycle.
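For reference, a minimal boto3 sketch of that approach, assuming an existing Fargate-compatible task definition; the ARNs, role, and subnet below are placeholders, not values from the question:

```python
import boto3

events = boto3.client("events")

# Run once per hour.
events.put_rule(
    Name="run-my-task-hourly",
    ScheduleExpression="rate(1 hour)",
    State="ENABLED",
)

# Point the rule at the ECS cluster and task definition.
events.put_targets(
    Rule="run-my-task-hourly",
    Targets=[
        {
            "Id": "my-task",
            "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",
            "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",  # role allowed to call ecs:RunTask
            "EcsParameters": {
                "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/my-task:1",
                "TaskCount": 1,
                "LaunchType": "FARGATE",
                "NetworkConfiguration": {
                    "awsvpcConfiguration": {
                        "Subnets": ["subnet-0123456789abcdef0"],
                        "AssignPublicIp": "ENABLED",
                    }
                },
            },
        }
    ],
)
```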

Related

Migrating on-premises Python ETL scripts that feed a Splunk Forwarder from a syslog box to AWS?

I've been asked to migrate on-premises Python ETL scripts that live on a syslog box over to AWS. These scripts run as cron-jobs and output logs that a Splunk Forwarder parses and sends to our Splunk instance for indexing.
My initial idea was to deploy a CloudWatch-triggered Lambda function that spins up an EC2 instance, runs the ETL scripts cloned to that instance, and then brings down the instance. Another idea was to containerize the scripts and run them as task definitions. They take approximately 30 minutes to run.
Any help moving forward would be appreciated; I would like to deploy this as IaC, preferably with troposphere/boto3.
Another idea was to containerize the scripts and run them as task definitions
This is probably the best approach. You can include the Splunk universal forwarder container in your task definition (ensuring both containers are configured to mount the same storage where the logs are held) to get the logs into Splunk. You can schedule task execution just like Lambda functions or similar. As an alternative to the forwarder container, if you can configure the scripts to write their logs to stdout/stderr instead of log files, you can just set up your Docker log driver to output directly to Splunk.
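To illustrate the stdout/stderr option, here is a rough boto3 sketch of a task definition whose container logs straight to Splunk via the splunk log driver. The image URI, HEC URL, and token are placeholders, and on EC2-backed clusters the driver also has to be enabled on the container instances (ECS_AVAILABLE_LOGGING_DRIVERS):

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="etl-scripts",
    requiresCompatibilities=["EC2"],
    containerDefinitions=[
        {
            "name": "etl",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/etl:latest",
            "memory": 1024,
            "essential": True,
            # Send container stdout/stderr straight to a Splunk HEC endpoint.
            "logConfiguration": {
                "logDriver": "splunk",
                "options": {
                    "splunk-url": "https://splunk.example.com:8088",
                    "splunk-token": "00000000-0000-0000-0000-000000000000",
                    "splunk-index": "etl",
                },
            },
        }
    ],
)
```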
Assuming you don't already have a cluster with capacity to run the task, you can use a capacity provider for the ASG attached to the ECS cluster to automatically provision instances into the cluster whenever the task needs to run (and scale down after the task completes).
Or use Fargate tasks with EFS storage and you don't have to worry about cluster provisioning at all.
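If you go the capacity-provider route, the wiring looks roughly like this in boto3; this is only a sketch, and the Auto Scaling group ARN and names are placeholders:

```python
import boto3

ecs = boto3.client("ecs")

# Wrap an existing Auto Scaling group in a capacity provider so ECS can
# scale it out when the scheduled task needs somewhere to run.
ecs.create_capacity_provider(
    name="etl-capacity",
    autoScalingGroupProvider={
        "autoScalingGroupArn": "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:11111111-2222-3333-4444-555555555555:autoScalingGroupName/etl-asg",
        "managedScaling": {"status": "ENABLED", "targetCapacity": 100},
        "managedTerminationProtection": "DISABLED",
    },
)

# Make it the cluster's default capacity provider.
ecs.put_cluster_capacity_providers(
    cluster="etl-cluster",
    capacityProviders=["etl-capacity"],
    defaultCapacityProviderStrategy=[{"capacityProvider": "etl-capacity", "weight": 1}],
)
```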

How can I update a container image with the imageDigest parameter in an AWS Fargate cluster with the AWS CLI?

I have a running cluster with a running task in it.
I need to update the container image used by the running task in the cluster. How do I do that?
My image uses the latest tag, and every time any new changes come in I push them to ECR under the latest tag.
Deploying with the latest tag isn't a best practice, because you lose a lot of visibility into what you are doing (e.g. scale-out events where you deploy more tasks as part of a service will all end up using latest but will effectively be running different versions of the code, etc.).
This pontificating aside, you didn't say whether you started your task(s) standalone using the run-task API or as part of a service.
If the former, you need to stop your task and run it again. If the latter, you need to redeploy your service using the --force-new-deployment flag.
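For the service case, a minimal boto3 sketch of that redeploy (equivalent to aws ecs update-service ... --force-new-deployment on the CLI); the cluster and service names are placeholders:

```python
import boto3

ecs = boto3.client("ecs")

ecs.update_service(
    cluster="my-fargate-cluster",
    service="my-service",
    forceNewDeployment=True,  # re-pulls the image and replaces the running tasks
)
```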

How to run resource intensive tasks with Airflow

We have a long running (3h) model training task which runs every 3 days and smaller prediction pipelines that run daily.
For both cases we use Jenkins + the EC2 plugin to spin up large instances (workers) and run the pipelines on them. This serves two purposes:
Keep pipelines isolated, so every pipeline has all the resources of one instance.
Save costs: the large instances run only for several hours, not 24/7.
With Jenkins + the EC2 plugin I am not responsible for copying code to the worker or reporting the result of the execution back; Jenkins does it under the hood.
Is there any way to achieve the same behaviour with Airflow?
Airflow 1.10 released a host of new AWS integrations that give you a few options for doing something like this on AWS.
https://airflow.apache.org/integration.html#aws-amazon-web-services
If you are running your task in a containerized setting, it sounds like the ECSOperator could be what you need, or the KubernetesPodOperator if you're using Kubernetes.
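As a rough sketch of the ECSOperator route on Airflow 1.10 (the contrib import path shown here moved in later Airflow versions; the cluster, task definition, and container names are placeholders, and the task is assumed to run on an EC2-backed cluster):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.ecs_operator import ECSOperator

dag = DAG(
    dag_id="model_training",
    start_date=datetime(2019, 1, 1),
    schedule_interval=timedelta(days=3),  # the 3-day training cadence
    catchup=False,
)

train = ECSOperator(
    task_id="train_model",
    dag=dag,
    cluster="training-cluster",
    task_definition="model-training",
    overrides={
        "containerOverrides": [
            {"name": "trainer", "command": ["python", "train.py"]}
        ]
    },
    region_name="us-east-1",
)
```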

Scheduling the stopping/starting of an EC2 instance when not in use by a Beanstalk Deployment or an ECS task?

I have a Docker image containing Python code and third-party binary executables. There are only outbound network requests. The image must run hourly and each execution lasts ~3 minutes.
I can:
Use an EC2 instance and schedule hourly execution via cron
Create a CloudWatch Event/Rule to run an ECS Task Definition hourly
Setup an Elastic Beanstalk environment and schedule hourly deployment of the image
In all of these scenarios, an EC2 instance is running 24/7 and I am being charged for extended periods of no usage.
How do I schedule the hourly starting of an existing EC2 instance and the stopping of said instance after my Docker container completes?
Here's one approach I can think of. It's very high-level, and omits some details, but conceptually it would work just fine. You'll also need to consider the Identity & Access Management (IAM) Roles used:
CloudWatch Event Rule to trigger the Step Function
AWS Step Function to trigger the Lambda function
AWS Lambda function to start up EC2 instances
EC2 instance polling the Step Functions service for Activity Tasks
Create a CloudWatch Event Rule to schedule a periodic task, using a cron expression
The Target of the CloudWatch Event Rule is an AWS Step Function
The AWS Step Function State Machine starts by triggering an AWS Lambda function, which starts the EC2 instance
The next step in the Step Functions State Machine invokes an Activity Task, representing the Docker container that needs to execute
The EC2 instance has a script running on it, which polls the Activity Task for work
The EC2 instance executes the Docker container, waits for it to finish, and sends a completion message to the Step Functions Activity Task (see the sketch after these steps)
The script running on the EC2 instance shuts itself down
The AWS Step Function ends
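A condensed boto3 sketch of the Lambda step and the on-instance poller described above; the instance ID, activity ARN, and image are placeholders, and a real poller would loop and handle errors rather than make a single pass:

```python
import subprocess

import boto3

INSTANCE_ID = "i-0123456789abcdef0"
ACTIVITY_ARN = "arn:aws:states:us-east-1:123456789012:activity:run-docker-task"


def lambda_handler(event, context):
    """Step Functions task: start the worker instance."""
    boto3.client("ec2").start_instances(InstanceIds=[INSTANCE_ID])
    return {"started": INSTANCE_ID}


def poll_and_run():
    """Runs on the EC2 instance: claim the Activity Task, run the container, report back."""
    sfn = boto3.client("stepfunctions")
    # Long-polls for up to ~60 seconds; returns an empty token if no work is waiting.
    task = sfn.get_activity_task(activityArn=ACTIVITY_ARN, workerName="docker-worker")
    if task.get("taskToken"):
        subprocess.run(["docker", "run", "--rm", "my-image:latest"], check=True)
        sfn.send_task_success(taskToken=task["taskToken"], output="{}")
        subprocess.run(["sudo", "shutdown", "-h", "now"])  # stop the instance afterwards
```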
Keep in mind that a potentially better option would be to spin up a new EC2 instance every hour, instead of simply starting and stopping the same instance. Although you might get better startup performance by starting an existing instance vs. launching a new instance, you'll also have to spend time to maintain the EC2 instance like a pet: fix issues if they crop up, or patch the operating system periodically. In today's world, it's a commonly accepted practice that infrastructure should be disposable. After all, you've already packaged up your application into a Docker container, so you most likely don't have overly specific expectations around which host that container is actually being executed on.
Another option would be to use AWS Fargate, which is designed to run Docker containers, without worrying about spinning up and managing container infrastructure.
AWS Step Functions
AWS Fargate
Blog: AWS Fargate: An Overview
Creating a CloudWatch Event Rule that triggers on a schedule

Can I schedule Docker to run at a specific time on Amazon ECS?

I want to schedule my Docker image to run at a specific time every day, or when the size of a particular folder in my Amazon S3 bucket reaches a threshold. Is either of these possible?
There's no built-in scheduler from AWS to do this.
Anyway,
you can either run a cron job on a different machine and use the API to start a task on ECS (http://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RunTask.html)
or
you can create a Lambda function that runs your task (http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html)
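For the Lambda route, the handler can be as small as a single RunTask call, triggered by a CloudWatch Events schedule. A minimal sketch, where the cluster and task definition names are placeholders:

```python
import boto3

ecs = boto3.client("ecs")


def lambda_handler(event, context):
    # Start one task from the latest ACTIVE revision of the task definition.
    response = ecs.run_task(
        cluster="my-cluster",
        taskDefinition="my-task",
        count=1,
    )
    return [t["taskArn"] for t in response.get("tasks", [])]
```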