Using AWS Batch, can a docker image be specified dynamically in a job definition? - aws-batch

I want to create jobs in AWS Batch that vary on the image that is used to launch the container. I'd like to do this without creating a different Job Definition for each image. Is it possible to parameterize the image property using job definition parameters? If not, what's the best way to achieve this or do I have to just create job definitions on the fly in my application?

I would really love this functionality as well. Sadly, it appears the current answer is no.
Batch allows parameters, but they're only for the command.
AWS Batch Parameters
You may be able to find a workaround by using a :latest tag, but then you're buying a ticket to :latest hell.
My current solution is to use my CI pipeline to update all dev job definitions using the aws cli (describe-job-definitions then register-job-definition) on each tagged commit.
To keep my infrastructure-as-code consistent, I've moved the version for batch job definitions into an environment variable that I retrieve before running any terraform commands.
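A minimal sketch of that kind of update step, using boto3 rather than the raw CLI; the job definition name and image URI below are placeholders, and it assumes the latest active revision's container properties can be re-registered as-is:

import boto3

batch = boto3.client("batch")

def bump_job_definition_image(job_def_name, new_image):
    # Fetch the latest active revision of the job definition.
    resp = batch.describe_job_definitions(jobDefinitionName=job_def_name, status="ACTIVE")
    latest = max(resp["jobDefinitions"], key=lambda d: d["revision"])

    # Re-register it with the same container properties, swapping only the image.
    props = latest["containerProperties"]
    props["image"] = new_image
    batch.register_job_definition(
        jobDefinitionName=job_def_name,
        type="container",
        containerProperties=props,
    )

# e.g. called from CI with the tag of the commit that was just built
bump_job_definition_image("my-dev-job", "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.2.3")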

Typically you create a job definition for a specific docker image.
However, that job definition (and the image behind it) can do anything you've programmed it to do, so it can be multi-purpose: you pass in whatever parameters or command line you would like to execute.
You can override most of the parameters in a job definition when you submit the job, as sketched below.
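For example, a rough boto3 sketch of overriding the command at submit time (the job name, queue, definition, and script paths are placeholders):

import boto3

batch = boto3.client("batch")

# One generic job definition; the actual work is chosen per submission
# by overriding the container command.
batch.submit_job(
    jobName="transform-file-42",
    jobQueue="my-queue",
    jobDefinition="my-multipurpose-job",
    containerOverrides={
        "command": ["python", "transform.py", "s3://bucket/in.csv", "s3://bucket/out.csv"],
    },
)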

Related

How to have a simple manual ECS deployment in CodePipeline / CodeDeploy?

Basically I would like to have a simple manual deploy step that's not directly linked to a build. For my use case with containers, I don't want to perform a separate build per environment (e.g. once my build puts an image tag in ECR, I would like to deploy that image to any number of environments).
Now, I know in CodePipeline I can have a number of actions and I can precede them with manual approval.
The problem with that is that if I don't want to perform the last manually approved deploy, subsequent executions pile up - the pipeline execution doesn't complete and whatever comes next just has to wait. I can set a timeout, for sure, but there are moments when 20 builds come in quickly and I don't know which of them I may want to deploy to which environment (they generally all go to some QA/staging, but some need to be manually deployed to a particular dev-related environment or even to production).
Manually updating task definitions all around in ECS is tedious.
I have a solution where I can manually patch a task definition using the aws cli and yq, but is there a way to have a simple pipeline with one step that takes a manual input (i.e. an image tag) and either uses an ECS deploy step (the only place where you can provide a clean JSON patch for the task definition) or runs my yq script to deploy?
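For context, the manual patch described above looks roughly like this when done with boto3 instead of aws cli + yq; the cluster, service, and image names are placeholders, and the list of fields copied into register_task_definition is a simplification:

import boto3

ecs = boto3.client("ecs")

def deploy_image_tag(cluster, service, new_image):
    # Find the task definition the service is currently running.
    svc = ecs.describe_services(cluster=cluster, services=[service])["services"][0]
    task_def = ecs.describe_task_definition(taskDefinition=svc["taskDefinition"])["taskDefinition"]

    # Patch only the container image, keep everything else as-is.
    task_def["containerDefinitions"][0]["image"] = new_image

    # register_task_definition rejects the read-only fields returned by describe,
    # so copy over only the registerable ones.
    register_keys = [
        "family", "taskRoleArn", "executionRoleArn", "networkMode",
        "containerDefinitions", "volumes", "requiresCompatibilities",
        "cpu", "memory",
    ]
    new_def = {k: task_def[k] for k in register_keys if k in task_def}
    new_revision = ecs.register_task_definition(**new_def)["taskDefinition"]["taskDefinitionArn"]

    # Point the service at the new revision; ECS rolls the deployment.
    ecs.update_service(cluster=cluster, service=service, taskDefinition=new_revision)

deploy_image_tag("my-cluster", "my-service", "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:abc123")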

How can I pass different command line arguments to a Task in aws every time I run it?

What I want to accomplish
I have a computationally heavy Python function that:
takes a file from an S3 bucket
transforms it
then saves a new file to the S3 bucket
I have done functions like these as lambda functions before, but this one is very computationally heavy and takes a long time, so I decided it would be better to package it in a container, put it on ECS and run it through Fargate. (Forgive my use of these terms if I am doing it incorrectly, I haven't wrapped my head around these concepts yet)
... Practically:
So I want to be able to run the Task that has that image of my docker container like I run my container locally, passing arguments to it every time I run it, and these arguments are different for every run:
Run 1:
docker run python-image input_file_path_1 output_file_path_1
Run 2:
docker run python-image input_file_path_2 output_file_path_2
From what I understand, I can change the task definition to pass arguments to the script, but then they seem to be "hardcoded" into the task definition, meaning they cannot be changed for every run.
Question?
So in essence, my question is how do I run this task, either from the cli or the user interface or a lambda function, where I would be able to pass arguments dynamically, every time I run the Task?
Thank you :)
If you look at the ECS RunTask API, you'll see an overrides parameter. Inside that overrides parameter you can override things such as the container environment variables, and the command. In your instance it sounds most appropriate to pass an override of the command each time you call RunTask.
You mentioned both the CLI and AWS Lambda. For the CLI you can see the documentation here. You didn't mention what programming language you are using for AWS Lambda, but all the AWS SDKs have an ECS RunTask call you can look up in their respective documentation to see how to pass in an override.
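A minimal boto3 sketch of that; the cluster name, task definition, container name, subnet, and S3 paths are placeholders, and the container name must match the one in your task definition:

import boto3

ecs = boto3.client("ecs")

def run_transform(input_path, output_path):
    ecs.run_task(
        cluster="my-cluster",
        taskDefinition="python-transform",
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],
                "assignPublicIp": "ENABLED",
            }
        },
        overrides={
            "containerOverrides": [
                {
                    # Must match the container name in the task definition.
                    "name": "python-container",
                    "command": ["python", "transform.py", input_path, output_path],
                }
            ]
        },
    )

# Run 1 and Run 2 from the question, with different arguments each time:
run_transform("s3://my-bucket/input_file_path_1", "s3://my-bucket/output_file_path_1")
run_transform("s3://my-bucket/input_file_path_2", "s3://my-bucket/output_file_path_2")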

Manage image version in ECS task definition

I saw the post How to manage versions of docker image in AWS ECS? and didn’t get a good answer for the question.
When updating the container image version (for example, from alpine:1.0.0 to alpine:1.0.1), what is the best practice for updating the container image in the task definition? I’m using only one container per task definition.
As far as I understand there are two alternatives:
Create new revision of task definition
Create new task definition whose name contains the version of the image.
The pro of the first option is that I’m creating only one task definition; the con is that if I want to create a new revision only when the definition has actually changed, I need to describe the task definition, get the image from the container list, and compare that version with the new image version.
The pro of the second option is that I can see at a glance whether a task definition containing my image already exists. The con is that I will create a new task definition for every image version.
In both options, how should I handle the deregister logic?
Probably I missed something so would appreciate your answer.
Thanks!
I've only ever seen the first alternative (Create new revision of task definition) used. If you are using Infrastructure as Code, such as CloudFormation or Terraform, then all the cons you have listed for that are no longer present.
"In both options, how should I handle the deregister logic?"
Just update the ECS service to use the latest version of the task definition. ECS will then deploy a new task, migrate the traffic to that task, and shut down the old task. You don't need to do anything else at that point. There is no special logic you need to implement yourself to deregister anything.
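As a minimal boto3 sketch (names are placeholders): once the new revision has been registered in the same family, pointing the service at it is a single call, and passing just the family name resolves to the latest ACTIVE revision:

import boto3

ecs = boto3.client("ecs")

# Deploy whatever the latest ACTIVE revision of the family is;
# ECS rolls new tasks in and drains the old ones automatically.
ecs.update_service(
    cluster="my-cluster",
    service="my-service",
    taskDefinition="my-task-family",
)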

Is it possible to re-run a job in Google Cloud Dataflow after it has succeeded?

Maybe the question sounds stupid, but I was wondering: once a job has finished successfully and has an ID, is it possible to start that same job again?
Or is it necessary to create another one?
Because otherwise I would end up with several jobs with the same name throughout the list.
I just want to know if there is a way to restart it without recreating it again.
It's not possible to run the exact same job again, but you can create a new job with the same name that runs the same code. It will just have a different job ID and show up as a separate entry in the job list.
If you want to make running repeated jobs easier, you can create a template. This will let you create jobs from that template via a gcloud command instead of having to run your pipeline code.
Cloud Dataflow does have a re-start function. See SDK here. One suggested pattern (to help with deployment) is to create a template for the graph you want to repeatedly run AND execute the template.
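A hedged sketch of launching a job from a staged template via the Dataflow REST API client (project, bucket, template path, job name, and parameters are placeholders); gcloud dataflow jobs run does the equivalent from the command line:

from googleapiclient.discovery import build

# Launch a new job from a template previously staged to Cloud Storage.
dataflow = build("dataflow", "v1b3")
dataflow.projects().templates().launch(
    projectId="my-project",
    gcsPath="gs://my-bucket/templates/my-template",
    body={
        "jobName": "my-pipeline-rerun-001",
        "parameters": {"input": "gs://my-bucket/input/*.csv"},
    },
).execute()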

How to automate the Updating/Editing of Amazon Data Pipeline

I want to use the AWS Data Pipeline service and have created some pipelines using the manual JSON-based mechanism, which uses the AWS CLI to create, put and activate the pipeline.
My question is: how can I automate the editing or updating of the pipeline if something changes in its definition? Things I can imagine changing include the schedule time, the addition or removal of Activities or Preconditions, references to DataNodes, resource definitions, etc.
Once the pipeline is created, we cannot edit quite a few things as mentioned here in the official doc: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-manage-pipeline-modify-console.html#dp-edit-pipeline-limits
This makes me believe that if I want to automate updating a pipeline, I would have to delete and re-create/activate a new one. If so, the next question is how to create an automated process which identifies the previous version's ID, deletes it, and creates a new one. Essentially I'm trying to build a release-management flow where the configuration JSON file is released and deployed automatically.
Most commands like activate, delete, list-runs, put-pipeline-definition etc. take the pipeline-id, which is not known until a new pipeline is created. I am unable to find anything which remains constant across updates or recreation (the unique-id and name parameters of the create-pipeline command are consistent, but I can't use them for the above-mentioned tasks; I need the pipeline-id for that).
Of course I can try writing shell scripts which grep and parse the output, but is there a better way? Is there some other info that I am missing?
Thanks a lot.
You cannot fully edit schedules or change references, so deleting and re-creating pipelines seems to be the best approach for your scenario.
You'll need the pipeline-id to delete a pipeline. Is it not possible to keep a record of that somewhere? You can have a file with the last used id stored locally or in S3 for instance.
Some other ways I can think of are:
If you have only one pipeline in the account, you can list-pipelines and use the only result
If you have the pipeline name, you can list-pipelines and find the id (a sketch of that lookup follows)
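A rough boto3 sketch of that lookup (the pipeline name is a placeholder); list_pipelines is paginated, so the marker has to be followed:

import boto3

datapipeline = boto3.client("datapipeline")

def find_pipeline_id(name):
    # Walk all pages of list_pipelines and match on the pipeline name.
    marker = ""
    while True:
        resp = datapipeline.list_pipelines(marker=marker)
        for p in resp["pipelineIdList"]:
            if p["name"] == name:
                return p["id"]
        if not resp.get("hasMoreResults"):
            return None
        marker = resp["marker"]

pipeline_id = find_pipeline_id("my-release-pipeline")
if pipeline_id:
    datapipeline.delete_pipeline(pipelineId=pipeline_id)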