Best practices for copying container images between ECRs in different accounts - amazon-web-services

I wonder what the best practices are for copying Docker container images from ECR to ECR in AWS.
I have to copy container images periodically between multiple ECR repositories, each in a separate AWS account - like mirroring, but with specific filters for what to copy and what to skip. I wrote a script that does this work by pulling the missing images from the 'source' ECR to an EC2 VM and pushing them to the 'target' ECR.
This works, but I am not satisfied with the performance of doing it in a single thread, and it is not network throughput that limits it but the overhead of wrapping commands, running the necessary AWS calls, etc.
So I am thinking of rewriting the script as a multi-threaded application, but I wonder if I am reinventing the wheel and there is already a known, better solution for this task.

As confirmed by AWS support, there is no out-of-the-box way to do this job, other than directly mirroring an entire repository.
So I rewrote the tool to do it in a smarter and faster way, and published it:
https://github.com/dfad1ripe/aws-ecr-cross-account-clone
To use it, both source and destination AWS accounts should be defined as profiles in ~/.aws/credentials, and the host should be running either Docker Engine or Docker Desktop.
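Under the hood, the workflow boils down to logging in to both registries, listing what the source repository contains, and running pull/tag/push for each image. A minimal single-threaded sketch, assuming AWS CLI v2, a running Docker engine, and profiles named "source" and "target"; the repository name, region, and account IDs below are placeholders:

    #!/usr/bin/env bash
    # Clone all tagged images of one repository from a source account to a target account.
    REPO=my-app
    REGION=us-east-1
    SRC=111111111111.dkr.ecr.$REGION.amazonaws.com   # source registry (placeholder account ID)
    DST=222222222222.dkr.ecr.$REGION.amazonaws.com   # target registry (placeholder account ID)

    # Log in to both registries with their respective profiles.
    aws ecr get-login-password --region $REGION --profile source | docker login --username AWS --password-stdin $SRC
    aws ecr get-login-password --region $REGION --profile target | docker login --username AWS --password-stdin $DST

    # Copy every tag present in the source repository (apply your own filters here).
    for TAG in $(aws ecr list-images --repository-name $REPO --region $REGION --profile source \
                   --filter tagStatus=TAGGED --query 'imageIds[].imageTag' --output text); do
      docker pull $SRC/$REPO:$TAG
      docker tag  $SRC/$REPO:$TAG $DST/$REPO:$TAG
      docker push $DST/$REPO:$TAG
    done

Parallelising the loop (e.g. with xargs -P or a small worker pool) is where most of the speed-up comes from, since each iteration spends more time on per-command overhead than on data transfer.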

Related

Configure ECR as a proxy that pulls from Docker Hub

Let's say I have an EKS cluster, an EC2 instance, and my local machine. I can pull images from my private ECR without any issues, but when I pull a generic image like nginx, it comes from Docker Hub straight to me. Would it be possible to redirect this pull so it enters my ECR first (so that it gets scanned for vulnerabilities, and perhaps for caching purposes too) and then goes from my ECR to wherever I pulled from?
If this is not possible, what would be a good alternative?
AWS container team person here. Can you clarify one thing? Would you be OK with pointing your manifests at ECR (acting as a hub/cache for external registries), or do you want to keep your manifests pointing at Docker Hub but go through ECR more or less transparently for caching? I am asking because we are working on the former scenario.
You can subscribe here to see the progress and leave comments.
It is not possible to redirect a pull of a generic image to ECR and then on to Docker Hub.
I understand your concern about pulling images from Docker Hub directly. What you can do, and what we have done in our projects, is:
Pull the generic image from Docker Hub once.
Using that image as a base, build your own image with any customisations you may require.
Publish the newly created image to your ECR repo.
Going forward, pull that image only from your ECR repo.
This way you have full control over the image. It is also more secure to pull it from your ECR repo rather than going to Docker Hub again and again, and you can apply any customisation you want.
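A hedged sketch of those steps using nginx as the base image; the account ID, region, repository name, and tag are placeholders:

    # 1. Pull the generic image from Docker Hub once.
    docker pull nginx:1.25
    # 2. Build your own image on top of it (Dockerfile starting with "FROM nginx:1.25" plus your customisations).
    docker build -t my-nginx:1.25 .
    # 3. Publish the newly created image to your ECR repo.
    aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
    docker tag my-nginx:1.25 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-nginx:1.25
    docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-nginx:1.25
    # 4. Going forward, pull only from ECR.
    docker pull 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-nginx:1.25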

Am I able to move my docker repositories or images from Amazon ECR directly to another ECR repository on a different account?

After many hours of research, the only solution I could come up with was pulling the images one by one, then logging in to the other account and pushing them there. My internet is really bad and I do not have the storage space to do it this way. Is there an easier way?
Yes, you must pull, tag, and push. If your internet is bad, consider spinning up an AWS EC2 spot instance for a few hours to get the migration completed quickly.
I wrote a bash script for this. You can specify the number of tagged images that will be copied.
https://gist.github.com/virtualbeck/a635ef6701991f2087384eab7edbb18b
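For reference, the pull/tag/push cycle for a single image looks roughly like this (run it on the EC2 instance rather than over your home connection); the profile names, account IDs, region, repository, and tag are placeholders:

    # Log in to the old and new registries with their respective profiles.
    aws ecr get-login-password --region us-east-1 --profile old-account | docker login --username AWS --password-stdin 111111111111.dkr.ecr.us-east-1.amazonaws.com
    aws ecr get-login-password --region us-east-1 --profile new-account | docker login --username AWS --password-stdin 222222222222.dkr.ecr.us-east-1.amazonaws.com

    # Pull from the old account, re-tag for the new registry, and push.
    docker pull 111111111111.dkr.ecr.us-east-1.amazonaws.com/my-repo:v1.0
    docker tag  111111111111.dkr.ecr.us-east-1.amazonaws.com/my-repo:v1.0 222222222222.dkr.ecr.us-east-1.amazonaws.com/my-repo:v1.0
    docker push 222222222222.dkr.ecr.us-east-1.amazonaws.com/my-repo:v1.0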

AWS Batch vs EC2 vs AWS WorkSpaces for running batch scripts to load data to Redshift

I have multiple CSV files containing data for different tables, with file sizes varying from 1 MB to 1.5 GB. I want to process the data (replace/remove column values) row by row and then load it into existing Redshift tables. This is a once-a-day batch process.
AWS Lambda:
Lambda has memory limitations, so I was not able to process the large CSV files there.
EC2: I already have an EC2 instance where I run a Python script to transform the data and load it into Redshift.
I have to keep the EC2 instance running all the time, with all the Python scripts I want to run for all tables and with the environment set up (Python installed, the psycopg library, etc.), which leads to higher cost.
AWS Batch:
I created a container image that has all the setup needed to run the Python scripts, and pushed it to ECR.
I then set up an AWS Batch job, which takes this container image and runs it through ECS.
This is more cost-efficient: I only pay for the EC2 time used and for ECR image storage.
But all development and unit testing has to be done on my personal desktop before pushing a container; there is no inline AWS service to test with.
AWS Workspaces:
I am not very familiar with AWS WorkSpaces, but I need input: can it also be used like AWS Batch, starting and stopping an instance when required, to run, edit, or test the Python scripts?
Also, can I schedule it to run every day at a defined time?
I need input on which service is the best and most cost-effective fit for this use case. It would also be great if anyone could suggest a better way to use the services I mentioned above.
Batch is best suited for your use case. I see that your concern about Batch is the development and unit testing on your personal desktop. You can automate that process using ECR, CodePipeline, CodeCommit, and CodeBuild: set up a pipeline to detect changes made to your code repo, build the image, and push it to ECR. Batch can pick up the latest image from there.
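Once the image is in ECR, kicking off (or scheduling) a run is a couple of CLI calls. A rough sketch, assuming a Batch job definition and job queue have already been created; every name, the account ID, and the region below are placeholders:

    # Build and push the ETL image to ECR.
    docker build -t csv-to-redshift .
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
    docker tag csv-to-redshift:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/csv-to-redshift:latest
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/csv-to-redshift:latest

    # Submit one run of the job (a scheduled rule, e.g. in EventBridge, can do this once a day).
    aws batch submit-job \
      --job-name daily-redshift-load \
      --job-queue daily-etl-queue \
      --job-definition csv-to-redshift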

Continuous Deployment of Docker Compose App to AWS/EC2

I've been trying to find an efficient way to handle continuous deployment with a Docker compose setup and AWS hosting.
So far I've looked into CodeDeploy, S3 buckets, and ECS. My application is relatively small, with only 3 Docker services: a Django app, NGINX, and PostgreSQL. I was unable to find any reliable information on using CodeDeploy with Docker Compose, and because of the small scale, ECS seems impractical. I've considered an S3 bucket, but that seems no better than just deploying my application with something like git or scp.
What is a standard way of handling deploying a docker compose setup on AWS? If possible I would like to use Bitbucket Pipelines or CircleCI to perform the deployment in a manually triggered step after running tests. But I've been unable to find a solution that would easily let me copy over the code (which is in a git repo on a production branch and is how I get the code onto the production server at the moment).
I would like to add some possibilities to #gasc's answer.
It would be better to write a CloudFormation template for deploying your EC2 resources with all the required security groups, auto scaling, and other pieces.
Then create an AMI with Docker Compose installed, plus anything else your EC2 environment requires.
Then you can use a CodeDeploy pipeline; AWS also provides a private container registry (ECR), which you may want to use.
The rest of the steps are the same: just SCP the compose file onto the EC2 instance, run the
docker-compose up
command, and you are done.
Let me know if you want more help; I'm open for discussion.
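A minimal sketch of that final SCP-and-run step as it might appear in a manually triggered CI deploy step; the key path, user, host, and directory are placeholders:

    # Copy the compose file to the instance and (re)start the stack over SSH.
    scp -i ~/.ssh/deploy_key docker-compose.yml ec2-user@my-ec2-host:/srv/app/docker-compose.yml
    ssh -i ~/.ssh/deploy_key ec2-user@my-ec2-host \
      'cd /srv/app && docker-compose pull && docker-compose up -d'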
What I would do in your case is:
1 - If needed, update your docker-compose.yml file (or whatever you called it) to version 3 or higher, to use Swarm.
2 - During your pipeline, build all the images needed and push them to a registry.
3 - In your pipeline, scp your compose file to a manager node.
4 - Deploy your application using Swarm (docker stack deploy -c <your-docker-compose-file> your_app_name). This way you can handle rolling updates and scale easily.
Note that if you want to use multiple nodes, you need to open a few ports on them.
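A hedged sketch of steps 2-4 as a CI deploy script, assuming a Swarm manager is already set up and reachable over SSH; the registry, image names, commit variable, host, and paths are all placeholders:

    # 2. Build and push the images the compose file refers to.
    docker build -t registry.example.com/myapp/django:"$COMMIT_SHA" ./django
    docker push registry.example.com/myapp/django:"$COMMIT_SHA"

    # 3. Copy the compose file to a manager node.
    scp docker-compose.yml deploy@swarm-manager:/srv/myapp/docker-compose.yml

    # 4. Deploy (or roll out an update to) the stack.
    ssh deploy@swarm-manager \
      'docker stack deploy -c /srv/myapp/docker-compose.yml myapp'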
I see you mentioned that ECS might seem impractical for such a small scale; in my opinion, not necessarily. It would require you to rewrite your docker-compose.yml into task and service definitions, but since there aren't many services, that shouldn't take much time.

Deploying on AWS ECS via task definitions without Docker Hub

Currently I'm using task definitions that refer to custom images on Docker Hub to deploy my webapp on ECS (Amazon EC2 Container Service). Is there a way to do this without going through Docker Hub, i.e. build/deploy the Dockerfile locally across cluster nodes?
At the moment, I can only think of sending shell commands over ssh or using a tool like ansible.
Perhaps I'm missing something totally obvious here...
This is a little late for an answered question, but I just figured this out myself. The EC2 Container Registry (ECR, Amazon's repository equivalent) is working well for me; maybe it didn't exist at the time?
I build the containers locally, tag them, and push them to Amazon's ECR using the AWS CLI (later versions of which include support for ECR), and then refer to them at that location in the ECS task definitions. Works like a charm.
http://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html
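A sketch of that build-tag-push flow with the current AWS CLI v2 login command (older CLI versions used aws ecr get-login instead); the account ID, region, and repository name are placeholders:

    # Build locally, authenticate Docker against ECR, then tag and push.
    docker build -t my-webapp .
    aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com
    docker tag my-webapp:latest 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-webapp:latest
    docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-webapp:latest
    # In the ECS task definition, point the container's "image" field at:
    #   123456789012.dkr.ecr.us-west-2.amazonaws.com/my-webapp:latest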
ECS is a service to run containers, not to build them. It has no native support for it, so you're not missing something obvious.
As you suggest, you could distribute a Dockerfile to the container instances and build locally, but that will actually be more difficult since the container instances must have everything needed to build the image, plus you'd have to distribute the image to the other container instances.
You could run a repository yourself and specify a different repository URL for the image parameter in your ECS task definition. You'd still be responsible for building the images, and now you'd have the added burden of running a repository as well.
Sorry to be the bearer of bad news but there's not a simpler workflow for this at the moment.