Docker image different size when pushed to ECR than locally

I have a Docker image that is 1.46GB on my local machine, but when it is pushed to AWS ECR (either from my local machine or via CircleCI deployment) it is only 537.05MB. I'm pretty new to Docker and to AWS, so any help in figuring out why this may be would be appreciated!
I have a feeling that it has not fully uploaded to ECR for whatever reason, as I am trying to use this container for a Batch job, but the same command that works locally does not work in the job definition. The command is simply python app.py, but I have also tried the absolute path python /usr/local/src/app/app.py, both of which result in [Errno 2] No such file or directory.
The commands used in my Makefile deployment are below:
docker build --force-rm=true -t $(EXTRACTOR_IMAGE_NAME) ./extractor
docker tag $(EXTRACTOR_IMAGE_NAME) $(EXTRACTOR_ECR_IMAGE_NAME)
$(shell aws ecr get-login --no-include-email)
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/$(EXTRACTOR_ECR_REPO)
Edit 1:
I think this might have to do with the size of the base image, which is python:2.7 in this case. The base image is 914MB, plus the size of my ECR image 537.05MB = 1451.05MB, i.e. approx 1.46GB. Still not sure what the issue is with the Batch command though...
Edit 2:
I've been mounting the code into my container using a volume, which is why this has been working locally. At build time I had forgotten to copy the code into the image, which I assume is the only reason why this was not working in Batch!
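For anyone hitting the same thing, a minimal sketch of the fix, assuming the code sits next to the Dockerfile in ./extractor (the actual layout isn't shown in the question):
FROM python:2.7
# Copy the application code into the image so it exists without a volume mount
WORKDIR /usr/local/src/app
COPY . /usr/local/src/app
CMD ["python", "app.py"]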

That could be due to how the Docker client behaves before it pushes the image to ECR, as documented:
Beginning with Docker version 1.9, the Docker client compresses image layers before pushing them to a V2 Docker registry. The output of the docker images command shows the uncompressed image size, so it may return a larger image size than the image sizes shown in the AWS Management Console.
So when you pull an image you will notice that the image layers go through three stages:
Downloading
Extraction
Completion
Regarding this command: python /usr/local/src/app/app.py, are you executing it while you are inside /usr/local/src/app/? You might need to check this first. Also, have you tried running the command inside a container started from the image before you push it? The error seems to be code-related rather than a Docker issue.
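A quick way to verify, with the image name as a placeholder:
# Check the file actually made it into the image
docker run --rm my-extractor-image ls -l /usr/local/src/app
# Or open an interactive shell and look around
docker run --rm -it my-extractor-image /bin/bash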

We can read the following in the AWS ECR documentation:
Note
Beginning with Docker version 1.9, the Docker client compresses image layers before pushing them to a V2 Docker registry. The output of the docker images command shows the uncompressed image size, so it may return a larger image size than the image sizes shown in the AWS Management Console.
I suspect you'd get the sizes you expect if you used the CLI (docker images) instead of the ECR web console.
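A minimal way to compare the two numbers yourself (repository name and region are placeholders):
# Uncompressed size, as the Docker client reports it
docker images ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/my-repo
# Compressed size, as ECR stores it (imageSizeInBytes in the output)
aws ecr describe-images --repository-name my-repo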

Related

AWS ECR layer already exists when pushing an updated Docker version

Scenario:
Created a new private ECR repo and successfully pushed a new Docker image to it.
Then used that Docker image to host the application.
Now the application needed some updated APIs, so I:
made changes to the code.
followed the AWS commands to push the updated Docker image to the same repo with a different version number.
Also, while building the image I removed the local Docker images from my PC and used the --no-cache flag as well.
While uploading I got Layer already exists for all layers except one (around 32 MB), which got pushed in about 10 seconds.
On ECR I can see the new version of the image with the same size as the previous version, but only the roughly 32 MB layer was pushed, in 5-10 seconds.
What does this mean?
I tried the whole process a couple of times, and one time it randomly said Layer already exists for all layers except 2. That time a layer with a size close to the full image (around 330 MB) got pushed.
Could someone explain the best way to update your image version on ECR with proper pushes?
This is not a big concern. A Docker image is made up of multiple build layers, so when you push it to AWS ECR, the layers are pushed individually to the repository. When you make changes, rebuild, and push a new version of the image, only the updated layers are uploaded, because the other layers are already available in your ECR repository. I hope this makes it clear.
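If you want to see which layers an image is made of, and therefore what ECR can skip, docker history lists them with their individual sizes (image name and tag are placeholders):
docker history my-app:2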

Why container-optimized compute instance uses cached image instead of the latest one?

I'm running a container-optimized compute instance with this startup-script:
#!/bin/bash
mkdir /home/my-app
cd /home/my-app
export HOME=/home/my-app
docker-credential-gcr configure-docker
docker run --rm --security-opt seccomp=./config.json gcr.io/my-project/my-app:latest
This script works well when creating a new instance, but when I restart an existing instance it doesn't pull the latest image.
I've tried deleting all images from GCR; the instance was able to start anyway, which proves that it doesn't even try to pull the latest image from GCR.
Also, for some reason startup-script logs are not showing up in Cloud Logger.
As the Kubernetes documentation puts it: with Docker, if the image already exists, the pull attempt is fast because all image layers are cached and no image download is needed.
As a workaround, you can add steps 1 and 2 to your script:
1. docker images (lists local images, including gcr.io/my-project/my-app:latest)
2. docker rmi --force gcr.io/my-project/my-app:latest (deletes the local image)
3. docker run ... (the rest of your command; it will download the latest image from gcr.io again)
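Folded into the question's startup-script, a minimal sketch of that workaround (the || true keeps the script going on first boot, when there is no cached copy to remove):
# Remove any cached copy, then pull the latest image explicitly before running it
docker rmi --force gcr.io/my-project/my-app:latest || true
docker pull gcr.io/my-project/my-app:latest
docker run --rm --security-opt seccomp=./config.json gcr.io/my-project/my-app:latest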

Docker Image tagging in ECR

I am pushing a Docker image to AWS ECR using Jenkins.
While pushing the image I provide the tag as $Build_Number, so in the ECR repo I have images with tags like 1, 2, 3, 4.
But when I try to pull the image onto EC2 from a Jenkins job with the command below
docker pull 944XXX.dkr.ecr.us-east-1.amazonaws.com/repository1:latest
I get an error because there is no image with the latest tag.
Here I want to pull the latest image (with tag 4). I cannot hard-code the tag number, as the docker pull command runs automatically from the Jenkins job. How can I pull the latest image?
I believe that the correct approach here would be to push the same image twice with different tags. One push would include the image with no tag and then the second push would be the same image after you have tagged it.
Note that you don't have to build the image twice. You only need to issue the docker push twice.
ECR is "smart" enough to recognise that the image digest did not change and it will not try to actually upload the image twice. On the second push only the tag will be send to ECR.
Now that you have an untagged version and a tagged version, you can pull the image without the tag specification and you will get the :latest image. Here is a reference to the AWS docs where they mention that the :latest tag will be added if no tag was sent by the user.
The flow would look something like this:
# Build the image
docker build -f ./Dockerfile -t my-web-app .
# Push the untagged image (will become the ":latest")
docker push my-web-app
# Tag the image with your build_number
docker tag my-web-app my-web-app:build_number
# Push the tagged image
docker push my-web-app:build_number
You will now be able to:
docker pull my-web-app:build_number
docker pull my-web-app
Which will result in 2 identical images with only the tag differentiating between them.
One solution is suggested by @Lix that you can try; or, if you are only interested in the most recently pushed image, no matter what its tag is, you can get the latest tag from the AWS CLI.
So your Jenkins job command will be
TAG=$(aws ecr describe-images --output json --repository-name repository1 --query 'sort_by(imageDetails,& imagePushedAt)[-1].imageTags[0]' | jq . --raw-output)
docker pull 944XXX.dkr.ecr.us-east-1.amazonaws.com/repository1:$TAG
aws-cli-ecr-list-images-get-newest
If you want a latest tag on ECR, you need to add it and push it there when you build the image. You can pass -t to docker build multiple times (see https://docs.docker.com/v17.09/engine/reference/commandline/build/); just make sure to push them all.
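Putting the two answers together, a sketch of the Jenkins build step, reusing the question's repository and $Build_Number variable:
# Build once, tagging with both the build number and latest
docker build -t 944XXX.dkr.ecr.us-east-1.amazonaws.com/repository1:$Build_Number \
             -t 944XXX.dkr.ecr.us-east-1.amazonaws.com/repository1:latest .
# Push both tags; shared layers are only uploaded once
docker push 944XXX.dkr.ecr.us-east-1.amazonaws.com/repository1:$Build_Number
docker push 944XXX.dkr.ecr.us-east-1.amazonaws.com/repository1:latest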

Moving images from Docker registry to GCR

Google Cloud Run does not support pulling from the Docker registry, therefore I have to manually pull the image, tag it, and push it to GCR.
Container image URL should match pattern [region.]gcr.io/repo-path[:tag or #digest]
Is there any simpler way to do this?
Sadly, that's the easiest way to move a Docker image from one container registry to another one.
Just for documentation purposes, I will add the steps for the benefit of the community:
Pull the Docker image using the following command:
docker pull [REPOSITORY-NAME]/[IMAGE]:[TAG]
Then, tag that pulled image using the following command:
docker tag [IMAGE] gcr.io/[PROJECT-ID]/[IMAGE]
Push that image to your gcr repository using the following command:
docker push gcr.io/[PROJECT-ID]/[IMAGE]
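For example, with hypothetical names filled in:
docker pull mydockerhubuser/my-app:1.0
docker tag mydockerhubuser/my-app:1.0 gcr.io/my-project/my-app:1.0
docker push gcr.io/my-project/my-app:1.0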
I'm afraid that, in any case, it won't get any simpler. Though, you may try to use Docker Hub webhooks to call a simple Cloud Function (pull, tag, push) in order to keep your images in sync in your GCR.
There seem to be some projects to manage that kind of hassle, like dregsy, but I didn't try them...
I've been working on some tooling called regclient that supports this use case. For copying a single image, the command would be:
regctl image copy ${source} ${target}
e.g.
regctl image copy ubuntu:latest gcr.io/your-project/ubuntu:latest
This checks the digests before copying, using a HEAD request, which allows the command to be run frequently while only using your quota when the upstream image doesn't match what's on GCR. It also copies multi-platform images, which you wouldn't get with a docker pull and docker push (docker dereferences the image to your platform on the pull). And unlike docker pull, the individual layers are only copied when they don't exist on the target registry.
If you have lots of images to continuously mirror, there's also a regsync command that copies according to a yaml file with a list of images, tags, and schedule to run the copies.
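As a rough sketch of such a file (field names are my reading of the regsync format; check the regclient docs for the authoritative schema):
version: 1
sync:
  - source: ubuntu
    target: gcr.io/your-project/ubuntu
    type: repository
    tags:
      allow:
        - "latest"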
These can run as containers, but they are also available as standalone binaries that don't require docker to run.

Need to take a Docker image or container from an AWS machine with the application installed

As I am working with Docker, I need help taking a container or image from an existing AWS box. On my AWS box our application is installed and initialized.
Our application takes a long time to initialize, so I want to deploy a container with the application already installed at instance launch time. As I understand it, the Docker container will have my application already initialized, so I can save the initialization time.
I am launching the machine through Ansible in an AWS VPC, so I can run the Docker container there.
Can anyone help with how to do this?
With Thanks,
Ezhilmurugan M I
If you docker commit your changes into an image with a tag, you can then push it to a registry and pull the image down on another server.
$ docker commit <hash or name> yourusername/red_panda
$ docker push yourusername/red_panda
On other host
$ docker pull yourusername/red_panda
You could also export the container, transfer it however you want, and then import it on the new server.
$ docker export red_panda > latest.tar
$ cat latest.tar | docker import - exampleimagelocal:new
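Note that docker export/import works on a container and flattens everything into a single layer. If you want to preserve the image's layers, tags, and history instead, docker save and docker load are the image-level equivalents:
$ docker save yourusername/red_panda > red_panda.tar
$ docker load < red_panda.tar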