AccessDenied error when pushing docker image from SageMaker to ECR

I've created a docker image using AWS SageMaker and am now trying to push said image to ECR. When I do docker push ${fullname} it retries a couple of times and then errors.
In CloudTrail I can see that I'm getting an access denied error with message:
"User: arn:aws:sts::xxxxxxxxxx:assumed-role/AmazonSageMaker-ExecutionRole-xxxxxxxxxxxx/SageMaker is not authorized to perform: ecr:InitiateLayerUpload on resource: arn:aws:ecr:us-east-x:xxxxxxxxxx:repository/image because no identity-based policy allows the ecr:InitiateLayerUpload action"
I have full permissions, but from the error message above it thinks the user is SageMaker and not me.
How do I change the user? I'm guessing that's the problem.

When you're running commands from SageMaker, you're executing them as the SageMaker execution role instead of your own role. There are two options:
[Straightforward solution] Add the ecr:InitiateLayerUpload permission to the AmazonSageMaker-ExecutionRole-xxxxxxxxxxxx role. A push normally needs the other ECR upload actions too (for example ecr:UploadLayerPart, ecr:CompleteLayerUpload and ecr:PutImage), so expect further AccessDenied errors if you add only this one action.
Assume a different role using sts (in that case, AmazonSageMaker-ExecutionRole-xxxxxxxxxxxx needs permission to assume your Admin role) and then run the docker push command.
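
For the first option, the policy attached to the execution role could look roughly like the document built below. This is a minimal sketch, not the asker's actual setup: the function name build_ecr_push_policy, the Sid values and the repository ARN in the example call are made up for illustration, and the action list is the set commonly needed for docker push to a private repository.

```python
import json

# Actions commonly needed for `docker push` to a private ECR repository.
PUSH_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:InitiateLayerUpload",
    "ecr:UploadLayerPart",
    "ecr:CompleteLayerUpload",
    "ecr:PutImage",
]


def build_ecr_push_policy(repository_arn: str) -> dict:
    """Return an identity-based policy document allowing pushes to one repository."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Needed for `docker login`; this action only accepts "*" as resource.
                "Sid": "AllowEcrLogin",
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
            {
                "Sid": "AllowPushToRepository",
                "Effect": "Allow",
                "Action": PUSH_ACTIONS,
                "Resource": repository_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical ARN; replace region, account id and repository name with your own.
    arn = "arn:aws:ecr:us-east-1:123456789012:repository/image"
    print(json.dumps(build_ecr_push_policy(arn), indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy on the execution role. Scoping the second statement to a single repository ARN keeps the role from pushing anywhere else.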

Related

Why some of the public SageMaker ECR images give ecr:BatchGetImage permission errors?

I have tried some of the Docker registries for SageMaker (for region eu-north-1 and region us-east-1). I can pull from some of those registries but not from others, and I don't understand why.
As far as I know all those ECR registries are "public" and the IAM policy for my user allows everything.
But I'm getting the following error on some of those "public" registries:
User: arn:aws:iam::xxxx:user/ecerulm-iam is not authorized to perform: ecr:BatchGetImage on resource: arn:aws:ecr:eu-north-1:669576153137:repository/blazingtext because no resource-based policy allows the ecr:BatchGetImage action
Here are some examples:
# image_uris.retrieve(framework='sklearn',region='eu-north-1',version='0.23-1',image_scope='inference') -> '662702820516.dkr.ecr.eu-north-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3'
docker pull 662702820516.dkr.ecr.eu-north-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3 # works
# image_uris.retrieve(framework='blazingtext',region='eu-north-1') -> '669576153137.dkr.ecr.eu-north-1.amazonaws.com/blazingtext:1'
docker pull '669576153137.dkr.ecr.eu-north-1.amazonaws.com/blazingtext:1'
Error response from daemon: pull access denied for 669576153137.dkr.ecr.eu-north-1.amazonaws.com/blazingtext, repository does not exist or may require 'docker login': denied: User: arn:aws:iam::xxxx:user/ecerulm-iam is not authorized to perform: ecr:BatchGetImage on resource: arn:aws:ecr:eu-north-1:669576153137:repository/blazingtext because no resource-based policy allows the ecr:BatchGetImage action
I get this same error when I try to pull these images (obtained with sagemaker.image_uris.retrieve()):
669576153137.dkr.ecr.eu-north-1.amazonaws.com/blazingtext:1
763603941244.dkr.ecr.eu-north-1.amazonaws.com/sagemaker-clarify-processing:1.0
811284229777.dkr.ecr.us-east-1.amazonaws.com/blazingtext:1
205585389593.dkr.ecr.us-east-1.amazonaws.com/sagemaker-clarify-processing:1.0
382416733822.dkr.ecr.us-east-1.amazonaws.com/factorization-machines:1
382416733822.dkr.ecr.us-east-1.amazonaws.com/ipinsights:1
but for these it works OK:
662702820516.dkr.ecr.eu-north-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3
683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3
763104351884.dkr.ecr.us-east-1.amazonaws.com/autogluon-inference:0.4-cpu-py38
520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-chainer:5.0.0-cpu-py3
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-tensorflow-training:2.6.3-transformers4.17.0-gpu-py38-cu112-ubuntu20.04
What is causing this? What is the difference between those docker images? Aren't those "public" as I thought they were? I just wanted to run the images locally to debug / troubleshoot.

Adding AWS GameLift policies for uploading new builds

I am trying to upload a new AWS GameLift Linux server using the AWS CLI but I get the following error:
An error occurred (AccessDeniedException) when calling the CreateBuild operation: User: arn:aws:iam::------:user/----- is not authorized to perform: gamelift:CreateBuild because no identity-based policy allows the gamelift:CreateBuild action
I added the arn:aws:iam::aws:policy/GameLiftGameServerGroupPolicy to my group permissions. I can see in the policy json that there isn't a CreateBuild action. It either needs to be added or you can't do it this way.
The AWS documentation is useless and on this page: https://docs.aws.amazon.com/gamelift/latest/developerguide/security_iam_troubleshoot.html#security_iam_troubleshoot-no-permissions
it helpfully advises: ... asks his administrator to update his policies
My user is the main root user for my AWS account but I have no idea how to resolve this. Any ideas?
I worked out how to create a new Policy and add the service permissions. You click on 'create policy' and then choose the 'GameLift' service. I added all the available actions. Seemed to do the trick.
Why did AWS miss this out of the documentation?

What IAM policies are required to run ECR commands on an EC2 instance that has assumed a role?

I have a small Jenkins instance that uses Terraform to deploy some resources, such as ECR repositories.
When trying to apply changes I get this error:
error creating ECR Public repository: AccessDeniedException:
User: arn:aws:sts::1234567890:assumed-role/jenkins_role/i-1234567890 is not authorized to perform: ecr-public:CreateRepository on resource: arn:aws:ecr-public::1234567890:repository/test-repo
I would have thought AmazonEC2ContainerRegistryFullAccess would be enough but that was not the case. When I added AdministratorAccess, it worked. So why is that the case?
AmazonEC2ContainerRegistryFullAccess applies only to private ECR. You are trying to use ecr-public. This means you have to create your own policy which allows ecr-public:CreateRepository (not ecr:CreateRepository).
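
The mismatch can be illustrated with a small wildcard check. This is only a sketch: action_allowed is a hypothetical helper, the managed policy is reduced to the single pattern ecr:* as a simplifying assumption, and fnmatch is used as a stand-in for IAM's own matching (fnmatch also understands [...] classes, which IAM does not).

```python
from fnmatch import fnmatchcase
from typing import Iterable


def action_allowed(requested: str, allowed_patterns: Iterable[str]) -> bool:
    """Return True if any pattern matches the requested action.

    IAM action names are case-insensitive and may contain * and ? wildcards,
    so both sides are lowercased before matching.
    """
    requested = requested.lower()
    return any(fnmatchcase(requested, pattern.lower()) for pattern in allowed_patterns)


# Simplifying assumption: treat the managed policy as granting only "ecr:*".
private_ecr_full_access = ["ecr:*"]

print(action_allowed("ecr:CreateRepository", private_ecr_full_access))         # True
print(action_allowed("ecr-public:CreateRepository", private_ecr_full_access))  # False
print(action_allowed("ecr-public:CreateRepository",
                     ["ecr-public:CreateRepository"]))                         # True
```

The service prefix before the colon has to match as well, so a wildcard on ecr never reaches ecr-public actions.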

Insufficient access AWS whilst using AWS CLI

I've been trying to access a project in AWS devicefarm using AWS CLI.
Steps taken:
Downloaded the AWS CLI tool
Configured my credentials according to: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html using aws configure command
Executed aws devicefarm list-uploads --arn myProjectArn
and what I get is this error:
An error occurred (AccessDeniedException) when calling the ListUploads operation:
User: arn:aws:iam::replacingANumber:user/myUserName is not authorized to perform: devicefarm:ListUploads
on resource:
arn:aws:devicefarm:us-west-2:replacingANumber:project:replacingALongString with an explicit deny
The docs (https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting_iam.html) say I'm missing permissions, but the DevOps team in my company says I have all the permissions.
What am I missing?
This can be one of two things: a misconfigured AWS CLI or insufficient permissions.
Your AWS CLI is misconfigured. Make sure that when you run aws sts get-caller-identity, the identity returned is the same user or role that the DevOps team says has the correct permissions. Also, make sure that your default region is us-west-2.
If the above is set up correctly, then the problem comes from the permissions defined in the IAM policy. If you are able to view the policy associated with your user/role, you can use the policy simulator to figure out which permission is missing.
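
The wording at the end of the error message is itself a useful hint. A message ending in "with an explicit deny" means a Deny statement in some policy matched the request, whereas "because no identity-based policy allows" (or "no resource-based policy allows") means no Allow was found. The sketch below sorts messages into those cases; classify_denial is a hypothetical helper, and it relies only on the phrasings that appear in the error messages quoted on this page.

```python
import re


def classify_denial(message: str) -> str:
    """Classify an IAM AccessDenied message by the reason it states."""
    # Collapse line breaks so wrapped CLI output is handled as one sentence.
    text = " ".join(message.split())
    if "with an explicit deny" in text:
        return "explicit deny"
    match = re.search(r"because no (.+?) allows", text)
    if match:
        return f"implicit deny (no {match.group(1)})"
    return "reason not stated"


example = """User: arn:aws:iam::replacingANumber:user/myUserName is not authorized to perform: devicefarm:ListUploads
on resource:
arn:aws:devicefarm:us-west-2:replacingANumber:project:replacingALongString with an explicit deny"""

print(classify_denial(example))  # explicit deny
```

For an explicit deny, adding more Allow permissions is unlikely to help; the Deny statement itself has to be found and changed.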

CodeDeploy onpremise registration failing with AccessDeniedException on Amazon Lightsail

aws deploy register-on-premises-instance --instance-name XXXXX --iam-user-arn arn:aws:iam::XXXXXXXXXXXX:user/LightSailCodeDeployUser --region ap-south-1
An error occurred (AccessDeniedException) when calling the RegisterOnPremisesInstance operation: User: arn:aws:sts::XXXXXXXXXXX:assumed-role/AmazonLightsailInstanceRole/i-XXXXXXXXXXXXXX is not authorized to perform: codedeploy:RegisterOnPremisesInstance on resource: arn:aws:codedeploy:ap-south-1:XXXXXXXXXX:instance:XXXXXXXXXXXX
I didn't even create the role AmazonLightsailInstanceRole, so how did it come into the picture? My user has all permissions on CodeDeploy, though. I am following this link to set it up: https://aws.amazon.com/blogs/compute/using-aws-codedeploy-and-aws-codepipeline-to-deploy-applications-to-amazon-lightsail/
I made the same mistake and then realized that command is meant to be run on your local machine and not the instance!
AmazonLightsailInstanceRole is a service-linked role automatically created by AWS:
Service-linked roles are predefined by the service and include all the permissions that the service requires to call other AWS services on your behalf.
The error you are getting is not about you not having the codedeploy:RegisterOnPremisesInstance permission.
The error is about the AmazonLightsailInstanceRole not having it. It does not matter whether you (i.e. your IAM user) have all CodeDeploy permissions.
Normally you would add the missing permissions to the role. How to work with the AmazonLightsailInstanceRole is described in the following AWS documentation:
Using Service-Linked Roles for Amazon Lightsail
Editing a Service-Linked Role
However, I'm not sure if you can modify the AmazonLightsailInstanceRole and add the missing permissions. Some service-linked roles can be modified, some not.
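
A quick way to see which principal an AccessDenied message is really about is to pull apart the ARN after "User:". The helper below is a rough sketch: describe_principal is a hypothetical name, and it only handles the ARN shapes that appear in the errors on this page (an IAM user with a path would keep the path in its name).

```python
def describe_principal(arn: str) -> dict:
    """Split a principal ARN into its type, account and name."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    _partition, service, _region, account, resource = parts[1:]
    kind, _, rest = resource.partition("/")
    if service == "sts" and kind == "assumed-role":
        # Format: assumed-role/<role name>/<session name>
        role_name, _, session = rest.partition("/")
        return {"type": "assumed-role", "account": account,
                "name": role_name, "session": session}
    return {"type": kind, "account": account, "name": rest}


caller = describe_principal(
    "arn:aws:sts::XXXXXXXXXXX:assumed-role/AmazonLightsailInstanceRole/i-XXXXXXXXXXXXXX"
)
print(caller["type"], caller["name"])  # assumed-role AmazonLightsailInstanceRole
```

When the type is assumed-role, the permissions that count are the role's, not those of whoever logged in to the instance.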
The documentation is a bit confusing. Create a new user in IAM with the admin role (full privileges) and use the credentials of that user to run the command on your local machine.