The Kubernetes docs say pulling images from AWS ECR is supported, but it is not working for me. My nodes have an EC2 instance role with all the correct permissions, yet kubectl run debug1 -i --tty --restart=Never --image=672129611065.dkr.ecr.us-west-2.amazonaws.com/debug:v2 fails with: failed to "StartContainer" for "debug1" with ErrImagePull: "Authentication is required."
Details
The instances all have a role associated and that role has this policy attached:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:GetRepositoryPolicy",
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:BatchGetImage"
    ],
    "Resource": "*"
  }]
}
And the kubelet logs look like:
Apr 18 19:02:12 ip-10-0-170-46 kubelet[948]: I0418 19:02:12.004611 948 provider.go:91] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
Apr 18 19:02:12 ip-10-0-170-46 kubelet[948]: E0418 19:02:12.112142 948 pod_workers.go:138] Error syncing pod b21c2ba6-0593-11e6-9ec1-065c82331f7b, skipping: failed to "StartContainer" for "debug1" with ErrImagePull: "Authentication is required."
Apr 18 19:02:27 ip-10-0-170-46 kubelet[948]: E0418 19:02:27.006329 948 pod_workers.go:138] Error syncing pod b21c2ba6-0593-11e6-9ec1-065c82331f7b, skipping: failed to "StartContainer" for "debug1" with ImagePullBackOff: "Back-off pulling image \"672129611065.dkr.ecr.us-west-2.amazonaws.com/debug:v2\""
From those logs, I suspect one of three things:
1) You haven't passed the --cloud-provider=aws arg to the kubelet.
2) The correct IAM permissions weren't in place when your kubelet started up. If that's the case, a simple bounce of the kubelet daemon should fix it (see the commands after this list).
3) You're on a Kubernetes version < 1.2, although that seems unlikely given the date of your question.
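A minimal sketch of both checks, assuming a systemd-managed kubelet (the unit name and flag location are my assumptions, not from the question):
# See whether the running kubelet was started with the AWS cloud provider flag
ps aux | grep '[k]ubelet' | tr ' ' '\n' | grep -- '--cloud-provider'
# Bounce the kubelet so it picks up the instance role credentials, then watch its logs while retrying the pull
sudo systemctl restart kubelet
sudo journalctl -u kubelet -f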
I think you also need an image pull secret configured for ECR images. You can refer to the links below for details.
http://kubernetes.io/docs/user-guide/images/#specifying-imagepullsecrets-on-a-pod
http://docs.aws.amazon.com/AmazonECR/latest/userguide/ECR_GetStarted.html
https://github.com/kubernetes/kubernetes/issues/499
1) Retrieve the docker login command that you can use to authenticate your Docker client to your registry:
aws ecr get-login --region us-east-1
2) Run the docker login command that was returned in the previous step.
3) The Docker login credentials are saved to /root/.dockercfg.
4) Encode the Docker config file:
echo $(cat /root/.dockercfg) | base64 -w 0
5) Copy and paste the result into a Secret YAML based on the old format:
apiVersion: v1
kind: Secret
metadata:
  name: aws-key
type: kubernetes.io/dockercfg
data:
  .dockercfg: <YOUR_BASE64_JSON_HERE>
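Assuming the YAML above is saved as aws-key-secret.yaml (the file name is mine, not from the original answer), create the secret in the same namespace as the pod that will use it:
kubectl create -f aws-key-secret.yaml --namespace awesomeapps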
6) Use this aws-key secret to pull the image:
apiVersion: v1
kind: Pod
metadata:
  name: foo
  namespace: awesomeapps
spec:
  containers:
    - name: foo
      image: janedoe/awesomeapp:v1
  imagePullSecrets:
    - name: aws-key
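To verify the pull, apply the manifest and check the pod's events (file name assumed):
kubectl apply -f pod.yaml
kubectl describe pod foo -n awesomeapps   # the Events section shows any remaining pull errors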
Generally, if you change permissions on an InstanceProfile they take effect immediately. However, there must be some kind of setup phase for the kubelet that requires the permissions to already be set. I completely bounced my CloudFormation stack so that the nodes booted with the new permissions active, and that did the trick. I can now use ECR images without issue.
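If you would rather not bounce the whole stack, a lighter sanity check (my suggestion, not part of the original answer) is to confirm from a node that the instance role can reach ECR at all:
aws ecr get-authorization-token --region us-west-2 --query 'authorizationData[0].expiresAt'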
Related
I have successfully installed the AWS EBS CSI driver on my EKS cluster.
This is meant to use the "IAM Roles for Service Accounts" (IRSA) technique.
I am trying to use the checkout example app that AWS provides here.
The pod will not come up (it stays Pending), and the PVC shows this:
Name: ebs-claim
Namespace: test
StorageClass: ebs-sc
Status: Pending
Volume:
Labels: app=ebs-claim
com.mylabel.contact=dl-myteam.dlonp1
Annotations: volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
volume.kubernetes.io/selected-node: ip-10-232-100-115.ec2.internal
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: meme-ebs
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 27s persistentvolume-controller storageclass.storage.k8s.io "ebs-sc" not found
Normal Provisioning 8s (x4 over 25s) ebs.csi.aws.com_ebs-csi-controller-6dfdb77cdf-fbsbz_1760973c-09bb-43ab-b005-ffcd818447fc External provisioner is provisioning volume for claim "test/ebs-claim"
Warning ProvisioningFailed 5s (x4 over 22s) ebs.csi.aws.com_ebs-csi-controller-6dfdb77cdf-fbsbz_1760973c-09bb-43ab-b005-ffcd818447fc failed to provision volume with StorageClass "ebs-sc": rpc error: code = Internal desc = Could not create volume "pvc-05efbff8-9506-4003-9bab-e1ce4719bc1c": could not create volume in EC2: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: failed to find credentials in the environment.
SharedCredsLoad: failed to load profile, .
EC2RoleRequestError: no EC2 instance role found
caused by: EC2MetadataError: failed to make EC2Metadata request
This is similar to an issue I saw here, but it had no answers.
Can anyone suggest things to try? It seems like the IAM role is not wired through to the API that mounts the volume on EC2?
This looks like an issue with the service account that your CSI driver is using. Make sure it is using the right role, with the right trust policy, for your EKS cluster. For example, check for the annotation below (the example shows the EFS CSI driver; the same IRSA pattern applies to the EBS CSI driver):
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: aws-efs-csi-driver
  name: efs-csi-controller-sa
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/AmazonEKS_EFS_CSI_DriverRole
And check that the role you are referencing has the right trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:kube-system:efs-csi-controller-sa"
        }
      }
    }
  ]
}
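A quick, hedged way to verify both pieces from the CLI (the service account and role names are taken from the example above; substitute the EBS CSI driver's names, e.g. ebs-csi-controller-sa, for your case):
kubectl -n kube-system get sa efs-csi-controller-sa \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
aws iam get-role --role-name AmazonEKS_EFS_CSI_DriverRole \
  --query 'Role.AssumeRolePolicyDocument'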
The instructions here are pretty clear (they worked for me):
Ensure that the created ServiceAccounts have the correct IRSA annotations.
If you are using the Helm chart and are upgrading from an older version, double-check the location of the IRSA ServiceAccount annotation in the chart values (it may have moved between versions; that had me stumped for a bit as to why things didn't work).
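One way to see where the annotation currently lives in the chart's values (assuming the chart repo is already added as aws-ebs-csi-driver):
helm show values aws-ebs-csi-driver/aws-ebs-csi-driver | grep -B 2 -A 4 serviceAccount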
I am having a weird issue with CodePipeline + CodeDeploy. We have checked the AWS forums and Stack Overflow, but no one seems to have had this particular issue; suggestions from similar issues have already been taken into account, but nothing has helped.
The issue in particular is the following :
We have a CodePipeline:
It happens that "randomly" we get the error:
(x) An AppSpec file is required, but could not be found in the revision
But the required file is in the revision. We have checked dozens of times: the files are there, with the same name and format as in the deployments that succeed.
This is happening in the same deployment group, with the same configuration, so it is not a poorly configured group, because most of the time it works without issues.
Just to be sure, I add both .yml and .yaml versions to the revision. And the appspec is as simple as this:
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:xxxxxxxx:task-definition/my_app_cd:258"
        LoadBalancerInfo:
          ContainerName: "nginx_main"
          ContainerPort: 80
        PlatformVersion: null
I suspect the above error is related to a wrong configuration in your CodePipeline. To perform ECS CodeDeploy deployments, the provider in your pipeline's deploy stage must be "ECS (blue/green)", not "CodeDeploy" (the plain CodeDeploy provider is used for EC2 deployments).
Even though it uses CodeDeploy in the back end, the name of the provider is "ECS (blue/green)".
Pipeline configuration can be checked as:
$ aws codepipeline get-pipeline --name <pipeline_name>
{
    "name": "Deploy",
    "blockers": null,
    "actions": [
        {
            "name": "Deploy",
            "actionTypeId": {
                "category": "Deploy",
                "owner": "AWS",
                "provider": "CodeDeploy",   <===== should be "CodeDeployToECS"
                "version": "1"
            },
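If the provider is wrong, one way to fix it from the CLI (a sketch; you can also edit the deploy action in the console):
aws codepipeline get-pipeline --name <pipeline_name> --query pipeline > pipeline.json
# edit pipeline.json: set "provider": "CodeDeployToECS" in the deploy action, then
aws codepipeline update-pipeline --pipeline file://pipeline.json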
I'm trying to use the Kubernetes integration with AWS, but kube-controller-manager doesn't start.
(BTW: everything works perfectly without the AWS option.)
Here is what I do:
-- 1 --
ubuntu@ip-172-31-17-233:~$ more /etc/kubernetes/aws.conf
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
cloudProvider: aws
kubernetesVersion: 1.10.3
-- 2 --
ubuntu@ip-172-31-17-233:~$ more /etc/kubernetes/cloud-config.conf
[Global]
KubernetesClusterTag=kubernetes
KubernetesClusterID=kubernetes
(I tried lots of combinations here based on the examples I found, including "aws_access_key_id" and "aws_secret_access_key", omitting the .conf extension, and removing this file entirely, but nothing worked.)
-- 3 --
ubuntu@ip-172-31-17-233:~$ sudo kubeadm init --config /etc/kubernetes/aws.conf
[init] Using Kubernetes version: v1.10.3
[init] Using Authorization modes: [Node RBAC]
[init] WARNING: For cloudprovider integrations to work --cloud-provider must be set for all kubelets in the cluster.
(/etc/systemd/system/kubelet.service.d/10-kubeadm.conf should be edited for this purpose)
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [ip-172-31-17-233 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.31.17.233]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [localhost] and IPs [127.0.0.1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [ip-172-31-17-233] and IPs [172.31.17.233]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 19.001348 seconds
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[markmaster] Will mark node ip-172-31-17-233 as master by adding a label and a taint
[markmaster] Master ip-172-31-17-233 tainted and labelled with key/value: node-role.kubernetes.io/master=""
[bootstraptoken] Using token: x8hi0b.uxjr40j9gysc7lcp
[bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: kube-dns
[addons] Applied essential addon: kube-proxy
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 172.31.17.233:6443 --token x8hi0b.uxjr40j9gysc7lcp --discovery-token-ca-cert-hash sha256:8ad9dfbcacaeba5bc3242c811b1e83c647e2e88f98b0d783875c2053f7a40f44
-- 4 --
ubuntu@ip-172-31-17-233:~$ mkdir -p $HOME/.kube
ubuntu@ip-172-31-17-233:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: overwrite '/home/ubuntu/.kube/config'? y
ubuntu@ip-172-31-17-233:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
-- 5 --
ubuntu@ip-172-31-17-233:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-ip-172-31-17-233 1/1 Running 0 40s
kube-system kube-apiserver-ip-172-31-17-233 1/1 Running 0 45s
kube-system kube-controller-manager-ip-172-31-17-233 0/1 CrashLoopBackOff 3 1m
kube-system kube-scheduler-ip-172-31-17-233 1/1 Running 0 35s
kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Any idea?
I'm new to Kubernetes, and I have no idea what I can do...
Thanks,
Michal.
Check the following points as potential issues:
Check that the kubelet has the proper provider set: /etc/systemd/system/kubelet.service.d/20-cloud-provider.conf should contain:
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=aws --cloud-config=/etc/kubernetes/cloud-config.conf"
If not, add it and restart the kubelet service.
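For a kubeadm/systemd setup, that is typically:
sudo systemctl daemon-reload
sudo systemctl restart kubelet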
In /etc/kubernetes/manifests/, check that the following files have the proper configuration:
kube-controller-manager.yaml and kube-apiserver.yaml:
--cloud-provider=aws
If not, just add it; the static pods will be restarted automatically.
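A quick way to check both manifests at once (paths are the kubeadm defaults):
grep -n 'cloud-provider' /etc/kubernetes/manifests/kube-apiserver.yaml \
                         /etc/kubernetes/manifests/kube-controller-manager.yaml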
Just in case, check that the AWS resources (EC2 instances, etc.) are tagged with the kubernetes cluster tag (taken from your cloud-config.conf) and that the IAM policies are properly set.
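As a sketch of the tagging (the tag key here is what the legacy in-tree AWS provider commonly expects; the instance ID is a placeholder):
aws ec2 create-tags --resources i-0123456789abcdef0 \
  --tags Key=KubernetesCluster,Value=kubernetes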
If you could supply the logs requested by Artem in the comments, that could shed more light on the issue.
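For example, the crash-looping controller-manager's logs can be pulled with (pod name taken from your kubectl get pods output):
kubectl -n kube-system logs kube-controller-manager-ip-172-31-17-233 --previous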
Edit
As requested in the comments, a short overview of IAM policy handling:
Create a new IAM policy (or edit the existing one appropriately), say k8s-default-policy. The policy below is quite liberal, and you can fine-tune the exact settings to match your security preferences; pay attention to the load balancer section in your case. In the description, put something along the lines of "Allows EC2 instances to call AWS services on your behalf." or similar...
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::kubernetes-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "ec2:Describe*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:AttachVolume",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DetachVolume",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["ec2:*"],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": ["elasticloadbalancing:*"],
      "Resource": ["*"]
    }
  ]
}
Create a new role (or edit the existing one appropriately) and attach the previous policy to it; say, attach k8s-default-policy to k8s-default-role.
Attach the role to the instances that need to handle AWS resources. You can create different roles for masters and workers if you need to. In the console: EC2 -> Instances -> (select instance) -> Actions -> Instance Settings -> Attach/Replace IAM Role -> (select the appropriate role).
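The CLI equivalent is roughly the following sketch (the instance profile name and instance ID are placeholders; the role name comes from the step above):
aws iam create-instance-profile --instance-profile-name k8s-default-profile
aws iam add-role-to-instance-profile --instance-profile-name k8s-default-profile \
  --role-name k8s-default-role
aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 \
  --iam-instance-profile Name=k8s-default-profile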
Also, apart from this, check that all resources in question are tagged with the kubernetes tag.
I am running OpenShift Origin 3.6 (kube v1.6.1+5115d708d7) in AWS. The Ansible inventory contains the cloud provider configuration, and I can see the config files on the master nodes.
# From inventory
# AWS
openshift_cloudprovider_kind=aws
openshift_cloudprovider_aws_access_key="{{ lookup('env','AWS_ACCESS_KEY_ID') }}"
openshift_cloudprovider_aws_secret_key="{{ lookup('env','AWS_SECRET_ACCESS_KEY') }}"
I have also provisioned a StorageClass:
# oc get storageclass
NAME TYPE
fast (default) kubernetes.io/aws-ebs
However, when I try to create a PVC:
kind: "PersistentVolumeClaim"
apiVersion: "v1"
metadata:
name: "testclaim"
namespace: testns
spec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: "3Gi"
storageClassName: fast
It just loops indefinitely trying to provision the PVC. The events show me this error:
(combined from similar events): Failed to provision volume with StorageClass "fast": UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: $(encoded-message) status code: 403, request id: d0742e84-a2e1-4bfd-b642-c6f1a61ddc1b
Unfortunately, I cannot decode the encoded message using the AWS CLI, as it gives an error.
aws sts decode-authorization-message --encoded-message $(encoded-message)
Error: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
I haven't tried manual PV+PVC creation, as I am looking for dynamic provisioning. Any guidance as to what I might be doing wrong?
So far I have been able to deploy pods, services etc and they seem to be working fine.
That error appears to be an AWS IAM error:
UnauthorizedOperation
You are not authorized to perform this operation. Check your IAM
policies, and ensure that you are using the correct access keys. For
more information, see Controlling Access. If the returned message is
encoded, you can decode it using the DecodeAuthorizationMessage
action. For more information, see DecodeAuthorizationMessage in the
AWS Security Token Service API Reference.
http://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html#CommonErrors
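As an aside, the decode command in the question appears to contain a typographic dash in front of --encoded-message, which would explain the Unicode error; with a plain double hyphen and the message passed as a quoted string, decoding should work (a sketch, message left as a placeholder):
aws sts decode-authorization-message \
  --encoded-message "<paste the encoded message here>" \
  --query DecodedMessage --output text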
You'll need to create the appropriate IAM Policies: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ExamplePolicies_EC2.html#iam-example-manage-volumes
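As a hedged sketch of the kind of volume permissions dynamic provisioning needs, attached to the IAM user whose keys are in the inventory (the user name and policy name are placeholders; trim the actions to taste):
cat > ebs-provisioning-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ec2:CreateVolume",
      "ec2:DeleteVolume",
      "ec2:AttachVolume",
      "ec2:DetachVolume",
      "ec2:DescribeVolumes",
      "ec2:DescribeInstances",
      "ec2:CreateTags"
    ],
    "Resource": "*"
  }]
}
EOF
aws iam put-user-policy --user-name <openshift-cloud-user> \
  --policy-name ebs-dynamic-provisioning \
  --policy-document file://ebs-provisioning-policy.json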
I'm trying to create a new task for ECS using a Compose file, but I'm getting an AccessDeniedException even though my user has the required permissions.
$ ecs-cli compose --project-name test create
WARN[0000] Skipping unsupported YAML option for service... option name=build service name=builder
WARN[0000] Skipping unsupported YAML option for service... option name=restart service name=db
WARN[0000] Skipping unsupported YAML option for service... option name=restart service name=dbadmin
WARN[0000] Skipping unsupported YAML option for service... option name=restart service name=app
ERRO[0001] Error registering task definition error=AccessDeniedException: User: arn:aws:iam::XXXXXXX:user/foo is not authorized to perform: ecs:RegisterTaskDefinition on resource: *
status code: 400, request id: 41e6b69a-a839-11e6-84b0-e9bc2ec3f81b family=ecscompose-test
ERRO[0001] Create task definition failed error=AccessDeniedException: User: arn:aws:iam::XXXXXXX:user/foo is not authorized to perform: ecs:RegisterTaskDefinition on resource: *
status code: 400, request id: 41e6b69a-a839-11e6-84b0-e9bc2ec3f81b
FATA[0001] AccessDeniedException: User: arn:aws:iam::XXXXXXX:user/foo is not authorized to perform: ecs:RegisterTaskDefinition on resource: *
status code: 400, request id: 41e6b69a-a839-11e6-84b0-e9bc2ec3f81b
The user has this policy attached:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:RegisterTaskDefinition",
        "ecs:ListTaskDefinitions",
        "ecs:DescribeTaskDefinition"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
I also tried attaching the AmazonEC2ContainerServiceFullAccess managed policy (which has ecs:*), but that didn't work either.
Found the problem: the user I was using had a policy requiring MFA (multi-factor auth), which is not supported by the ecs-cli.
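A possible workaround (my sketch, not from the original answer; requires jq, and the MFA serial and token code are placeholders): mint temporary credentials with the MFA device and export them so ecs-cli picks them up.
creds=$(aws sts get-session-token \
  --serial-number arn:aws:iam::XXXXXXX:mfa/foo \
  --token-code 123456 --output json)
export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r .Credentials.AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r .Credentials.SecretAccessKey)
export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r .Credentials.SessionToken)
ecs-cli compose --project-name test create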
I believe this post has some answers as to why the above error is happening, though not a fix:
Trouble deploying docker on AWS with ecs-cli
"From what I understand, ecs-cli has a very limited support of the complete Docker Compose file syntax"
per user Dolan Antenucci
Note the warnings
"WARN[0000] Skipping unsupported YAML option for service..."
ECS does not support a big chunk of the Compose settings. However, it should just print warnings and ignore them, which may produce unintended results but should not cause permission errors.
When you see 400 AccessDeniedExceptions of the form "user_arn is not authorized to perform service:action on service_resource", it is definitely an IAM issue. However, the IAM policy you listed looks correct. My thinking is that you are somehow not using the credentials you think you are, or that the IAM policy is not actually attached to that user.
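Two quick checks along those lines (the user name is taken from the error message):
aws sts get-caller-identity                     # which principal do these credentials actually resolve to?
aws iam list-attached-user-policies --user-name foo
aws iam list-user-policies --user-name foo      # inline policies, where an MFA condition might be hiding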