openshift origin - using dynamic ebs volumes - amazon-web-services

I am running openshift origin 3.6 ( kube v1.6.1+5115d708d7) in AWS. Ansible inventory contains cloud provider configuration and I can see the config files on the master nodes.
# From inventory
# AWS
openshift_cloudprovider_kind=aws
openshift_cloudprovider_aws_access_key="{{ lookup('env','AWS_ACCESS_KEY_ID') }}"
openshift_cloudprovider_aws_secret_key="{{ lookup('env','AWS_SECRET_ACCESS_KEY') }}"
I have also provisioned a storageclass
# oc get storageclass
NAME TYPE
fast (default) kubernetes.io/aws-ebs
However, when i try to create a pvc:
kind: "PersistentVolumeClaim"
apiVersion: "v1"
metadata:
name: "testclaim"
namespace: testns
spec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: "3Gi"
storageClassName: fast
It just goes in infinite loop trying to get the pvc created. Events show me this error:
(combined from similar events): Failed to provision volume with StorageClass "fast": UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: $(encoded-message) status code: 403, request id: d0742e84-a2e1-4bfd-b642-c6f1a61ddc1b
Unfortunately I cannot decode the encoded message using aws cli as it gives error.
aws sts decode-authorization-message -–encoded-message $(encoded-message)
Error: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
I haven't tried pv+pvc creation as I am looking for dynamic provisioning. Any guidance as to what I might be doing wrong.
So far I have been able to deploy pods, services etc and they seem to be working fine.

That error appears to be an AWS IAM error:
UnauthorizedOperation
You are not authorized to perform this operation. Check your IAM
policies, and ensure that you are using the correct access keys. For
more information, see Controlling Access. If the returned message is
encoded, you can decode it using the DecodeAuthorizationMessage
action. For more information, see DecodeAuthorizationMessage in the
AWS Security Token Service API Reference.
http://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html#CommonErrors
You'll need to create the appropriate IAM Policies: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ExamplePolicies_EC2.html#iam-example-manage-volumes

Related

AWS IAM Role - AccessDenied error in one pod

I have a service account which I am trying to use across multiple pods installed in the same namespace.
One of the pods is created by Airflow KubernetesPodOperator.
The other is created via Helm through Kubernetes deployment.
In the Airflow deployment, I see the IAM role being assigned and DynamoDB tables are created, listed etc however in the second helm chart deployment (or) in a test pod (created as shown here), I keep getting AccessDenied error for CreateTable in DynamoDB.
I can see the AWS Role ARN being assigned to the service account and the service account being applied to the pod and the corresponding token file also being created, but I see AccessDenied exception.
arn:aws:sts::1234567890:assumed-role/MyCustomRole/aws-sdk-java-1636152310195 is not authorized to perform: dynamodb:CreateTable on resource
ServiceAccount
Name: mypipeline-service-account
Namespace: abc-qa-daemons
Labels: app.kubernetes.io/managed-by=Helm
chart=abc-pipeline-main.651
heritage=Helm
release=ab-qa-pipeline
tier=mypipeline
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::1234567890:role/MyCustomRole
meta.helm.sh/release-name: ab-qa-pipeline
meta.helm.sh/release-namespace: abc-qa-daemons
Image pull secrets: <none>
Mountable secrets: mypipeline-service-account-token-6gm5b
Tokens: mypipeline-service-account-token-6gm5b
P.S: Both the client code created using KubernetesPodOperator and through Helm chart deployment is same i.e. same docker image. Other attributes like nodeSelector, tolerations etc, volume mounts are also same.
The describe pod output for both of them is similar with just some name and label changes.
The KubernetesPodOperator pod has QoS class as Burstable while the Helm chart ones is BestEffort.
Why do I get AccessDenied in Helm deployment but not in KubernetesPodOperator? How to debug this issue?
Whenever we get an AccessDenied exception, there can be two possible reasons:
You have assigned the wrong role
The assigned role doesn't have necessary permissions
In my case, latter is the issue. The permissions assigned to particular role can be sophisticated i.e. they can be more granular.
For example, in my case, the DynamoDB tables which the role can create/describe is limited to only those that are starting with a specific prefix but not all the DynamoDB tables.
So, it is always advisable to check the IAM role permissions whenever
you get this error.
As stated in the question, be sure to check the service account using the awscli image.
Keep in mind that, there is a credential provider chain used in AWS SDKs which determines the credentials to be used by the application. In most cases, the DefaultAWSCredentialsProviderChain is used and its order is given below. Ensure that the SDK is picking up the intended provider (in our case it is WebIdentityTokenCredentialsProvider)
super(new EnvironmentVariableCredentialsProvider(),
new SystemPropertiesCredentialsProvider(),
new ProfileCredentialsProvider(),
WebIdentityTokenCredentialsProvider.create(),
new EC2ContainerCredentialsProviderWrapper());
Additionally, you might also want to set the AWS SDK classes to DEBUG mode in your logger to see which credentials provider is being picked up and why.
To check if the service account is applied to a pod, describe it and check if the AWS environment variables are set to it like AWS_REGION, AWS_DEFAULT_REGION, AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE.
If not, then check your service account if it has the AWS annotation eks.amazonaws.com/role-arn by describing that service account.

Terraform EKS configmaps is forbidden

I am trying to deploy a Kubernetes cluster on AWS EKS using Terraform, run from a Gitlab CI pipeline. My code currently gets a full cluster up and running, except there is a step in which it tries to add the nodes (which are created separately) into the cluster.
When it tries to do this, this is the error I receive:
│ Error: configmaps is forbidden: User "system:serviceaccount:gitlab-managed-apps:default" cannot create resource "configmaps" in API group "" in the namespace "kube-system"
│
│ with module.mastercluster.kubernetes_config_map.aws_auth[0],
│ on .terraform/modules/mastercluster/aws_auth.tf line 63, in resource "kubernetes_config_map" "aws_auth":
│ 63: resource "kubernetes_config_map" "aws_auth" {
│
Terraform I believe is trying to edit the configmap aws_auth in the kube-system namespace, but for whatever reason, it doesn't have permission to do so?
I have found a different answer from years ago on Stackoverflow, that currently matches with what the documentation has to say about adding a aws_eks_cluster_auth data source and adding this to the kubernetes provider.
My configuration of this currently looks like this:
data "aws_eks_cluster" "mastercluster" {
name = module.mastercluster.cluster_id
}
data "aws_eks_cluster_auth" "mastercluster" {
name = module.mastercluster.cluster_id
}
provider "kubernetes" {
alias = "mastercluster"
host = data.aws_eks_cluster.mastercluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.mastercluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.mastercluster.token
load_config_file = false
}
The weird thing is, this has worked for me before. I have successfully deployed multiple clusters using this method. This configuration is an almost identical copy to another one I had before, only the names of the clusters are different. I am totally lost as to why this can possibly go wrong.
Use semver to lock hashicorp provider versions
That's why is so important to use semver in terraform manifests.
As per Terraform documentation:
Terraform providers manage resources by communicating between Terraform and target APIs. Whenever the target APIs change or add functionality, provider maintainers may update and version the provider.
When multiple users or automation tools run the same Terraform configuration, they should all use the same versions of their required providers.
Use RBAC rules for Kubernetes
There is a Github issue filed about this: v2.0.1: Resources cannot be created. Does kubectl reference to kube config properly? · Issue #1127 · hashicorp/terraform-provider-kubernetes with the same error message as in yours case.
And one of the comments answers:
Offhand, this looks related to RBAC rules in the cluster (which may have been installed by the helm chart). This command might help diagnose the permissions issues relating to the service account in the error message.
$ kubectl auth can-i create namespace --as=system:serviceaccount:gitlab-prod:default
$ kubectl auth can-i --list --as=system:serviceaccount:gitlab-prod:default
You might be able to compare that list with other users on the cluster:
kubectl auth can-i --list --namespace=default --as=system:serviceaccount:default:default
$ kubectl auth can-i create configmaps
yes
$ kubectl auth can-i create configmaps --namespace=nginx-ingress --as=system:serviceaccount:gitlab-prod:default
no
And investigate related clusterroles:
$ kube describe clusterrolebinding system:basic-user
Name: system:basic-user
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
Role:
Kind: ClusterRole
Name: system:basic-user
Subjects:
Kind Name Namespace
---- ---- ---------
Group system:authenticated
$ kubectl describe clusterrole system:basic-user
Name: system:basic-user
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
selfsubjectaccessreviews.authorization.k8s.io [] [] [create]
selfsubjectrulesreviews.authorization.k8s.io [] [] [create]
My guess is that the chart or Terraform config in question is responsible for creating the service account, and the [cluster] roles and rolebindings, but it might be doing so in the wrong order, or not idempotently (so you get different results on re-install vs the initial install). But we would need to see a configuration that reproduces this error. In my testing of version 2 of the providers on AKS, EKS, GKE, and minikube, I haven't seen this issue come up.
Feel free to browse these working examples of building specific clusters and using them with Kubernetes and Helm providers. Giving the config a skim might give you some ideas for troubleshooting further.
Howto solve RBAC issues
As for the error
Error: configmaps is forbidden: User "system:serviceaccount:kube-system:default" cannot list
There is great explanation by #m-abramovich:
First, some information for newbies.
In Kubernetes there are:
Account - something like your ID. Example: john
Role - some group in the project permitted to do something. Examples: cluster-admin, it-support, ...
Binding - joining Account to Role. "John in it-support" - is a binding.
Thus, in our message above, we see that our Tiller acts as account "default" registered at namespace "kube-system". Most likely you didn't bind him to a sufficient role.
Now back to the problem.
How do we track it:
check if you have specific account for tiller. Usually it has same name - "tiller":
kubectl [--namespace kube-system] get serviceaccount
create if not:
kubectl [--namespace kube-system] create serviceaccount tiller
check if you have role or clusterrole (cluster role is "better" for newbies - it is cluster-wide unlike namespace-wide role). If this is not a production, you can use highly privileged role "cluster-admin":
kubectl [--namespace kube-system] get clusterrole
you can check role content via:
kubectl [--namespace kube-system] get clusterrole cluster-admin -o yaml
check if account "tiller" in first clause has a binding to clusterrole "cluster-admin" that you deem sufficient:
kubectl [--namespace kube-system] get clusterrolebinding
if it is hard to figure out based on names, you can simply create new:
kubectl [--namespace kube-system] create clusterrolebinding tiller-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
finally, when you have the account, the role and the binding between them, you can check if you really act as this account:
kubectl [--namespace kube-system] get deploy tiller-deploy -o yaml
I suspect that your output will not have settings "serviceAccount" and "serviceAccountName":
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
if yes, than add an account you want tiller to use:
kubectl [--namespace kube-system] patch deploy tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
(if you use PowerShell, then check below for post from #snpdev)
Now you repeat previous check command and see the difference:
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: tiller <-- new line
serviceAccountName: tiller <-- new line
terminationGracePeriodSeconds: 30
Resources:
Using RBAC Authorization | Kubernetes
Demystifying RBAC in Kubernetes | Cloud Native Computing Foundation
Helm | Role-based Access Control
Lock and Upgrade Provider Versions | Terraform - HashiCorp Learn

AWS EKS kubectl - No resources found in default namespace

Trying to setup a EKS cluster.
An error occurred (AccessDeniedException) when calling the DescribeCluster operation: Account xxx is not authorized to use this service. This error came form the CLI, on the console I was able to crate the cluster and everything successfully.
I am logged in as the root user (its just my personal account).
It says Account so sounds like its not a user/permissions issue?
Do I have to enable my account for this service? I don't see any such option.
Also if login as a user (rather than root) - will I be able to see everything that was earlier created as root. I have now created a user and assigned admin and eks* permissions. I checked this when I sign in as the user - I can see everything.
The aws cli was setup with root credentials (I think) - so do I have to go back and undo fix all this and just use this user.
Update 1
I redid/restarted everything including user and awscli configure - just to make sure. But still the issue did not get resolved.
There is an option to create the file manually - that finally worked.
And I was able to : kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.100.0.1 443/TCP 48m
KUBECONFIG: I had setup the env:KUBECONFIG
$env:KUBECONFIG="C:\Users\sbaha\.kube\config-EKS-nginixClstr"
$Env:KUBECONFIG
C:\Users\sbaha\.kube\config-EKS-nginixClstr
kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* aws kubernetes aws
kubectl config current-context
aws
My understanding is is I should see both the aws and my EKS-nginixClstr contexts but I only see aws - is this (also) a issue?
Next Step is to create and add worker nodes. I updated the node arn correctly in the .yaml file: kubectl apply -f ~\.kube\aws-auth-cm.yaml
configmap/aws-auth configured So this perhaps worked.
But next it fails:
kubectl get nodes No resources found in default namespace.
On AWS Console NodeGrp shows- Create Completed. Also on CLI kubectl get nodes --watch - it does not even return.
So this this has to be debugged next- (it never ends)
aws-auth-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: aws-auth
namespace: kube-system
data:
mapRoles: |
- rolearn: arn:aws:iam::arn:aws:iam::xxxxx:role/Nginix-NodeGrpClstr-NodeInstanceRole-1SK61JHT0JE4
username: system:node:{{EC2PrivateDNSName}}
groups:
- system:bootstrappers
- system:nodes
This problem was related to not having the correct version of eksctl - it must be at least 0.7.0. The documentation states this and I knew this, but initially whatever I did could not get beyond 0.6.0. The way you get it is to configure your AWS CLI to a region that supports EKS. Once you get 0.7.0 this issue gets resolved.
Overall to make EKS work - you must have the same user both on console and CLI, and you must work on a region that supports EKS, and have correct eksctl version 0.7.0.

Use GKE beta features with GCP Deployment Manager

I'm trying to create GKE REGION cluster (beta feature) with GCP deployment manager.
But I got error. Is there any way to use GKE beta features (include region cluster) with deployment manager?
ERROR: (gcloud.beta.deployment-manager.deployments.create) Error in
Operation [operation-1525054837836-56b077fdf48e0-a571296c-604523fb]:
errors:
- code: RESOURCE_ERROR
location: /deployments/test-cluster1/resources/source-cluster
message: '{"ResourceType":"container.v1.cluster","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"message":"v1 API cannot be used to access GKE regional clusters. See https://cloud.google.com/kubernetes-engine/docs/reference/api-organization#beta for more information.","status":"INVALID_ARGUMENT","statusMessage":"Bad Request","requestPath":"https://container.googleapis.com/v1/projects/project_id/zones/us-central1/clusters","httpMethod":"POST"}}'
In the error message, link of gcp help.
https://cloud.google.com/kubernetes-engine/docs/reference/api-organization#beta
Configured as described there but error still appears.
My deployment manager yaml file looks like,
resources:
- name: source-cluster
type: container.v1.cluster
properties:
zone: us-central1
cluster:
name: source
initialNodeCount: 3
Yet, zonal cluster is completely work. So I think it's related to usage of container v1beta api in deployment-manager commands.
resources:
- name: source-cluster
type: container.v1.cluster
properties:
zone: us-central1-b
cluster:
name: source
initialNodeCount: 3
Thanks.
The error message you are receiving appears to be related to the fact that you are attempting to use a beta feature but you are specifying a Deployment Manager resource as using API v1 (i.e. container.v1.cluster). This means there's inconstancy between the beta resource you are trying to create and the specified resource.
I've had a look into this and discovered that the ability to add regional clusters via Deployment Manager is a very recent addition to Google Cloud Platform as detailed in this public feature request which has only recently been implemented.
It seems you would need to specify the API type as 'gcp-types/container-v1beta1:projects.locations.clusters' for this to work, and rather than using the 'zone' or 'region' key in the YAML, you would instead use a parent property that includes locations.
So your YAML would look something like this (replace PROJECT_ID with your own).
resources:
- type: gcp-types/container-v1beta1:projects.locations.clusters
name: source-cluster
properties:
parent: projects/PROJECT_ID/locations/us-central1
cluster:
name: source
initialNodeCount: 3

Google cloud deployment manager update Container cluster

I'm trying to create a Google Cloud Deployment Manager configuration to deploy and manage a Google Cloud Container cluster. So far, creating a configuration to create a cluster works, however updating fails. If I change a setting, the execution of the script fails with an error message I can't decipher:
code: RESOURCE_ERROR
location: /deployments/my-first-cluster/resources/my-first-test-cluster-setup
message:
'{"ResourceType":"container.v1.cluster","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"message":"Invalid
JSON payload received. Unknown name \"cluster\": Cannot find field.","status":"INVALID_ARGUMENT","details":[{"#type":"type.googleapis.com/google.rpc.BadRequest","fieldViolations":[{"description":"Invalid
JSON payload received. Unknown name \"cluster\": Cannot find field."}]}],"statusMessage":"Bad
Request","requestPath":"https://container.googleapis.com/v1/projects/*****/zones/europe-west1-b/clusters/my-first-cluster"}}'
The relevant configuration:
resources:
- name: my-first-test-cluster-setup
type: container.v1.cluster
properties:
zone: europe-west1-b
cluster:
name: my-first-cluster
description: My first cluster setup
nodePools:
- name: my-cluster-node-pool
config:
machineType: n1-standard-1
initialNodeCount: 1
autoscaling:
enabled: true
minNodeCount: 3
maxNodeCount: 5
management:
autoUpgrade: true
autoRepair: true
It looks like this is a bug in Deployment Manager which means that it is not able to update GKE clusters. The bug is reported here. It has the same strange 'unknown name "cluster"' message that you see.
There is no suggestion on the ticket about workarounds or resolution.
We have seen this same problem when updating a different cluster property.