403 Forbidden on ESPv2, GKE AutoPilot, WIF - google-cloud-platform

I'm following the Getting started with Endpoints for GKE with ESPv2. I'm using Workload Identity Federation and Autopilot on the GKE cluster.
I've been running into the error:
F0110 03:46:24.304229 8 server.go:54] fail to initialize config manager: http call to GET https://servicemanagement.googleapis.com/v1/services/name:bookstore.endpoints.<project>.cloud.goog/rollouts?filter=status=SUCCESS returns not 200 OK: 403 Forbidden
Which ultimately leads to a transport failure error and shut down of the Pod.
My first step was to investigate permission issues, but I could really use some outside perspective on this as I've been going around in circles on this.
Here's my config:
>> gcloud container clusters describe $GKE_CLUSTER_NAME \
--zone=$GKE_CLUSTER_ZONE \
--format='value[delimiter="\n"](nodePools[].config.oauthScopes)'
['https://www.googleapis.com/auth/devstorage.read_only',
'https://www.googleapis.com/auth/logging.write',
'https://www.googleapis.com/auth/monitoring',
'https://www.googleapis.com/auth/service.management.readonly',
'https://www.googleapis.com/auth/servicecontrol',
'https://www.googleapis.com/auth/trace.append']
>> gcloud container clusters describe $GKE_CLUSTER_NAME \
--zone=$GKE_CLUSTER_ZONE \
--format='value[delimiter="\n"](nodePools[].config.serviceAccount)'
default
default
Service-Account-Name: test-espv2
Roles
Cloud Trace Agent
Owner
Service Account Token Creator
Service Account User
Service Controller
Workload Identity User
I've associated the WIF svc-act with the Cluster with the following yaml
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
iam.gke.io/gcp-service-account: test-espv2#<project>.iam.gserviceaccount.com
name: test-espv2
namespace: eventing
And then I've associated the pod with the test-espv2 svc-act
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: esp-grpc-bookstore
namespace: eventing
spec:
replicas: 1
selector:
matchLabels:
app: esp-grpc-bookstore
template:
metadata:
labels:
app: esp-grpc-bookstore
spec:
serviceAccountName: test-espv2
Since the gcr.io/endpoints-release/endpoints-runtime:2 is limited,
I created a test container and deployed it into the same eventing namespace.
Within the container, I'm able to retrieve the endpoint service config with the following command:
curl --fail -o "service.json" -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://servicemanagement.googleapis.com/v1/services/${SERVICE}/configs/${CONFIG_ID}?view=FULL"
And also within the container, I'm running as the impersonated service account, tested with:
curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/
Are there any other tests I can run to help me debug this issue?
Thanks in advance,

Around debugging - I've often found my mistakes by following one of the other methods/programming languages in the Google tutorials.
Have you looked at the OpenAPI notes and tried to follow along?

I've finally figured out the issue. It was in 2 parts.
Redeployment of app, paying special attention and verification of the kubectl annotate serviceaccount commands
add-iam-policy-binding for both serviceController and cloudtrace.agent
omitting nodeSelector: iam.gke.io/gke-metadata-server-enabled: "true" due to Autopilot
Doing this enabled a successful kube deployment as displayed by the logs.
Next error I had was
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
This was fixed by turning my attention back to my Kube cluster.
Looking through the events in my ingress service, since I was in a shared-vpc and my security policies only allowed firewall management from the host project, the deployment was failing to update the firewall rules.
Manually provisioning them, as shown here :
https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#manually_provision_firewall_rules_from_the_host_project
solved my issues.

Related

Cluster auto scaler pod crashing timeout sts.us-west-1.amazonaws.com

I am following this document to deploy cluster auto scaler in EKS https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html
EKS Version is 1.24. Cluster. Public traffic is allowed on the open internet and we have whitelisted the .amazonaws.com domain in the squid proxy.
I feel there might be something wrong with the role or policy configuration
Error in pod:
F0208 05:39:52.442470 1 aws_cloud_provider.go:386] Failed to
generate AWS EC2 Instance Types: WebIdentityErr: failed to retrieve
credentials caused by: RequestError: send request failed caused by:
Post "https://sts.us-west-1.amazonaws.com/": dial tcp
176.32.112.54:443: i/o timeout
The service account has the annotation in place to make use of the IAM role
Kubectl describes cluster-autoscaler service account
Name: cluster-autoscaler
Namespace: kube-system
Labels: k8s-addon=cluster-autoscaler.addons.k8s.io
k8s-app=cluster-autoscaler
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<ID>:role/irsa-clusterautoscaler
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
It was solved by adding the proxy details on the container env of the deployment. Which is missing in the actual documentation, they could add it as a hint. Pod was not taking the proxy setting available in the node, it was expecting it to be configured.

AWS EKS fargate coredns ImagePullBackOff

I'm trying to deploy a simple tutorial app to a new fargate based kubernetes cluster.
Unfortunately I'm stuck on ImagePullBackOff for the coredns pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled 5m51s fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled 4m11s fargate-scheduler Successfully assigned kube-system/coredns-86cb968586-mcdpj to fargate-ip-172-31-55-205.eu-central-1.compute.internal
Warning Failed 100s kubelet Failed to pull image "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1": rpc error: code = Unknown desc = failed to pull and unpack image "602
401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1": failed to resolve reference "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1": failed to do request: Head "https://602401143452.dkr.
ecr.eu-central-1.amazonaws.com/v2/eks/coredns/manifests/v1.8.0-eksbuild.1": dial tcp 3.122.9.124:443: i/o timeout
Warning Failed 100s kubelet Error: ErrImagePull
Normal BackOff 99s kubelet Back-off pulling image "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1"
Warning Failed 99s kubelet Error: ImagePullBackOff
Normal Pulling 87s (x2 over 4m10s) kubelet Pulling image "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1"
While googling I found https://aws.amazon.com/premiumsupport/knowledge-center/eks-ecr-troubleshooting/
It contains a following list:
To resolve this error, confirm the following:
- The subnet for your worker node has a route to the internet. Check the route table associated with your subnet.
- The security group associated with your worker node allows outbound internet traffic.
- The ingress and egress rule for your network access control lists (ACLs) allows access to the internet.
Since I've created both my private subnets as well as their NAT Gateways manually I tried to locate an issue here but couldn't find anything. They as well as security groups and ACLs look fine to me.
I even added the AmazonEC2ContainerRegistryReadOnly to my EKS role but after issuing command kubectl rollout restart -n kube-system deployment coredns the result is unfortunately the same: ImagePullBackOff
Unfortunately I've runned out of ideas and I'm stuck. Any help that would help me troubleshoot this would be greatly appreciated. ~Thanks
edit>
After creating new cluster via *eksctl as #mreferre suggested in his comment I get RBAC error with link: https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting_iam.html#security-iam-troubleshoot-cannot-view-nodes-or-workloads
I'm not sure what is going on since I already have
edit>>
The cluster created via AWS Console ( web interface ) doesn't have the configmap aws-auth I've retrieved the configmap below using command kubectl edit configmap aws-auth -n kube-system
apiVersion: v1
data:
mapRoles: |
- groups:
- system:bootstrappers
- system:nodes
- system:node-proxier
rolearn: arn:aws:iam::370179080679:role/eksctl-tutorial-cluster-FargatePodExecutionRole-1J605HWNTGS2Q
username: system:node:{{SessionName}}
kind: ConfigMap
metadata:
creationTimestamp: "2021-04-08T18:42:59Z"
name: aws-auth
namespace: kube-system
resourceVersion: "918"
selfLink: /api/v1/namespaces/kube-system/configmaps/aws-auth
uid: d9a21964-a8bf-49e9-800f-650320b7444e
Creating an answer to sum up the discussion in the comment that deemed to be acceptable. The most common (and arguably easier) way to setup an EKS cluster with Fargate support is to use EKSCTL and setup the cluster using eksctl create cluster --fargate. This will build all the plumbing for you and you will get a cluster with no EC2 instances nor managed node groups with the two CoreDNS pods deployed on two Fargate instances. Note that when you deploy EKSCTL via the command line you may end up using different roles/users between your CLI and console. This may result in access denied issues. Best course of action would be to use a non-root user to login into the AWS console and use CloudShell to deploy with EKSCTL (CloudShell will inherit the same console user identity). {More info in the comments}

How to connect AWS EKS cluster from Azure Devops pipeline - No user credentials found for cluster in KubeConfig content

I have to setup CI in Microsoft Azure Devops to deploy and manage AWS EKS cluster resources. As a first step, found few kubernetes tasks to make a connection to kubernetes cluster (in my case, it is AWS EKS) but in the task "kubectlapply" task in Azure devops, I can only pass the kube config file or Azure subscription to reach the cluster.
In my case, I have the kube config file but I also need to pass the AWS user credentials that is authorized to access the AWS EKS cluster. But there is no such option in the task when adding the New "k8s end point" to provide the AWS credentials that can be used to access the EKS cluster. Because of that, I am seeing the below error while verifying the connection to EKS cluster.
During runtime, I can pass the AWS credentials via envrionment variables in the pipeline but can not add the kubeconfig file in the task and SAVE it.
Azure and AWS are big players in Cloud and there should be ways to connect to connect AWS resources from any CI platform. Does anyone faced this kind of issues and What is the best approach to connect to AWS first and EKS cluster for deployments in Azure Devops CI.
No user credentials found for cluster in KubeConfig content. Make sure that the credentials exist and try again.
Amazon EKS uses IAM to provide authentication to your Kubernetes cluster through the AWS IAM Authenticator for Kubernetes. You may update your config file referring to the following format:
apiVersion: v1
clusters:
- cluster:
server: ${server}
certificate-authority-data: ${cert}
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: aws
name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
user:
exec:
apiVersion: client.authentication.k8s.io/v1alpha1
command: aws-iam-authenticator
env:
- name: "AWS_PROFILE"
value: "dev"
args:
- "token"
- "-i"
- "mycluster"
Useful links:
https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html
https://github.com/kubernetes-sigs/aws-iam-authenticator#specifying-credentials--using-aws-profiles
I got the solution by using ServiceAccount following this post: How to deploy to AWS Kubernetes from Azure DevOps
For anyone who is still having this issue, i had to set this up for the startup i worked for and it was pretty simple.
After your cluster is created create the service account
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
name: build-robot
EOF
Then apply the cluster rolebinding
$ kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
app.kubernetes.io/name: build-robot
name: build-robot
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: admin
subjects:
- kind: ServiceAccount
name: build-robot
namespace: default
EOF
Be careful with the above as it gives full access, checkout (https://kubernetes.io/docs/reference/access-authn-authz/rbac/) for more info for scoping the access.
From there head over to ADO and follow the steps using deploy-robot as the SA name
$ kubectl get serviceAccounts build-robot -n default -o='jsonpath={.secrets[*].name}'
xyz........
$ kubectl get secret xyz........ -n default -o json
...
...
...
Paste the output into the last box when adding the kubernetes resource into the environment and select Accept UnTrusted Certificates. Then click apply and validate and you should be good to go.

kiali showing unkown traffic via sending through ambassador

I have installed service mesh(Istio) and working with Ambassador to route traffic to our application. Whenever I am sending traffic through Istio Ingress its working fine and working with the ambassador but when sending through Ambassador, It is showing unknown, You can see on the attached image, could be related to the fact that the ambassador does not use an Istio sidecar.
Used code to deploy Ambassador service:
apiVersion: v1
kind: Service
metadata:
name: ambassador
spec:
type: LoadBalancer
externalTrafficPolicy: Local
ports:
- name: ambassador-http
port: 80
targetPort: 8080
selector:
service: ambassador
---
Is there anything to I can add here to make it possible?
Thanks
Yes, it is possible and here is detailed guide for this from Abmassador documentation:
Getting Ambassador Working With Istio
Getting Ambassador working with Istio is straightforward. In this example, we'll use the bookinfo sample application from Istio.
Install Istio on Kubernetes, following the default instructions (without using mutual TLS auth between sidecars)
Next, install the Bookinfo sample application, following the instructions.
Verify that the sample application is working as expected.
By default, the Bookinfo application uses the Istio ingress. To use Ambassador, we need to:
Install Ambassador.
First you will need to deploy the Ambassador ambassador-admin service to your cluster:
It's simplest to use the YAML files we have online for this (though of course you can download them and use them locally if you prefer!).
First, you need to check if Kubernetes has RBAC enabled:
kubectl cluster-info dump --namespace kube-system | grep authorization-mode
If you see something like --authorization-mode=Node,RBAC in the output, then RBAC is enabled.
If RBAC is enabled, you'll need to use:
kubectl apply -f https://getambassador.io/yaml/ambassador/ambassador-rbac.yaml
Without RBAC, you can use:
kubectl apply -f https://getambassador.io/yaml/ambassador/ambassador-no-rbac.yaml
(Note that if you are planning to use mutual TLS for communication between Ambassador and Istio/services in the future, then the order in which you deploy the ambassador-admin service and the ambassador LoadBalancer service below may need to be swapped)
Next you will deploy an ambassador service that acts as a point of ingress into the cluster via the LoadBalancer type. Create the following YAML and put it in a file called ambassador-service.yaml.
---
apiVersion: getambassador.io/v1
kind: Mapping
metadata:
name: httpbin
spec:
prefix: /httpbin/
service: httpbin.org
host_rewrite: httpbin.org
Then, apply it to the Kubernetes with kubectl:
kubectl apply -f ambassador-service.yaml
The YAML above does several things:
It creates a Kubernetes service for Ambassador, of type LoadBalancer. Note that if you're not deploying in an environment where LoadBalancer is a supported type (i.e. MiniKube), you'll need to change this to a different type of service, e.g., NodePort.
It creates a test route that will route traffic from /httpbin/ to the public httpbin.org HTTP Request and Response service (which provides useful endpoint that can be used for diagnostic purposes). In Ambassador, Kubernetes annotations (as shown above) are used for configuration. More commonly, you'll want to configure routes as part of your service deployment process, as shown in this more advanced example.
You can see if the two Ambassador services are running correctly (and also obtain the LoadBalancer IP address when this is assigned after a few minutes) by executing the following commands:
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ambassador LoadBalancer 10.63.247.1 35.224.41.XX 8080:32171/TCP 11m
ambassador-admin NodePort 10.63.250.17 <none> 8877:32107/TCP 12m
details ClusterIP 10.63.241.224 <none> 9080/TCP 16m
kubernetes ClusterIP 10.63.240.1 <none> 443/TCP 24m
productpage ClusterIP 10.63.248.184 <none> 9080/TCP 16m
ratings ClusterIP 10.63.255.72 <none> 9080/TCP 16m
reviews ClusterIP 10.63.252.192 <none> 9080/TCP 16m
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
ambassador-2680035017-092rk 2/2 Running 0 13m
ambassador-2680035017-9mr97 2/2 Running 0 13m
ambassador-2680035017-thcpr 2/2 Running 0 13m
details-v1-3842766915-3bjwx 2/2 Running 0 17m
productpage-v1-449428215-dwf44 2/2 Running 0 16m
ratings-v1-555398331-80zts 2/2 Running 0 17m
reviews-v1-217127373-s3d91 2/2 Running 0 17m
reviews-v2-2104781143-2nxqf 2/2 Running 0 16m
reviews-v3-3240307257-xl1l6 2/2 Running 0 16m
Above we see that external IP assigned to our LoadBalancer is 35.224.41.XX (XX is used to mask the actual value), and that all ambassador pods are running (Ambassador relies on Kubernetes to provide high availability, and so there should be two small pods running on each node within the cluster).
You can test if Ambassador has been installed correctly by using the test route to httpbin.org to get the external cluster Origin IP from which the request was made:
$ curl 35.224.41.XX/httpbin/ip
{
"origin": "35.192.109.XX"
}
If you're seeing a similar response, then everything is working great!
(Bonus: If you want to use a little bit of awk magic to export the LoadBalancer IP to a variable AMBASSADOR_IP, then you can type export AMBASSADOR_IP=$(kubectl get services ambassador | tail -1 | awk '{ print $4 }')and usecurl $AMBASSADOR_IP/httpbin/ip
Now you are going to modify the bookinfo demo bookinfo.yaml manifest to include the necessary Ambassador annotations. See below.
---
apiVersion: getambassador.io/v1
kind: Mapping
metadata:
name: productpage
spec:
prefix: /productpage/
rewrite: /productpage
service: productpage:9080
---
apiVersion: v1
kind: Service
metadata:
name: productpage
labels:
app: productpage
spec:
ports:
- port: 9080
name: http
selector:
app: productpage
The annotation above implements an Ambassador mapping from the '/productpage/' URI to the Kubernetes productpage service running on port 9080 ('productpage:9080'). The 'prefix' mapping URI is taken from the context of the root of your Ambassador service that is acting as the ingress point (exposed externally via port 80 because it is a LoadBalancer) e.g. '35.224.41.XX/productpage/'.
You can now apply this manifest from the root of the Istio GitHub repo on your local file system (taking care to wrap the apply with istioctl kube-inject):
kubectl apply -f <(istioctl kube-inject -f samples/bookinfo/kube/bookinfo.yaml)
Optionally, delete the Ingress controller from the bookinfo.yaml manifest by typing kubectl delete ingress gateway.
Test Ambassador by going to the IP of the Ambassador LoadBalancer you configured above e.g. 35.192.109.XX/productpage/. You can see the actual IP address again for Ambassador by typing kubectl get services ambassador.
Also according to documentation there is no need for Ambassador pods to be injected.
Yes, I have already configured all these things. That's why I have mentioned it in the attached image. I have taken this from kiali dashboard. That output I have shared of the bookinfo application. I have deployed my own application and its also working fine.
But I want short out this unknown thing.
I am using the AWS EKS cluster.
Putting note about ambassador:
Ambassador should not have the Istio sidecar for two reasons. First, it cannot since running the two separate Envoy instances will result in a conflict over their shared memory segment. The second is Ambassador should not be in your mesh anyway. The mesh is great for handling traffic routing from service to service, but since Ambassador is your ingress point, it should be solely in charge of deciding which service to route to and how to do it. Having both Ambassador and Istio try to set routing rules would be a headache and wouldn't make much sense.
All the traffic coming from a source that is not part of the service mesh is going to be shown as unknown.
See what kiali says about the unknowns:
https://kiali.io/faq/graph/#many-unknown

Kubernetes dashboard in aws EC2 instance?

I have started 2 ubuntu 16 EC2 instance(one for master and other for worker). Everything working OK.
I need to setup dashboard to view on my machine. I have copied admin.ctl and executed the script in my machine's terminal
kubectl --kubeconfig ./admin.conf proxy --address='0.0.0.0' --port=8002 --accept-hosts='.*'
Everything is fine.
But in browser when I use below link
http://localhost:8002/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
I am getting Error: 'dial tcp 192.168.1.23:8443: i/o timeout'
Trying to reach: 'https://192.168.1.23:8443/'
I have enabled all traffics in security policy for aws. What am I missing? Please point me a solution
If you only want to reach the dashboard then it is pretty easy, get the IP address of your EC2 instance and the Port on which it is serving dashboard (kubectl get services --all-namespaces) and then reach it using:
First:
kubectl proxy --address 0.0.0.0 --accept-hosts '.*'
And in your browswer:
http://<IP>:<PORT>/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login
Note that this is a possible security vulnerability as you are accepting all traffic (AWS firewall rules) and also all connections for your kubectl proxy (--address 0.0.0.0 --accept-hosts '.*') so please narrow it down or use different approach. If you have more questions feel free to ask.
Have you tried putting http:// in front of localhost?
I don't have enough rep to comment, else I would.
For bypassing dashboard with token. You have to execute the below code
cat <<EOF | kubectl create -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: kubernetes-dashboard
labels:
k8s-app: kubernetes-dashboard
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: kubernetes-dashboard
namespace: kube-system
EOF
After this you can skip without providing token. But this will cause security issues.