How to integrate Custom CA (AWS PCA) using Kubernetes CSR in Istio - amazon-web-services

I am trying to set up Custom CA (AWS PCA) integration using Kubernetes CSR in Istio following this doc (Istio / Custom CA Integration using Kubernetes CSR). Steps followed:
i) Enable feature gate on cert-manager controller: --feature-gates=ExperimentalCertificateSigningRequestControllers=true
ii) AWS PCA and the aws-privateca-issuer plugin are already in place.
iii) The awspcaclusterissuers object is in place with the AWS PCA ARN (arn:aws:acm-pca:us-west-2:<account_id>:certificate-authority/).
iv) Modified the Istio operator with defaultConfig and caCertificates pointing at the AWS PCA issuer (awspcaclusterissuers.awspca.cert-manager.io/).
v) Modified the istiod deployment and added the env vars (as mentioned in the doc, along with the cluster role).
The istiod pod is failing with this error:
Generating K8S-signed cert for [istiod.istio-system.svc istiod-remote.istio-system.svc istio-pilot.istio-system.svc] using signer awspcaclusterissuers.awspca.cert-manager.io/cert-manager-aws-root-ca
2023-01-04T07:25:26.942944Z error failed to create discovery service: failed generating key and cert by kubernetes: no certificate returned for the CSR: "csr-workload-lg6kct8nh6r9vx4ld4"
Error: failed to create discovery service: failed generating key and cert by kubernetes: no certificate returned for the CSR: "csr-workload-lg6kct8nh6r9vx4ld4"
K8s Version: 1.22
Istio Version: 1.13.5
Note: Our integration of cert-manager and AWS PCA works fine; we generate private certificates using cert-manager and PCA with the ‘Certificates’ object. It's the Kubernetes CSR integration with Istio that is failing!
I would really appreciate it if anybody with knowledge of this could help me out here, as there are nearly zero docs on this integration.

I haven't done this with Kubernetes CSR, but I have done it with Istio CSR. Here are the steps to accomplish it with this approach.
Create a certificate authority in AWS Private CA and download its public root certificate, either via the console or the AWS CLI (aws acm-pca get-certificate-authority-certificate --certificate-authority-arn <certificate-authority-arn> --region af-south-1 --output text > ca.pem).
Create a secret to store this root certificate. Cert manager will use this public root cert to communicate with the root CA (AWS PCA).
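For reference, that can be done along these lines (the cert-manager namespace and the istio-root-ca secret name are assumptions here, chosen to match the istio-csr Helm values used further down):

# create the namespace the secret (and later cert-manager) will live in
kubectl create namespace cert-manager

# store the downloaded root certificate; the secret name must match the
# secretName referenced in the istio-csr Helm values below
kubectl create secret generic istio-root-ca --from-file=ca.pem=ca.pem -n cert-manager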
Install cert-manager. Cert manager will essentially function as the intermediate CA in place of istiod.
Install AWS PCA Issuer plugin.
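If it helps, both installs are plain Helm charts; roughly as follows (repo URLs and values come from the upstream projects, so double-check the current versions):

helm repo add jetstack https://charts.jetstack.io
helm repo add awspca https://cert-manager.github.io/aws-privateca-issuer
helm repo update

# cert-manager, with its CRDs
helm install cert-manager jetstack/cert-manager -n cert-manager --set installCRDs=true

# AWS Private CA issuer plugin
helm install aws-privateca-issuer awspca/aws-privateca-issuer -n cert-manager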
Make sure you have the necessary permissions in place for the workload to communicate with AWS Private CA. The recommended approach would be to use OIDC with IRSA. The other approach is to grant permissions to the node role. The problem with this is that any pod running on your nodes essentially has access to AWS Private CA, which isn't a least privilege approach.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "awspcaissuer",
      "Action": [
        "acm-pca:DescribeCertificateAuthority",
        "acm-pca:GetCertificate",
        "acm-pca:IssueCertificate"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:acm-pca:<region>:<account_id>:certificate-authority/<resource_id>"
    }
  ]
}
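If you go the IRSA route, attaching a policy like the one above to a service account could look roughly like this with eksctl (the cluster name, policy ARN, and the namespace/name of the issuer plugin's service account are placeholders and must match your setup):

eksctl create iamserviceaccount \
  --cluster <cluster-name> \
  --namespace cert-manager \
  --name aws-privateca-issuer \
  --attach-policy-arn arn:aws:iam::<account_id>:policy/<pca-issuer-policy> \
  --approve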
Once the permissions are in place, you can create a cluster issuer or an issuer that will represent the root CA in the cluster.
apiVersion: awspca.cert-manager.io/v1beta1
kind: AWSPCAClusterIssuer
metadata:
  name: aws-pca-root-ca
spec:
  arn: <aws-pca-arn-goes-here>
  region: <region-where-ca-was-created-in-aws>
Create the istio-system namespace
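That is simply:

kubectl create namespace istio-system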
Install Istio CSR and update the Helm values for the issuer so that cert-manager knows to communicate with the AWS PCA issuer.
helm install -n cert-manager cert-manager-istio-csr jetstack/cert-manager-istio-csr \
--set "app.certmanager.issuer.group=awspca.cert-manager.io" \
--set "app.certmanager.issuer.kind=AWSPCAClusterIssuer" \
--set "app.certmanager.issuer.name=aws-pca-root-ca" \
--set "app.certmanager.preserveCertificateRequests=true" \
--set "app.server.maxCertificateDuration=48h" \
--set "app.tls.certificateDuration=24h" \
--set "app.tls.istiodCertificateDuration=24h" \
--set "app.tls.rootCAFile=/var/run/secrets/istio-csr/ca.pem" \
--set "volumeMounts[0].name=root-ca" \
--set "volumeMounts[0].mountPath=/var/run/secrets/istio-csr" \
--set "volumes[0].name=root-ca" \
--set "volumes[0].secret.secretName=istio-root-ca"
I would also recommend setting preserveCertificateRequests to true, at least for the first time you set this up, so that you can actually see the CSRs and whether or not the certificates were successfully issued.
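With that flag enabled, you can inspect the preserved requests afterwards and check whether they were approved and issued, for example (the namespace the requests land in depends on the istio-csr configuration):

kubectl get certificaterequests -A
kubectl describe certificaterequest <request-name> -n istio-system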
When you install Istio CSR, this will create a certificate called istiod as well as a corresponding secret that stores the cert. The secret is called istiod-tls. This is the cert for the intermediate CA (Cert manager with Istio CSR).
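Before moving on, you can verify that this intermediate certificate was actually issued:

kubectl get certificate istiod -n istio-system
kubectl get secret istiod-tls -n istio-system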
Install Istio with the following custom configurations:
Update the CA address to Istio CSR (the new intermediate CA)
Disable istiod from functioning as the CA
Mount istiod with the cert-manager certificate details
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio
  namespace: istio-system
spec:
  profile: "demo"
  hub: gcr.io/istio-release
  values:
    global:
      # Change certificate provider to cert-manager istio agent for istio agent
      caAddress: cert-manager-istio-csr.cert-manager.svc:443
  components:
    pilot:
      k8s:
        env:
          # Disable istiod CA Server functionality
          - name: ENABLE_CA_SERVER
            value: "false"
        overlays:
          - apiVersion: apps/v1
            kind: Deployment
            name: istiod
            patches:
              # Mount istiod serving and webhook certificate from Secret mount
              - path: spec.template.spec.containers.[name:discovery].args[7]
                value: "--tlsCertFile=/etc/cert-manager/tls/tls.crt"
              - path: spec.template.spec.containers.[name:discovery].args[8]
                value: "--tlsKeyFile=/etc/cert-manager/tls/tls.key"
              - path: spec.template.spec.containers.[name:discovery].args[9]
                value: "--caCertFile=/etc/cert-manager/ca/root-cert.pem"
              - path: spec.template.spec.containers.[name:discovery].volumeMounts[6]
                value:
                  name: cert-manager
                  mountPath: "/etc/cert-manager/tls"
                  readOnly: true
              - path: spec.template.spec.containers.[name:discovery].volumeMounts[7]
                value:
                  name: ca-root-cert
                  mountPath: "/etc/cert-manager/ca"
                  readOnly: true
              - path: spec.template.spec.volumes[6]
                value:
                  name: cert-manager
                  secret:
                    secretName: istiod-tls
              - path: spec.template.spec.volumes[7]
                value:
                  name: ca-root-cert
                  configMap:
                    defaultMode: 420
                    name: istio-ca-root-cert
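Once you've saved the operator spec (as, say, istio-operator.yaml; the filename is just an example), it can be applied with istioctl:

istioctl install -f istio-operator.yaml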
If you want to watch a detailed walk-through on how the different components communicate, you can watch this video:
https://youtu.be/jWOfRR4DK8k
In the video, I also show the CSRs and the certs being successfully issued, as well as test that mTLS is working as expected.
The video is long, but you can skip to 17:08 to verify that the solution works.
Here's a repo with these same steps, the relevant manifest files and architecture diagrams describing the flow: https://github.com/LukeMwila/how-to-setup-external-ca-in-istio

Related

Can't deploy Ingress object on EKS: failed calling webhook vingress.elbv2.k8s.aws: the server could not find the requested resource

I am following this AWS guide: https://aws.amazon.com/premiumsupport/knowledge-center/eks-alb-ingress-controller-fargate/ to set up my Kubernetes cluster behind an ALB.
After installing the AWS ALB controller on my EKS cluster with the steps below:
helm repo add eks https://aws.github.io/eks-charts
kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
--set clusterName=YOUR_CLUSTER_NAME \
--set serviceAccount.create=false \
--set region=YOUR_REGION_CODE \
--set vpcId=<VPC_ID> \
--set serviceAccount.name=aws-load-balancer-controller \
-n kube-system
I want to deploy my ingress configurations:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/success-codes: 200,302
    alb.ingress.kubernetes.io/target-type: instance
    kubernetes.io/ingress.class: alb
  name: staging-ingress
  namespace: staging
  finalizers:
    - ingress.k8s.aws/resources
spec:
  rules:
    - http:
        paths:
          - backend:
              serviceName: my-service
              servicePort: 80
            path: /api/v1/price
Everything looks fine. However, when I run the below command to deploy my ingress:
kubectl apply -f ingress.staging.yaml -n staging
I get the error below:
Error from server (InternalError): error when creating "ingress.staging.yaml": Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": the server could not find the requested resource
There are very few similar issues on Google and none of them helped me. Any idea what the problem is?
K8s version: 1.18
Adding this security group rule solved it for me:
node_security_group_additional_rules = {
  ingress_allow_access_from_control_plane = {
    type                          = "ingress"
    protocol                      = "tcp"
    from_port                     = 9443
    to_port                       = 9443
    source_cluster_security_group = true
    description                   = "Allow access from control plane to webhook port of AWS load balancer controller"
  }
}
I would suggest taking a look at the ALB controller logs. The CRDs that you are using are for the v1beta1 API group, while the latest chart (aws-load-balancer-controller v2.4.0) registers a webhook for the v1 API group.
If you look at the ALB controller startup logs, you should see a line similar to the messages below:
v1beta1
{"level":"info","ts":164178.5920634,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/validate-networking-v1beta1-ingress"}
v1
{"level":"info","ts":164683.0114837,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/validate-networking-v1-ingress"}
If that is the case, you can fix the problem by using an earlier version of the controller or by installing the newer version of the CRDs.
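To confirm which API group your controller is registering, you can check the running image and grep the startup logs for the lines above, along these lines (deployment name and namespace taken from the Helm install in the question):

kubectl get deployment aws-load-balancer-controller -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}'
kubectl logs deployment/aws-load-balancer-controller -n kube-system | grep "registering webhook"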

Kubectl commands to EKS, from EC2 in a private network, are timing out

This EKS cluster has a private endpoint only. My end goal is to deploy Helm charts on the EKS cluster. I connect to an EC2 machine via SSM, and I have already installed Helm and kubectl on that machine. The trouble is that in a private network the AWS APIs can't be called, so instead of calling aws eks update-kubeconfig --region region-code --name cluster-name I have created the kubeconfig shown below.
apiVersion: v1
clusters:
- cluster:
    server: 1111111111111111.gr7.eu-west-1.eks.amazonaws.com
    certificate-authority-data: JTiBDRVJU111111111
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: aws
  name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws
      args:
        - "eks"
        - "get-token"
        - "--cluster-name"
        - "this-is-my-cluster"
        # - "--role-arn"
        # - "role-arn"
      # env:
      #   - name: AWS_PROFILE
      #     value: "aws-profile"
Getting the following error:
I0127 21:24:26.336266 3849 loader.go:372] Config loaded from file: /tmp/.kube/config-eks-demo
I0127 21:24:26.337081 3849 round_trippers.go:435] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.21.2 (linux/amd64) kubernetes/d2965f0" 'http://1111111111111111.gr7.eu-west-1.eks.amazonaws.com/api?timeout=32s'
I0127 21:24:56.338147 3849 round_trippers.go:454] GET http://1111111111111111.gr7.eu-west-1.eks.amazonaws.com/api?timeout=32s in 30001 milliseconds
I0127 21:24:56.338171 3849 round_trippers.go:460] Response Headers:
I0127 21:24:56.338238 3849 cached_discovery.go:121] skipped caching discovery info due to Get "http://1111111111111111.gr7.eu-west-1.eks.amazonaws.com/api?timeout=32s": dial tcp 10.1.1.193:80: i/o timeout
There is connectivity in the VPC, and there are no issues with NACLs, security groups, or port 80.
That looks like this open EKS issue: https://github.com/aws/containers-roadmap/issues/298
If that’s the case, upvote it so that the product team can prioritize it. If you have Enterprise support your TAM can help there as well.

403 Forbidden on ESPv2, GKE AutoPilot, WIF

I'm following the Getting started with Endpoints for GKE with ESPv2 guide. I'm using Workload Identity Federation and Autopilot on the GKE cluster.
I've been running into the error:
F0110 03:46:24.304229 8 server.go:54] fail to initialize config manager: http call to GET https://servicemanagement.googleapis.com/v1/services/name:bookstore.endpoints.<project>.cloud.goog/rollouts?filter=status=SUCCESS returns not 200 OK: 403 Forbidden
This ultimately leads to a transport failure error and shutdown of the Pod.
My first step was to investigate permission issues, but I could really use some outside perspective, as I've been going around in circles on this.
Here's my config:
>> gcloud container clusters describe $GKE_CLUSTER_NAME \
--zone=$GKE_CLUSTER_ZONE \
--format='value[delimiter="\n"](nodePools[].config.oauthScopes)'
['https://www.googleapis.com/auth/devstorage.read_only',
'https://www.googleapis.com/auth/logging.write',
'https://www.googleapis.com/auth/monitoring',
'https://www.googleapis.com/auth/service.management.readonly',
'https://www.googleapis.com/auth/servicecontrol',
'https://www.googleapis.com/auth/trace.append']
>> gcloud container clusters describe $GKE_CLUSTER_NAME \
--zone=$GKE_CLUSTER_ZONE \
--format='value[delimiter="\n"](nodePools[].config.serviceAccount)'
default
default
Service-Account-Name: test-espv2
Roles
Cloud Trace Agent
Owner
Service Account Token Creator
Service Account User
Service Controller
Workload Identity User
I've associated the WIF svc-act with the cluster using the following YAML:
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: test-espv2@<project>.iam.gserviceaccount.com
  name: test-espv2
  namespace: eventing
And then I've associated the pod with the test-espv2 svc-act
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: esp-grpc-bookstore
  namespace: eventing
spec:
  replicas: 1
  selector:
    matchLabels:
      app: esp-grpc-bookstore
  template:
    metadata:
      labels:
        app: esp-grpc-bookstore
    spec:
      serviceAccountName: test-espv2
Since the gcr.io/endpoints-release/endpoints-runtime:2 image is limited, I created a test container and deployed it into the same eventing namespace.
Within the container, I'm able to retrieve the endpoint service config with the following command:
curl --fail -o "service.json" -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://servicemanagement.googleapis.com/v1/services/${SERVICE}/configs/${CONFIG_ID}?view=FULL"
And also within the container, I'm running as the impersonated service account, tested with:
curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/
Are there any other tests I can run to help me debug this issue?
Thanks in advance,
Around debugging - I've often found my mistakes by following one of the other methods/programming languages in the Google tutorials.
Have you looked at the OpenAPI notes and tried to follow along?
I've finally figured out the issue. It came down to a few things:
Redeploying the app, paying special attention to and verifying the kubectl annotate serviceaccount commands
Running add-iam-policy-binding for both serviceController and cloudtrace.agent
Omitting nodeSelector: iam.gke.io/gke-metadata-server-enabled: "true" due to Autopilot
Doing this enabled a successful kube deployment as displayed by the logs.
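For reference, the first two items map to commands roughly like the following (project, namespace, and service account names are taken from the question; adjust them and the role set to your own setup):

# bind the Kubernetes SA to the Google SA for Workload Identity
gcloud iam service-accounts add-iam-policy-binding \
  test-espv2@<project>.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<project>.svc.id.goog[eventing/test-espv2]"

# annotate the Kubernetes service account with the Google SA
kubectl annotate serviceaccount test-espv2 -n eventing \
  iam.gke.io/gcp-service-account=test-espv2@<project>.iam.gserviceaccount.com --overwrite

# grant the Google SA the roles ESPv2 needs
gcloud projects add-iam-policy-binding <project> \
  --member "serviceAccount:test-espv2@<project>.iam.gserviceaccount.com" \
  --role roles/servicemanagement.serviceController
gcloud projects add-iam-policy-binding <project> \
  --member "serviceAccount:test-espv2@<project>.iam.gserviceaccount.com" \
  --role roles/cloudtrace.agent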
The next error I had was:
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
This was fixed by turning my attention back to my Kubernetes cluster.
Looking through the events on my ingress service, I saw that, since I was in a shared VPC and my security policies only allowed firewall management from the host project, the deployment was failing to update the firewall rules.
Manually provisioning them, as shown here :
https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#manually_provision_firewall_rules_from_the_host_project
solved my issues.

How to connect AWS EKS cluster from Azure Devops pipeline - No user credentials found for cluster in KubeConfig content

I have to set up CI in Microsoft Azure DevOps to deploy and manage AWS EKS cluster resources. As a first step, I found a few Kubernetes tasks to make a connection to a Kubernetes cluster (in my case, AWS EKS), but in the "kubectlapply" task in Azure DevOps, I can only pass the kubeconfig file or an Azure subscription to reach the cluster.
In my case, I have the kubeconfig file, but I also need to pass the AWS user credentials that are authorized to access the AWS EKS cluster. There is no option in the task, when adding the new "k8s end point", to provide the AWS credentials that should be used to access the EKS cluster. Because of that, I am seeing the error below while verifying the connection to the EKS cluster.
During runtime, I can pass the AWS credentials via environment variables in the pipeline, but I cannot add the kubeconfig file in the task and SAVE it.
Azure and AWS are big players in the cloud, and there should be ways to connect to AWS resources from any CI platform. Has anyone faced this kind of issue, and what is the best approach to connect to AWS and then the EKS cluster for deployments from Azure DevOps CI?
No user credentials found for cluster in KubeConfig content. Make sure that the credentials exist and try again.
Amazon EKS uses IAM to provide authentication to your Kubernetes cluster through the AWS IAM Authenticator for Kubernetes. You may update your config file to follow this format:
apiVersion: v1
clusters:
- cluster:
    server: ${server}
    certificate-authority-data: ${cert}
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: aws
  name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      env:
        - name: "AWS_PROFILE"
          value: "dev"
      args:
        - "token"
        - "-i"
        - "mycluster"
Useful links:
https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html
https://github.com/kubernetes-sigs/aws-iam-authenticator#specifying-credentials--using-aws-profiles
I got the solution by using ServiceAccount following this post: How to deploy to AWS Kubernetes from Azure DevOps
For anyone who is still having this issue, I had to set this up for the startup I worked for, and it was pretty simple.
After your cluster is created, create the service account:
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: build-robot
EOF
Then apply the ClusterRoleBinding:
$ kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: build-robot
  name: build-robot
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: build-robot
  namespace: default
EOF
Be careful with the above, as it gives full access; check out https://kubernetes.io/docs/reference/access-authn-authz/rbac/ for more info on scoping the access.
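As a rough sketch of a more scoped setup (names assumed), you could bind the built-in edit ClusterRole in a single namespace instead of admin cluster-wide:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: build-robot-edit
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: build-robot
  namespace: default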
From there, head over to ADO and follow the steps using build-robot as the SA name:
$ kubectl get serviceAccounts build-robot -n default -o='jsonpath={.secrets[*].name}'
xyz........
$ kubectl get secret xyz........ -n default -o json
...
...
...
Paste the output into the last box when adding the kubernetes resource into the environment and select Accept UnTrusted Certificates. Then click apply and validate and you should be good to go.

How to add insecure Docker registry certificate to kubeadm config

I'm quite new to Kubernetes, and I managed to get an Angular app deployed locally using minikube. But now I'm working on a Bitnami Kubernetes Sandbox EC2 instance, and I've run into issues pulling from my docker registry on another EC2 instance.
Whenever I attempt to apply the deployment, the pods log the following error
Failed to pull image "registry-url.net:5000/app": no available registry endpoint:
failed to do request: Head https://registry-url.net/v2/app/manifests/latest:
x509: certificate signed by unknown authority
The docker registry certificate is signed by a CA (Comodo RSA), but I had to add the registry's .crt and .key files to /etc/docker/certs.d/registry-url.net:5000/ for my local copy of minikube and docker.
However, the Bitnami instance doesn't have an /etc/docker/ directory and there is no daemon.json file to add insecure registry exceptions, and I'm not sure where the cert files are meant to be located for kubeadm.
So is there a similar location to place .crt and .key files for kubeadm, or is there a command I can run to add my docker registry to a list of exceptions?
Or better yet, is there a way to get Kubernetes/docker to recognize the CA of the registry's SSL certs?
Thanks
Edit: I've included my deployment and secret files below:
app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry-url.net:5000/app
          ports:
            - containerPort: 80
          env:
            ...
      imagePullSecrets:
        - name: registry-pull-secret
registry-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: registry-pull-secret
data:
  .dockerconfigjson: <base-64 JSON>
type: kubernetes.io/dockerconfigjson
You need to create a secret with details for the repository.
Here is an example of pushing an image to your Docker repo:
docker login <my-registry-url>:5000
Username (admin):
Password:
Login Succeeded
docker tag <user>/<my-cool-image> <my-registry-url>:5000/<my-cool-image>:0.1
docker push <my-registry-url>:5000/<my-cool-image>:0.1
From that host, create the base64 encoding of ~/.docker/config.json like so:
cat ~/.docker/config.json | base64
Then you will be able to add it to the secret, so create a yaml that might look like the following:
apiVersion: v1
kind: Secret
metadata:
  name: registrypullsecret
data:
  .dockerconfigjson: <base-64-encoded-json-here>
type: kubernetes.io/dockerconfigjson
Once done you can apply the secret using kubectl create -f my-secret.yaml && kubectl get secrets.
As for your pod it should look like this:
apiVersion: v1
kind: Pod
metadata:
  name: jss
spec:
  imagePullSecrets:
    - name: registrypullsecret
  containers:
    - name: jss
      image: my-registry-url:5000/my-cool-image:0.1
So I ended up solving my issue by manually installing docker via the following commands:
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get install docker-ce docker-ce-cli containerd.io
Then I had to create the directory structure /etc/docker/certs.d/registry-url:5000/ and copy the registry's .crt and .key files into the directory.
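Roughly, that looked like the following (the certificate and key file names are placeholders):

sudo mkdir -p "/etc/docker/certs.d/registry-url.net:5000"
sudo cp registry.crt registry.key "/etc/docker/certs.d/registry-url.net:5000/"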
However, this still didn't work at first; after stopping the EC2 instance and starting it again, it pulls from the remote registry with no issues.
When I initially ran service kubelet restart the changes didn't seem to take effect, but restarting the instance did the trick. I'm not sure if there's a better way of fixing my issue, but this was the only solution that worked for me.