How can I define an Istio policy for my service to deny any request with a JWT access token that is expired?
For context, I have a Spring Boot KNative service with RequestAuthentication config that has checks for the token issuer.
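Istio does that by default. When a RequestAuthentication policy applies to the workload, any request carrying a JWT that is expired or otherwise invalid is rejected.
However, a request that carries no token at all is still accepted, so to actually require a valid token you must also configure an AuthorizationPolicy.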
Source for the below examples
Create AuthorizationPolicy
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: foo
spec:
  selector:
    matchLabels:
      app: httpbin
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["testing@secure.istio.io/testing@secure.istio.io"]
Get the valid JWT token
TOKEN=$(curl https://raw.githubusercontent.com/istio/istio/release-1.11/security/tools/jwt/samples/demo.jwt -s) && echo "$TOKEN" | cut -d '.' -f2 - | base64 --decode -
Verify that the request with valid token is allowed
kubectl exec "$(kubectl get pod -l app=sleep -n foo -o jsonpath={.items..metadata.name})" -c sleep -n foo -- curl "http://httpbin.foo:8000/headers" -sS -o /dev/null -H "Authorization: Bearer $TOKEN" -w "%{http_code}\n"
Adjust the above examples to your needs
If, for some reason, Istio does allow expired or otherwise invalid tokens, you should check your previously applied policies, to make sure nothing overrides default behaviour.
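When debugging, it can help to confirm locally that a token really is expired before suspecting the policies. The following is a minimal Python sketch, not part of Istio: it decodes the payload segment the same way the cut/base64 pipeline above does and compares the exp claim with the current time. The helper names jwt_claims and is_expired are made up for illustration, and the signature is deliberately not verified.

```python
import base64
import json
import time


def jwt_claims(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT into a dict."""
    payload = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def is_expired(token: str, now=None) -> bool:
    """Return True if the token has an exp claim that is not in the future."""
    exp = jwt_claims(token).get("exp")
    if exp is None:
        return False  # no exp claim, so there is nothing to expire
    current = time.time() if now is None else now
    return current >= exp
```

Calling is_expired("$TOKEN" contents) on the token you send to the service tells you whether a rejection is expected because of the exp claim or has some other cause.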
I am trying to deploy a Kubernetes cluster on AWS EKS using Terraform, run from a Gitlab CI pipeline. My code currently gets a full cluster up and running, except there is a step in which it tries to add the nodes (which are created separately) into the cluster.
When it tries to do this, this is the error I receive:
│ Error: configmaps is forbidden: User "system:serviceaccount:gitlab-managed-apps:default" cannot create resource "configmaps" in API group "" in the namespace "kube-system"
│
│ with module.mastercluster.kubernetes_config_map.aws_auth[0],
│ on .terraform/modules/mastercluster/aws_auth.tf line 63, in resource "kubernetes_config_map" "aws_auth":
│ 63: resource "kubernetes_config_map" "aws_auth" {
│
Terraform I believe is trying to edit the configmap aws_auth in the kube-system namespace, but for whatever reason, it doesn't have permission to do so?
I have found an answer from years ago on Stack Overflow that matches what the documentation currently says: add an aws_eks_cluster_auth data source and pass its token to the kubernetes provider.
My configuration of this currently looks like this:
data "aws_eks_cluster" "mastercluster" {
  name = module.mastercluster.cluster_id
}
data "aws_eks_cluster_auth" "mastercluster" {
  name = module.mastercluster.cluster_id
}
provider "kubernetes" {
  alias                  = "mastercluster"
  host                   = data.aws_eks_cluster.mastercluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.mastercluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.mastercluster.token
  load_config_file       = false
}
The weird thing is, this has worked for me before. I have successfully deployed multiple clusters using this method. This configuration is an almost identical copy to another one I had before, only the names of the clusters are different. I am totally lost as to why this can possibly go wrong.
Use semver to lock hashicorp provider versions
That is why it is so important to use semver constraints in Terraform manifests.
As per Terraform documentation:
Terraform providers manage resources by communicating between Terraform and target APIs. Whenever the target APIs change or add functionality, provider maintainers may update and version the provider.
When multiple users or automation tools run the same Terraform configuration, they should all use the same versions of their required providers.
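To make the effect of a version lock concrete, here is a small Python sketch of how a pessimistic constraint such as ~> 2.0 is commonly read: only the rightmost listed component may increase. This illustrates the idea only; it is not Terraform's actual resolver, the function name satisfies_pessimistic is hypothetical, and it handles plain numeric versions with at least two components.

```python
def parse(version: str) -> tuple:
    """Turn '2.3.1' into (2, 3, 1)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check a version against the base of a '~> X.Y' or '~> X.Y.Z' bound."""
    base = parse(constraint)
    if len(base) < 2:
        raise ValueError("this sketch needs at least two components")
    candidate = parse(version)
    width = max(len(base), len(candidate))
    # Pad with zeros so tuples of different lengths compare sensibly.
    candidate += (0,) * (width - len(candidate))
    lower = base + (0,) * (width - len(base))
    # Upper bound: bump the second-to-last listed component, zero the rest.
    upper = base[:-2] + (base[-2] + 1,)
    upper += (0,) * (width - len(upper))
    return lower <= candidate < upper
```

With the base "2.0", version 2.3.1 is accepted and 3.0.0 is not, which is the kind of guard that keeps a pipeline from silently picking up a new major provider release.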
Use RBAC rules for Kubernetes
There is a GitHub issue filed about this: v2.0.1: Resources cannot be created. Does kubectl reference to kube config properly? · Issue #1127 · hashicorp/terraform-provider-kubernetes, with the same error message as in your case.
And one of the comments answers:
Offhand, this looks related to RBAC rules in the cluster (which may have been installed by the helm chart). This command might help diagnose the permissions issues relating to the service account in the error message.
$ kubectl auth can-i create namespace --as=system:serviceaccount:gitlab-prod:default
$ kubectl auth can-i --list --as=system:serviceaccount:gitlab-prod:default
You might be able to compare that list with other users on the cluster:
kubectl auth can-i --list --namespace=default --as=system:serviceaccount:default:default
$ kubectl auth can-i create configmaps
yes
$ kubectl auth can-i create configmaps --namespace=nginx-ingress --as=system:serviceaccount:gitlab-prod:default
no
And investigate related clusterroles:
$ kubectl describe clusterrolebinding system:basic-user
Name: system:basic-user
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
Role:
Kind: ClusterRole
Name: system:basic-user
Subjects:
Kind Name Namespace
---- ---- ---------
Group system:authenticated
$ kubectl describe clusterrole system:basic-user
Name: system:basic-user
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
selfsubjectaccessreviews.authorization.k8s.io [] [] [create]
selfsubjectrulesreviews.authorization.k8s.io [] [] [create]
My guess is that the chart or Terraform config in question is responsible for creating the service account, and the [cluster] roles and rolebindings, but it might be doing so in the wrong order, or not idempotently (so you get different results on re-install vs the initial install). But we would need to see a configuration that reproduces this error. In my testing of version 2 of the providers on AKS, EKS, GKE, and minikube, I haven't seen this issue come up.
Feel free to browse these working examples of building specific clusters and using them with Kubernetes and Helm providers. Giving the config a skim might give you some ideas for troubleshooting further.
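The diagnostic commands quoted above need the user, verb, resource and namespace from the error message. As a convenience, here is a hedged Python sketch that extracts them from a "forbidden" line and builds the matching kubectl auth can-i command. The regular expression is based only on the message shape shown in this thread, and can_i_command is a made-up helper name.

```python
import re

# Matches the message shape seen in the error above; other shapes may exist.
FORBIDDEN = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)"'
    r' in API group "(?P<group>[^"]*)"'
    r'(?: in the namespace "(?P<namespace>[^"]+)")?'
)


def can_i_command(error_line: str) -> str:
    """Build a 'kubectl auth can-i' command from a forbidden error line."""
    match = FORBIDDEN.search(error_line)
    if match is None:
        raise ValueError("not a recognised 'forbidden' message")
    command = ["kubectl", "auth", "can-i", match["verb"], match["resource"]]
    if match["namespace"]:
        command.append("--namespace=" + match["namespace"])
    command.append("--as=" + match["user"])
    return " ".join(command)


error = (
    'Error: configmaps is forbidden: User '
    '"system:serviceaccount:gitlab-managed-apps:default" cannot create '
    'resource "configmaps" in API group "" in the namespace "kube-system"'
)
print(can_i_command(error))
```

Running the printed command with an admin kubeconfig shows whether the service account named in the error is really missing the permission.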
How to solve RBAC issues
As for the error
Error: configmaps is forbidden: User "system:serviceaccount:kube-system:default" cannot list
There is a great explanation by @m-abramovich:
First, some information for newbies.
In Kubernetes there are:
Account - something like your ID. Example: john
Role - some group in the project permitted to do something. Examples: cluster-admin, it-support, ...
Binding - joining Account to Role. "John in it-support" - is a binding.
Thus, in our message above, we see that our Tiller acts as account "default" registered at namespace "kube-system". Most likely you didn't bind him to a sufficient role.
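The account / role / binding relationship described above can be modelled in a few lines. This is a deliberately simplified Python sketch of the concept, not of the Kubernetes API: the class names, the (verb, resource) pairs and the can_i function are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    # Set of (verb, resource) pairs the role is permitted to use.
    allowed: set = field(default_factory=set)


@dataclass
class Binding:
    account: str
    role: Role


def can_i(account: str, verb: str, resource: str, bindings: list) -> bool:
    """An account may act only through a role it is bound to."""
    return any(
        b.account == account and (verb, resource) in b.role.allowed
        for b in bindings
    )


it_support = Role("it-support", {("get", "pods"), ("list", "pods")})
bindings = [Binding("john", it_support)]  # "John in it-support"

print(can_i("john", "get", "pods", bindings))       # bound and permitted
print(can_i("default", "get", "pods", bindings))    # no binding at all
```

The second call is the situation in the error message: the "default" account exists, but nothing binds it to a role that permits the action.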
Now back to the problem.
How do we track it:
check if you have specific account for tiller. Usually it has same name - "tiller":
kubectl [--namespace kube-system] get serviceaccount
create if not:
kubectl [--namespace kube-system] create serviceaccount tiller
check if you have role or clusterrole (cluster role is "better" for newbies - it is cluster-wide unlike namespace-wide role). If this is not a production, you can use highly privileged role "cluster-admin":
kubectl [--namespace kube-system] get clusterrole
you can check role content via:
kubectl [--namespace kube-system] get clusterrole cluster-admin -o yaml
check if account "tiller" in first clause has a binding to clusterrole "cluster-admin" that you deem sufficient:
kubectl [--namespace kube-system] get clusterrolebinding
if it is hard to figure out based on names, you can simply create new:
kubectl [--namespace kube-system] create clusterrolebinding tiller-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
finally, when you have the account, the role and the binding between them, you can check if you really act as this account:
kubectl [--namespace kube-system] get deploy tiller-deploy -o yaml
I suspect that your output will not have settings "serviceAccount" and "serviceAccountName":
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
if yes, then add the account you want tiller to use:
kubectl [--namespace kube-system] patch deploy tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
(if you use PowerShell, then check below for post from @snpdev)
Now you repeat previous check command and see the difference:
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: tiller <-- new line
serviceAccountName: tiller <-- new line
terminationGracePeriodSeconds: 30
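Because the inline JSON in the patch command is easy to break with shell quoting (the PowerShell remark hints at this), one option is to generate the patch body programmatically. A minimal Python sketch follows; service_account_patch is a hypothetical helper, and it only builds the string that would be given to kubectl patch -p.

```python
import json


def service_account_patch(account: str) -> str:
    """Build the JSON patch body that sets the pod template's service account."""
    patch = {"spec": {"template": {"spec": {"serviceAccount": account}}}}
    # Compact separators reproduce the one-line form used in the command above.
    return json.dumps(patch, separators=(",", ":"))


print(service_account_patch("tiller"))
```

Passing the result as a single argument from a script, rather than typing it into an interactive shell, sidesteps most quoting differences between shells.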
Resources:
Using RBAC Authorization | Kubernetes
Demystifying RBAC in Kubernetes | Cloud Native Computing Foundation
Helm | Role-based Access Control
Lock and Upgrade Provider Versions | Terraform - HashiCorp Learn
In my GCP project, I have a python API running in a docker container (using connexion). I want to expose the API (with an API key) using API Gateway.
When I deploy the docker container with --ingress internal, I get "Access is forbidden." on API calls over the Gateway, so API Gateway cannot reach the Cloud Run container.
When I use --ingress all, all works as expected, but then my internal API is accessible from the web, which is not what I want.
I created a service account for this:
gcloud iam service-accounts create $SERVICE_ACCOUNT_ID \
# --description="the api gateway user" \
# --display-name="api gateway user"
... gave the account run.invoker permissions:
gcloud projects add-iam-policy-binding $PROJECT_ID \
--role=roles/run.invoker --member \
serviceAccount:$SERVICE_ACCOUNT_EMAIL
... and used the service account to create the API Config:
gcloud api-gateway api-configs create $CONFIG_ID \
--api=$API_ID --openapi-spec=$API_DEFINITION \
--project=$PROJECT_ID --backend-auth-service-account=$SERVICE_ACCOUNT_EMAIL
But I can't access the docker API from API Gateway. What am I missing here? How can I secure my API so that API Gateway can connect internally?
Update1:
Also applied the role to my run service:
gcloud run services add-iam-policy-binding $SERVICE_ID \
--region $REGION --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
--role="roles/run.invoker"
Update2:
Some extra info as requested by John Hanley:
My gateway yml looks like this:
swagger: '2.0'
info:
  title: "title"
  description: "description"
  version: "0.1"
schemes:
  - https
x-google-backend:
  address: <CLOUD_RUN_SERVICE_URL>
paths:
  /api:
    post:
      operationId: api
      consumes:
        - application/json
      produces:
        - application/json
      security:
        - api_key: []
      parameters:
        - in: body
          name: request
          description: request
          required: true
          schema:
            $ref: '#/definitions/Request'
      responses:
        200:
          description: "success"
        400:
          description: "bad data"
        503:
          description: "internal error"
definitions:
  Request:
    properties:
      parameter1:
        type: string
      parameter1:
        type: string
    required:
      - parameter1
securityDefinitions:
  api_key:
    type: "apiKey"
    name: "key"
    in: "query"
gcloud api-gateway api-configs describe api-config --api api-api
createTime: '2021-06-12T15:02:27.382098034Z'
displayName: api-config
gatewayServiceAccount: projects/-/serviceAccounts/apigatewayuser@projectid.iam.gserviceaccount.com
name: projects/722514052893/locations/global/apis/api-api/configs/api-config
serviceConfigId: api-config-3hytlxf4gfvzj
state: ACTIVE
updateTime: '2021-06-12T15:05:09.778404414Z'
gcloud api-gateway gateways describe api-gateway --location europe-west1
apiConfig: projects/722514052893/locations/global/apis/api-api/configs/api-config
createTime: '2021-06-12T15:06:03.383002459Z'
defaultHostname: api-gateway-97x27n6l.ew.gateway.dev
displayName: api-gateway
name: projects/projectid/locations/europe-west1/gateways/api-gateway
state: ACTIVE
updateTime: '2021-06-12T15:07:37.590520122Z'
gcloud run services describe api --region europe-west1
✔ Service api in region europe-west1
URL: https://api-o3rf5h4boa-ew.a.run.app
Ingress: internal
Traffic:
100% LATEST (currently api-00010-lig)
Last updated on 2021-06-12T17:42:49.913232Z by myemail@gmail.com:
Revision api-00010-lig
Image: gcr.io/projectid/api
Port: 8080
Memory: 512Mi
CPU: 1000m
Concurrency: 80
Max Instances: 100
Timeout: 300s
Tried debugging directly on Cloud Run:
gcloud iam service-accounts keys create $KEY_FILE --iam-account=$SERVICE_ACCOUNT_EMAIL
gcloud auth activate-service-account $SERVICE_ACCOUNT_EMAIL --key-file $KEY_FILE
BEARER=$(gcloud auth print-identity-token $SERVICE_ACCOUNT_EMAIL)
curl --header "Content-Type: application/json" \
--header "Authorization: bearer $BEARER" \
--request POST \
--data '{"parameter1":"somedata"}' \
$SERVICE_URL/api
The result is still a Forbidden:
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 403 (Forbidden)!!1</title>
<style>
*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
</style>
<a href=//www.google.com/><span id=logo aria-label=Google></span></a>
<p><b>403.</b> <ins>That’s an error.</ins>
<p>Access is forbidden. <ins>That’s all we know.</ins>
So the problem lies in the Cloud Run application not being accessible by the service account. I'm not sure why this does not work, since the run.invoker role was added to the Run service.
Ingress internal means "Accept only the requests coming from the project's VPC or VPC SC perimeter".
When you use API Gateway, you aren't in your VPC: it's serverless and runs in a Google-managed VPC. Therefore, your requests are forbidden.
And because API Gateway can't be plugged into a VPC Connector (for now) and thus can't route the request to your VPC, you can't use this ingress=internal mode.
Thus, the solution is to set ingress to all, which is not a concern if you authorize only the legitimate accounts to access the service.
For that, check whether the Cloud Run service in your project has allUsers granted the roles/run.invoker role.
If yes, remove it.
Then, create a service account and grant it the roles/run.invoker on the Cloud Run service.
Follow this documentation
Step 4: update the x-google-backend in your OpenAPI spec file to add the correct authentication audience when you call your Cloud Run (it's the base service URL)
Step 5: create a gateway with a backend service account; set the service account that you created previously
In the end, only authenticated and authorized accounts will be able to reach your Cloud Run service.
All unauthorized requests are filtered by Google Front End and discarded before reaching your service. Therefore, your service isn't invoked for nothing, and you pay nothing for those requests!
Only API Gateway (and any other accounts that you allow on the Cloud Run service) can invoke the Cloud Run service.
So, OK, your URL is public, reachable from the wild internet, but protected with Google Front End and IAM.
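Step 4 above (setting the authentication audience on the backend) can be scripted if the OpenAPI spec is already loaded as a dictionary. The Python sketch below is an assumption-laden illustration: it assumes the audience field of x-google-backend is named jwt_audience, as I understand the extension, so check that against the linked documentation. The function name with_backend_audience is hypothetical.

```python
import copy


def with_backend_audience(spec: dict, service_url: str) -> dict:
    """Return a copy of the spec whose top-level backend targets service_url
    and uses the same base URL as the authentication audience."""
    updated = copy.deepcopy(spec)  # leave the caller's spec untouched
    base = service_url.rstrip("/")
    backend = updated.setdefault("x-google-backend", {})
    backend["address"] = base
    # Assumed field name; the audience is the base URL of the Cloud Run service.
    backend["jwt_audience"] = base
    return updated


spec = {"swagger": "2.0", "x-google-backend": {"address": "<CLOUD_RUN_SERVICE_URL>"}}
patched = with_backend_audience(spec, "https://api-o3rf5h4boa-ew.a.run.app/")
print(patched["x-google-backend"])
```

The URL used here is the one from the gcloud run services describe output earlier in the thread; the resulting dictionary would still need to be written back to the spec file before creating a new API config.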
I am following Google's tutorial for setting up an Endpoint for my cloud function.
When I try to access the endpoint from my browser using URL service_name.a.run.app/function1 I get
Error: Forbidden
Your client does not have permission to get URL /function1GET from this server
As part of the mentioned tutorial and answer from a Google product manager , I'm securing my function by granting ESP permission to invoke my function.
gcloud beta functions add-iam-policy-binding function1 --member "serviceAccount:id-compute@developer.gserviceaccount.com" --role "roles/cloudfunctions.invoker" --project "project_id"
My openapi-functions.yaml
swagger: '2.0'
info:
  title: Cloud Endpoints + GCF
  description: Sample API on Cloud Endpoints with a Google Cloud Functions backend
  version: 1.0.0
host: HOST
x-google-endpoints:
- name: "HOST"
  allowCors: "true"
schemes:
  - https
produces:
  - application/json
paths:
  /function1:
    get:
      operationId: function1
      x-google-backend:
        address: https://REGION-FUNCTIONS_PROJECT_ID.cloudfunctions.net/function1GET
      responses:
        '200':
          description: A successful response
          schema:
            type: string
Note that I added
- name: "HOST"
  allowCors: "true"
to my .yaml file because I need to access the endpoint from a static site hosted on Firebase.
I have followed the tutorial you have mentioned, and indeed I came across the exact same error.
Nothing regarding permissions and roles seemed wrong.
After digging a bit, what solved the issue was removing the “GET” at the end of the backend address.
So the openapi-functions.yaml would be like this:
swagger: '2.0'
info:
  title: Cloud Endpoints + GCF
  description: Sample API on Cloud Endpoints with a Google Cloud Functions backend
  version: 1.0.0
host: [HOST]
schemes:
  - https
produces:
  - application/json
paths:
  /function-1:
    get:
      summary: Greet a user
      operationId: function-1
      x-google-backend:
        address: https://[REGION]-[PROJECT_ID].cloudfunctions.net/function-1
      responses:
        '200':
          description: A successful response
          schema:
            type: string
Then make sure you are following all the steps mentioned in the tutorial correctly (except the above part).
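A mismatch like the trailing "GET" is easy to miss by eye. As a rough sanity check, the last path segment of the backend address can be compared with the deployed function name. This Python sketch uses a hypothetical helper name, and the second URL uses made-up region and project values.

```python
from urllib.parse import urlparse


def backend_matches_function(address: str, function_name: str) -> bool:
    """True if the last path segment of the backend address is the function name."""
    path = urlparse(address).path
    return path.rstrip("/").split("/")[-1] == function_name


# Address from the question: the extra "GET" makes it point at a different name.
print(backend_matches_function(
    "https://REGION-FUNCTIONS_PROJECT_ID.cloudfunctions.net/function1GET",
    "function1",
))
# Hypothetical corrected address.
print(backend_matches_function(
    "https://us-central1-my-project.cloudfunctions.net/function-1",
    "function-1",
))
```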
In case you get a Permissions Denied error when running any of the steps, try running it again as sudo.
I have also tried adding the same as you:
host: [HOST]
x-google-endpoints:
- name: [HOST]
  allowCors: "true"
And all is working well.
Pay extra attention to the CONFIG_ID that changes with each new deployment
Example:
2019-12-03r0
then it goes like:
2019-12-03r1
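If you script the deployment, picking the newest CONFIG_ID by plain string sorting goes wrong once the revision reaches two digits. The Python sketch below assumes the ID format is a date followed by r and a revision number, which is inferred only from the two examples above; latest_config_id is a made-up helper.

```python
import re

# Format assumed from the examples above: YYYY-MM-DD, then "r", then a number.
CONFIG_ID = re.compile(r"^(\d{4}-\d{2}-\d{2})r(\d+)$")


def latest_config_id(config_ids):
    """Return the newest ID, comparing the revision numerically."""
    def sort_key(config_id):
        match = CONFIG_ID.match(config_id)
        if match is None:
            raise ValueError("unexpected CONFIG_ID format: " + config_id)
        return (match.group(1), int(match.group(2)))
    return max(config_ids, key=sort_key)


print(latest_config_id(["2019-12-03r0", "2019-12-03r1"]))
```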
In case the deployment step fails (it shows some successful messages but it might fail in the end), then make sure you delete the existing endpoint service to avoid issues:
gcloud endpoints services delete [SERVICE_ID]
Also you can use the following to give cloudfunctions.invoker role to all users (Just for testing)
gcloud functions add-iam-policy-binding function-1 \
--member="allUsers" \
--role="roles/cloudfunctions.invoker"
I'm setting up AWS EKS cluster using terraform from an EC2 instance. Basically the setup includes EC2 launch configuration and autoscaling for worker nodes. After creating the cluster, I am able to configure kubectl with aws-iam-authenticator. When I did
kubectl get nodes
It returned
No resources found
as the worker nodes were not joined. So I tried updating aws-auth-cm.yaml file
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <ARN of instance role (not instance profile)>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
with IAM role ARN of the worker node. And did
kubectl apply -f aws-auth-cm.yaml
It returned
ConfigMap/aws-auth created
Then I realized that the role ARN configured in aws-auth-cm.yaml was the wrong one. So I updated the same file with the exact worker node role ARN.
But this time I got 403 when I did kubectl apply -f aws-auth-cm.yaml again.
It returned
Error from server (Forbidden): error when retrieving current
configuration of: Resource: "/v1, Resource=configmaps",
GroupVersionKind: "/v1, Kind=ConfigMap" Name: "aws-auth", Namespace:
"kube-system" Object: &{map["apiVersion":"v1" "data":map["mapRoles":"-
rolearn: arn:aws:iam::XXXXXXXXX:role/worker-node-role\n username:
system:node:{{EC2PrivateDNSName}}\n groups:\n -
system:bootstrappers\n - system:nodes\n"] "kind":"ConfigMap"
"metadata":map["name":"aws-auth" "namespace":"kube-system"
"annotations":map["kubectl.kubernetes.io/last-applied-configuration":""]]]}
from server for: "/home/username/aws-auth-cm.yaml": configmaps
"aws-auth" is forbidden: User
"system:node:ip-XXX-XX-XX-XX.ec2.internal" cannot get resource
"configmaps" in API group "" in the namespace "kube-system"
I'm not able to reconfigure the ConfigMap after this step.
I'm getting 403 for commands like
kubectl apply
kubectl delete
kubectl edit
for configmaps. Any help?
I found the reason why kubectl returned 403 for this scenario.
As per this doc, the user/role who created the cluster is given system:masters permissions in the cluster's RBAC configuration.
When I created the aws-auth ConfigMap to join the worker nodes, I gave the ARN of the role/user who created the cluster instead of the ARN of the worker node role.
This replaced the admin's group (system:masters) with the groups system:bootstrappers and system:nodes in RBAC, which locked the admin out. It is not recoverable, since the admin has lost the privileges that came with system:masters.
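A simple pre-flight check before kubectl apply could have caught this. The Python sketch below assumes the mapRoles entries have already been parsed into a list of dictionaries and that you know the ARN of the identity that created the cluster; the function name and the example ARNs are hypothetical.

```python
def risky_role_mappings(map_roles, creator_arn):
    """Return entries that would map the cluster creator without system:masters."""
    return [
        entry
        for entry in map_roles
        if entry.get("rolearn") == creator_arn
        and "system:masters" not in entry.get("groups", [])
    ]


# Made-up ARNs for illustration only.
creator = "arn:aws:iam::111122223333:role/cluster-creator"
map_roles = [
    {
        "rolearn": creator,  # the mistake described above
        "username": "system:node:{{EC2PrivateDNSName}}",
        "groups": ["system:bootstrappers", "system:nodes"],
    },
    {
        "rolearn": "arn:aws:iam::111122223333:role/worker-node-role",
        "username": "system:node:{{EC2PrivateDNSName}}",
        "groups": ["system:bootstrappers", "system:nodes"],
    },
]

for entry in risky_role_mappings(map_roles, creator):
    print("refusing to apply, creator would lose admin:", entry["rolearn"])
```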
I was testing some commands and I ran
$ kubectl delete nodes --all
and it de-registers all the nodes, including the masters. Now I can't connect to the cluster (well, obviously, as the master is deleted).
Is there a way to prevent this as anyone could accidentally do this?
Extra Info: I am using KOps for deployment.
P.S. It does not delete the EC2 instances, and the nodes come back up after rebooting all the EC2 instances.
By default, you are using something like a superuser who can do anything with the cluster.
To limit other users' access to the cluster, you can use RBAC authorization. With RBAC rules you can manage access and limits per resource and action.
In a few words, to do that you need to:
Create a new cluster with Kops using --authorization RBAC, or modify an existing one by adding the 'rbac' option to the 'authorization' section of the cluster's configuration:
authorization:
  rbac: {}
Now we can follow the instructions from Bitnami to create a user. For example, let's create a user who has access only to the office namespace and only for a few actions. So, we need to create the namespace first:
kubectl create namespace office
Create a key and certificates for new user:
openssl genrsa -out employee.key 2048
openssl req -new -key employee.key -out employee.csr -subj "/CN=employee/O=bitnami"
Now, using your CA key (it is available in the S3 bucket under PKI), we need to sign the new certificate:
openssl x509 -req -in employee.csr -CA CA_LOCATION/ca.crt -CAkey CA_LOCATION/ca.key -CAcreateserial -out employee.crt -days 500
Creating credentials:
kubectl config set-credentials employee --client-certificate=/home/employee/.certs/employee.crt --client-key=/home/employee/.certs/employee.key
Setting a right context:
kubectl config set-context employee-context --cluster=YOUR_CLUSTER_NAME --namespace=office --user=employee
Now we have a user without access to anything. Let's create a new role with limited access. Here is an example of a Role which can create, delete and modify only deployments, replicasets and pods, and nothing more. Create a file role-deployment-manager.yaml with the Role configuration:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: office
  name: deployment-manager
rules:
- apiGroups: ["", "extensions", "apps"]
  resources: ["deployments", "replicasets", "pods"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Create a new file rolebinding-deployment-manager.yaml with a RoleBinding, which will attach your Role to the user:
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: deployment-manager-binding
  namespace: office
subjects:
- kind: User
  name: employee
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: deployment-manager
  apiGroup: rbac.authorization.k8s.io
Now apply these configurations:
kubectl create -f role-deployment-manager.yaml
kubectl create -f rolebinding-deployment-manager.yaml
So, now you have a user with limited access who cannot destroy your cluster.
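To see why this user cannot run kubectl delete nodes --all, it helps to evaluate the Role by hand. The Python sketch below is a simplified model of rule matching, not the real authorizer: it ignores wildcards and resourceNames, and role_allows is an invented name. It encodes one fact that matters here: a namespaced Role grants nothing outside its own namespace, and nodes are cluster-scoped.

```python
# The rules from role-deployment-manager.yaml above.
ROLE_RULES = [
    {
        "apiGroups": ["", "extensions", "apps"],
        "resources": ["deployments", "replicasets", "pods"],
        "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
    }
]


def role_allows(rules, role_namespace, verb, resource, namespace, api_group=""):
    """Simplified check of a namespaced Role; namespace=None means cluster scope."""
    if namespace != role_namespace:
        return False  # a Role never applies outside its own namespace
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


print(role_allows(ROLE_RULES, "office", "delete", "pods", "office"))
print(role_allows(ROLE_RULES, "office", "delete", "nodes", None))
```

The first check passes and the second does not, which is the protection the limited user gives you.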
Anton Kostenko describes a good way of preventing what you've described. Below I give details of how you can ensure the apiserver remains accessible even if someone does accidentally delete all the node objects:
Losing connectivity to the apiserver by deleting node objects will only happen if the components necessary for connecting to the apiserver (e.g. the apiserver itself and etcd) are managed by a component (i.e. the kubelet) that depends on the apiserver being up. (GKE, for example, can scale down to 0 worker nodes, leaving no node objects, but the apiserver will still be accessible.)
As a specific example, my personal cluster has a single master node with all the control plane components described as static Pod manifests and placed in the directory referred to by the --pod-manifest-path flag on the kubelet on that master node. Deleting all the node objects as you did in the question caused all my workloads to go into a pending state but the apiserver was still accessible in this case because the control plane components are run regardless of whether the kubelet can access the apiserver.
Common ways to prevent what you've just described are to run the apiserver and etcd as static manifests managed by the kubelet, as I just described, or to run them independently of any kubelet, perhaps as systemd units.