Error creating EKS cluster - send request failed caused by POST Forbidden

I am provisioning an "air-gapped" EKS cluster. There is no internet access in the VPC, so I have also created the following VPC endpoints: ecr.api, ecr.dkr, ec2, sts, and S3 (gateway).
I have checked that the NACL allows traffic from the AWS S3 CIDRs and all traffic inside the VPC. The security groups allow that traffic as well.
I am provisioning this with Terraform Cloud. The errors received are not very descriptive:
Error creating EKS Cluster: RequestError: send request failed caused by: POST "https://eks.eu-west-2.amazonaws.com/clusters": Forbidden
I'm not sure what is forbidden in this case. Is it access to the EKS control plane?
Update: I can create the EKS cluster in the AWS console without a problem, using the same security group, NACL, and roles.

You must provide a valid AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to your Terraform Cloud workspace. These credentials must be associated with the permissions needed to create the cluster.
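A minimal pre-flight sketch, assuming the Terraform Cloud run exposes credentials as environment variables. It only checks that the two variables named above are present and non-empty; it cannot tell whether the keys are valid or whether they carry the eks:CreateCluster permission. The helper name is hypothetical.

```python
from __future__ import annotations

import os
from typing import Mapping

# The two variables named in the answer above. A session token is only needed
# for temporary credentials, so it is deliberately not checked here.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required credential variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Credential variables are set (their validity is not checked).")
```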

Related

ECS can't pull ECR images

I have created an ECS service and started it with a task definition.
I want to be able to run these services in a private subnet, but I read that for that to happen I need to use either:
a NAT gateway
or
VPC endpoints for S3 and ECR (API and DKR). Endpoints for Secrets Manager and CloudWatch are optional.
I don't want to pay for a NAT gateway, so I started configuring the VPC endpoint option.
I created:
ECR DKR and API Interface endpoints targeting the subnets that I use
S3 gateway endpoint attached to the route table of the subnet I am using
Secrets Manager Interface endpoint (for a future application that uses secrets)
CloudWatch interface endpoint for logging
I also attached the interface endpoints to a security group that has:
Ingress: 80 and 443 - 0.0.0.0/0
Egress: all
DNS support, DNS hostnames, and DNS resolution are enabled on the VPC.
I also created the ECS task execution role and task role for the service, with ECR and CloudWatch permissions.
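The endpoints described above correspond to VPC endpoint service names of the form com.amazonaws.<region>.<service>. The sketch below, using a hypothetical helper name, only builds that list for a region so it can be compared against what is actually provisioned; it does not call AWS. It assumes awslogs logging is used (hence the logs endpoint) and treats Secrets Manager as optional, as in the question.

```python
from __future__ import annotations


def required_endpoint_services(region: str, use_logs: bool = True,
                               use_secrets: bool = False) -> dict[str, str]:
    """Map endpoint service name -> endpoint type for a Fargate task pulling from ECR."""
    services = {
        f"com.amazonaws.{region}.ecr.api": "Interface",
        f"com.amazonaws.{region}.ecr.dkr": "Interface",
        # ECR serves image layers from S3, so a gateway endpoint is needed as well.
        f"com.amazonaws.{region}.s3": "Gateway",
    }
    if use_logs:
        services[f"com.amazonaws.{region}.logs"] = "Interface"
    if use_secrets:
        services[f"com.amazonaws.{region}.secretsmanager"] = "Interface"
    return services


if __name__ == "__main__":
    for name, kind in required_endpoint_services("eu-central-1").items():
        print(f"{kind:9} {name}")
```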
Despite all of this configuration, I get this error when starting the service:
Resourceinitializationerror: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.eu-central-1.amazonaws.com/": dial tcp 52.119.188.80:443: i/o timeout
I don't know what else to try; I have been trying to resolve this for a few hours.
If anyone knows how to fix it, I would appreciate the help.
Thanks
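One detail in this error is worth checking: the SDK dialed 52.119.188.80, a public address. With private DNS enabled on an ecr.api interface endpoint, the regional hostname should resolve to the endpoint's private IPs from inside the VPC, so a public address suggests the lookup is not reaching the endpoint's private DNS. A minimal stdlib sketch (helper names are hypothetical) that pulls the dialed address out of such an error and checks whether it is private:

```python
import ipaddress
import re
from typing import Optional, Tuple

# Matches the IPv4 "dial tcp <ip>:<port>" fragment in the error messages above.
DIAL_RE = re.compile(r"dial tcp (?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)")


def dialed_address(error_message: str) -> Optional[Tuple[str, int]]:
    """Return (ip, port) from a 'dial tcp' error, or None if there is none."""
    match = DIAL_RE.search(error_message)
    if match is None:
        return None
    return match.group("ip"), int(match.group("port"))


def looks_like_private_endpoint(ip: str) -> bool:
    """True for private addresses, which is what an interface endpoint ENI would have."""
    return ipaddress.ip_address(ip).is_private
```

From a host inside the VPC, socket.gethostbyname("api.ecr.eu-central-1.amazonaws.com") returns the address that can be passed to looks_like_private_endpoint in the same way.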

Unable to connect to Redis External Load Balancer Service

I have created an EKS cluster with the Managed Node Groups.
Recently, I have deployed Redis as an external Load Balancer service.
I am trying to set up an authenticated connection to it from NodeJS and Python microservices, but I get a connection timeout error.
However, I am able to enter into the deployed redis container and execute the redis commands.
Also, I was able to do the same when I deployed Redis on GKE.
Have I missed some network configurations to allow traffic from external resources?
The subnets which the EKS node is using are all public.
Also, while creating the Amazon EKS node role, I attached 3 policies to it, as suggested in the documentation:
AmazonEKSWorkerNodePolicy
AmazonEC2ContainerRegistryReadOnly
AmazonEKS_CNI_Policy
The documentation also says:
We recommend assigning the policy to the role associated to the Kubernetes service account instead of assigning it to this role.
Will attaching this policy to the Kubernetes service account solve my problem?
Also, here is the guide I used to deploy Redis:
https://ot-container-kit.github.io/redis-operator/guide/setup.html#redis-standalone
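Since Redis commands work from inside the container, it can help to confirm whether the timeout happens at the TCP level before looking at authentication. A minimal sketch, assuming Redis listens on the default port 6379 and that the host is the load balancer hostname or IP shown by kubectl get svc; a False from outside the cluster points at network rules rather than Redis credentials. The helper name is hypothetical.

```python
import socket


def tcp_reachable(host: str, port: int = 6379, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (socket.timeout is an OSError subclass),
        # refused connections and DNS resolution failures.
        return False
```

For example, tcp_reachable("<load-balancer-hostname>") run from a machine outside the VPC separates "cannot connect at all" from "connected but authentication failed".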

AWS ECS: ResourceInitializationError: unable to pull secrets or registry auth

Background
Testing VPC:
2 private subnets with NACLs that allow all inbound traffic from IPs within the VPC and all outbound traffic. The subnets have a route table that routes to a NAT gateway in a public subnet.
2 public subnets that allow all inbound/outbound traffic. One of the subnets contains the NAT gateway and both subnets have a route table pointing to the Internet Gateway.
Problem
When running an ECS Fargate task (platform: 1.4) within one of the private subnets, the following error arises:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post https://api.ecr.us-west-2.amazonaws.com/: dial tcp <IP>:443: i/o timeout
The ECS task contains one container that uses a private ECR image hosted within the same AWS account. The security group associated with the task allows all inbound traffic from IPs within the VPC and allows all outbound traffic.
ECS task execution role contains the following policies:
"arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
"arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
Attempts:
When the private subnets' NACL inbound rules were changed to allow all types of traffic, the ECS task was, strangely, able to pull the ECR image.
Created the VPC endpoints mentioned in this article with the correct security groups but got the same error.
I'm tempted to try following this guide although it specifically says:
If your task definition references an image that's stored in Amazon ECR, this topic doesn't apply.
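The NACL observation above has a likely explanation: NACLs are stateless, so response packets are evaluated against the inbound rules like any other traffic. Responses from ECR return through the NAT gateway with the public service address as their source and one of the task's ephemeral ports as destination, so an inbound rule that only admits VPC addresses drops them. The sketch below models NACL evaluation (lowest rule number first, first match wins, implicit deny otherwise); it ignores protocol and assumes a VPC CIDR of 10.0.0.0/16 and the documentation address 203.0.113.10 as a stand-in for the ECR endpoint.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class NaclRule:
    number: int      # rules are evaluated in ascending order of this number
    cidr: str        # source CIDR for an inbound rule
    port_from: int   # destination port range (protocol is ignored in this model)
    port_to: int
    allow: bool


def inbound_allowed(rules: list[NaclRule], src_ip: str, dst_port: int) -> bool:
    """First matching rule decides; if nothing matches, the implicit deny applies."""
    addr = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        in_range = rule.port_from <= dst_port <= rule.port_to
        if addr in ipaddress.ip_network(rule.cidr) and in_range:
            return rule.allow
    return False


# Assumed VPC CIDR; mirrors "allow all inbound from IPs within the VPC".
vpc_only = [NaclRule(100, "10.0.0.0/16", 0, 65535, True)]
# Adding an ephemeral-port rule lets response traffic from outside the VPC back in.
with_ephemeral = vpc_only + [NaclRule(110, "0.0.0.0/0", 1024, 65535, True)]
```

With vpc_only, a reply from 203.0.113.10 to an ephemeral port such as 49152 is denied; with_ephemeral admits it without opening every port.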
I assume you are pulling an image from ECR?
If you are launching via ecs-cli, add this to the ecs-params.yml file
(read about that file here: https://github.com/aws/amazon-ecs-cli).
First, auto-assign public IP addresses:
run_params:
  network_configuration:
    awsvpc_configuration:
      subnets:
        - "subnet-06786b976xx"
        - "subnet-0b9dxxxxxxx"
      security_groups:
        - "sg-08157xxxxxxxxx4"
      assign_public_ip: "ENABLED"
Second, make sure that the security groups you use for your VPC allow traffic on ports 80 and 443.
Third, make sure in AWS IAM that the execution role (ecsTaskExecutionRole) has policies including:
AWSAppRunnerServicePolicyForECRAccess

Connect timeout on endpoint URL: "https://sts.us-west-2.amazonaws.com/" in AWS EKS with IRSA for RDS,S3 and security groups applied for RDS

I created a cluster where a pod should read/write data from/to RDS and S3. To make the connections secure, I added IRSA for S3 and RDS. As an additional layer of security, I created a security group for the pod so that it can talk to RDS. However, after doing this, the pod can write to RDS and S3 without any issues, but it can read only from RDS and not from S3. I exec'd into the pod to see what was happening. When I run aws s3 ls and aws sts get-caller-identity, I get Connect timeout on endpoint URL: "https://sts.us-west-2.amazonaws.com/" as output.
To implement security groups for pods, I followed https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html. I understand that when a security group is applied to a pod, source NAT is disabled, so I created a VPC endpoint for S3 (gateway endpoint). I also created an outbound rule in the pod's security group that allows access to the managed prefix list for S3, following the instructions in Managing Amazon S3 access with VPC endpoints and S3 Access Points. This didn't fix the commands shown above.
I also created an interface VPC endpoint for STS, but that didn't work either.
I have also referred to https://github.com/aws/amazon-vpc-cni-k8s/issues/1211. I am already following the instructions in that issue, since DNS resolution is active for my cluster.
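With IRSA, the AWS CLI first calls STS (AssumeRoleWithWebIdentity) to exchange the service account token for role credentials, so an STS timeout also makes aws s3 ls fail even when the S3 gateway endpoint works. Security groups are stateful and allow-only, so the pod's security group needs an egress rule covering port 443 to the STS interface endpoint's private IPs; the S3 prefix-list rule does not cover them. The sketch below checks that, assuming a VPC CIDR of 10.0.0.0/16, a hypothetical endpoint IP of 10.0.2.15, a placeholder prefix-list ID, and a hypothetical RDS rule on port 5432.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class EgressRule:
    port_from: int
    port_to: int
    cidr: Optional[str] = None         # CIDR destination, e.g. the VPC CIDR
    prefix_list: Optional[str] = None  # e.g. the S3 managed prefix list (ID is a placeholder)


def egress_allows(rules: list[EgressRule], dest_ip: str, port: int = 443) -> bool:
    """True if any CIDR rule covers dest_ip:port; one matching allow rule is enough."""
    addr = ipaddress.ip_address(dest_ip)
    for rule in rules:
        if rule.cidr is None:
            # The S3 prefix list contains S3 address ranges only,
            # never the private IP of an STS interface endpoint.
            continue
        if addr in ipaddress.ip_network(rule.cidr) and rule.port_from <= port <= rule.port_to:
            return True
    return False


pod_sg = [
    EgressRule(443, 443, prefix_list="pl-s3-placeholder"),  # S3 via gateway endpoint
    EgressRule(5432, 5432, cidr="10.0.0.0/16"),             # hypothetical RDS rule
]
pod_sg_fixed = pod_sg + [EgressRule(443, 443, cidr="10.0.0.0/16")]
```

The endpoint's own security group must also accept inbound 443 from the pod's security group; that side is not modelled here.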

AWS ECS Fargate Platform 1.4 error ResourceInitializationError: unable to pull secrets or registry auth: execution resource

I am using Docker containers with secrets on ECS without problems. After moving to Fargate and platform version 1.4 for EFS support, I started getting the following error.
Any help would be appreciated.
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 1 time(s): secret arn:aws:secretsmanager:eu-central-1:.....
Here's a checklist:
If your ECS tasks are in a public subnet (0.0.0.0/0 routes to an Internet Gateway), make sure your tasks can call the "public" endpoint for Secrets Manager. Basically, allow outbound TCP/443.
If your ECS tasks are in a private subnet, make sure that one of the following is true: (a) your tasks connect to the Internet through a NAT gateway (0.0.0.0/0 routes to a NAT gateway), or (b) you have an AWS PrivateLink endpoint for Secrets Manager connected to your VPC (and to your subnets).
If you have an AWS PrivateLink connection, make sure the associated security group allows inbound access from the security groups linked to your ECS tasks.
Make sure the ECS task execution role has the secretsmanager:GetSecretValue IAM permission on the ARN(s) of the Secrets Manager entry (or entries).
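Secrets referenced in the task definition are fetched by ECS with the task execution role when the task starts, which is why the error mentions "execution resource retrieval". A minimal sketch that builds an inline policy document granting GetSecretValue on specific secrets; the helper name and the ARN below are placeholders, not values from the question.

```python
from __future__ import annotations

import json


def execution_role_secret_policy(secret_arns: list[str]) -> dict:
    """Build an IAM policy document allowing GetSecretValue on the given secrets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": list(secret_arns),
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN: account ID and secret name are hypothetical.
    arn = "arn:aws:secretsmanager:eu-central-1:123456789012:secret:my-app-secret"
    print(json.dumps(execution_role_secret_policy([arn]), indent=2))
```

Scoping Resource to the exact secret ARNs, rather than "*", keeps the execution role limited to what the task definition references.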
Edit: Here's another excellent answer - https://stackoverflow.com/a/66802973
I had the same error message, but the checklist above misses the cause of my problem. If you are using VPC endpoints to access AWS services (e.g. Secrets Manager, ECR, SQS), then those endpoints MUST permit access from the security group used by the ECS tasks or instances in your VPC subnet.
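The rule described above can be expressed as a simple check: some ingress rule on the endpoint's security group must allow port 443 from a security group attached to the ECS task or instance. A minimal sketch with hypothetical helper names and placeholder security group IDs; it models only rules that reference a source security group, not CIDR-based rules.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class IngressRule:
    port_from: int
    port_to: int
    source_sg: Optional[str] = None  # security group ID referenced as the traffic source


def endpoint_admits(rules: list[IngressRule], task_sgs: set[str], port: int = 443) -> bool:
    """True if an ingress rule on the endpoint's security group allows `port`
    from one of the security groups attached to the ECS task or instance."""
    return any(
        rule.source_sg in task_sgs and rule.port_from <= port <= rule.port_to
        for rule in rules
    )


# Placeholder IDs: the endpoint admits HTTPS only from sg-0aaa.
endpoint_rules = [IngressRule(443, 443, source_sg="sg-0aaa")]
```

A task running with sg-0bbb would fail this check, which matches the symptom: the endpoint exists, but the task's calls to it never complete.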
Another thing to watch for: if you are using EFS to host volumes, ensure that your volumes can be mounted by the same security group identified above. In the EFS console, select the appropriate file system, open the Network tab, then choose Manage.