AWS Application Load Balancer: 503 gateway timeout

I would like to run my AWS Fargate tasks behind my Application Load Balancer, but I get a 503 error when I open the DNS link. When I go to my target groups, it shows they are all draining.
When I go to ECS and take a look at the service's events, I see this:
86d61354-e714-412f-b62a-d15f59838ad8  2021-06-28 12:22:26 +0200  service AlbumService registered 1 targets in target-group newTargetGroupALBUMCONTAINER
66a03fab-a773-4a99-a6de-7c0d685dd739  2021-06-28 12:22:04 +0200  service AlbumService has started 1 tasks: task 8e730498b49d470681c725e6b9c08b5b.
7954f36a-25b2-4b31-979f-5c09b6136a52  2021-06-28 12:21:53 +0200  service AlbumService has started 1 tasks: task f86896d0855e4ed7912f352ccd554b77.
81646b25-828e-4677-a230-f7010e8f46a0  2021-06-28 12:21:52 +0200  service AlbumService deregistered 1 targets in target-group newTargetGroupALBUMCONTAINER
68229018-7137-49ca-bad7-83e6d97d7839  2021-06-28 12:21:43 +0200  service AlbumService has begun draining connections on 1 tasks.
d4b225c4-1880-48d9-b088-259d8447f7e2  2021-06-28 12:21:43 +0200  service AlbumService deregistered 1 targets in target-group newTargetGroupALBUMCONTAINER
93de0f96-4bfb-466b-86e6-8a5593f23513  2021-06-28 12:21:30 +0200  service AlbumService registered 1 targets in target-group newTargetGroupALBUMCONTAINER
95171356-ad5b-403c-89e3-189ed3532710  2021-06-28 12:20:58 +0200  service AlbumService has started 2 tasks: task f4c317be4de04ac28318998772ff8860 task
ECS service setup
This is my ALB configuration:

Basic configuration
  Name: alb-album
  ARN: arn:aws:elasticloadbalancing:us-east-1:107600463818:loadbalancer/app/alb-album/60a55e7dcc8e5b46
  DNS name: alb-album-636293737.us-east-1.elb.amazonaws.com (A record)
  State: Active
  Type: application
  Scheme: internet-facing
  IP address type: ipv4
  VPC: vpc-6a820b17
  Availability Zones:
    subnet-1580b71b - us-east-1f (IPv4 address assigned by AWS)
    subnet-22c6cb6f - us-east-1b (IPv4 address assigned by AWS)
    subnet-25e29a04 - us-east-1a (IPv4 address assigned by AWS)
    subnet-5c9de303 - us-east-1c (IPv4 address assigned by AWS)
    subnet-68a4d90e - us-east-1d (IPv4 address assigned by AWS)
    subnet-941a9ba5 - us-east-1e (IPv4 address assigned by AWS)
  Hosted zone: Z35SXDOTRQ7X7K
  Creation time: June 27, 2021 at 5:08:43 PM UTC+2

Security
  Security groups: sg-060e60bd68692ddba (sgalb-album, description: security)

Attributes
  Deletion protection: Disabled
  Idle timeout: 60 seconds
  HTTP/2: Enabled
  Desync mitigation mode: Defensive
  Drop invalid header fields: Disabled
  Access logs: Disabled
  WAF fail open: Disabled
Inbound rules of my ALB security group:
Outbound rules of my ALB security group:
Target group:
For some reason I see more and more targets being added, and I don't know why; I never registered any. (Maybe it is normal for them to be registered automatically, but I have no idea.)

Based on your configuration, the issue must be with the security group used by the service AlbumService. I highly suspect the load balancer does not have access to AlbumService, which is causing the target group health checks to fail.
So you will have to allow inbound traffic from the load balancer to your service AlbumService. Usually this is done by updating the security group of your AlbumService tasks to allow traffic from the security group of the load balancer.
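As a minimal AWS CLI sketch of that change (not from the original answer; the service security group ID sg-0123456789abcdef0 and container port 80 are placeholders, so substitute your own values):
# Allow the ALB security group (sg-060e60bd68692ddba, from the configuration above)
# to reach the AlbumService security group on the container/health-check port.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 80 \
    --source-group sg-060e60bd68692ddba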

Related

Application Load Balancer Target Group Register/Deregister Infinite Loop

Setup
Security Groups
ALB (inbound rules)
HTTPS:443 from 0.0.0.0/0 & ::/0
HTTP:80 from 0.0.0.0/0 & ::/0
Cluster (inbound rules)
All traffic from ALB security group
Cluster
instance is t2.micro (only running 1 instance in subnets us-east-1<a,b,c> under default VPC with public IP enabled)
client → 0.375 vCPU/0.25 GB, 1 task, bridge network, 0:3000 (host:container)
server → 0.25 vCPU/0.25 GB, 2 tasks, bridge network, 0:5000 (host:container)
ALB
availability zones: us-east-1<a,b,c>, same default VPC
listeners:
HTTP:80 → redirect to HTTPS://#{host}:443/#{path}?#{query}
HTTPS:443 (/) → forward to client target group
HTTPS:443 (/api) → forward to server target group
Target Groups
client → HTTP:3000 with default health check of HTTP, /, Traffic Port, 5 healthy, 2 unhealthy, 5s timeout, 30s interval, 200 OK
server → HTTP:5000 with health check of HTTP, /api/health, Traffic Port, 5 healthy, 2 unhealthy, 5s timeout, 30s interval, 200 OK
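(For reference, the server health check above corresponds roughly to the following AWS CLI call; this sketch is added for clarity and the target group ARN is a placeholder.)
# Apply the server target-group health check settings described above.
aws elbv2 modify-target-group \
    --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/server/0123456789abcdef \
    --health-check-protocol HTTP \
    --health-check-port traffic-port \
    --health-check-path /api/health \
    --healthy-threshold-count 5 \
    --unhealthy-threshold-count 2 \
    --health-check-timeout-seconds 5 \
    --health-check-interval-seconds 30 \
    --matcher HttpCode=200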
Both Docker images for the client and server work properly locally, and the client service seems to work well in AWS ECS. However, the server service keeps cycling between registering and deregistering (draining) its targets, seemingly without them ever even becoming unhealthy.
Here is what I see in the service Deployments and events tab:
5/12/2022, 8:43:04 PM service server registered 2 targets in target-group <...>
5/12/2022, 8:42:54 PM service server has started 2 tasks: task <...> task <...>. <...>
5/12/2022, 8:42:51 PM service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:51 PM service server has begun draining connections on 1 tasks. <...>
5/12/2022, 8:42:51 PM service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:17 PM service server registered 2 targets in target-group <...>
5/12/2022, 8:42:07 PM service server has started 2 tasks: task <...> task <...>. <...>
5/12/2022, 8:42:04 PM service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:04 PM service server has begun draining connections on 1 tasks. <...>
5/12/2022, 8:42:04 PM service server deregistered 1 targets in target-group <...>
Any ideas?
After enabling AWS CloudWatch Logs in my task definition's container settings, I was able to see that the issue was actually with an AWS RDS instance.
The RDS instance's security group was only accepting traffic from an old cluster security group (which no longer exists), so the server tasks could not reach the database; that clears up why the health check was never passing and the registered targets were draining immediately.
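A rough sketch of that debugging step (not part of the original answer; the log group name /ecs/server is hypothetical and must match the awslogs logConfiguration you add to the container definition):
# Create the log group referenced by the task definition's awslogs configuration,
# then stream the container output while the targets cycle ("aws logs tail" needs AWS CLI v2).
aws logs create-log-group --log-group-name /ecs/server
aws logs tail /ecs/server --follow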

AWS ALB target group is healthy but still not accessible

I'm running the SonarQube Docker image on AWS ECS (EC2 launch type). The container is up and running, listening on port 9000, with the logs below:
q-process5925788013780644631properties
2021.03.17 15:50:55 INFO app[][o.s.a.SchedulerImpl] Process[web] is up
2021.03.17 15:50:55 INFO app[][o.s.a.ProcessLauncherImpl] Launch process[[key='ce', ipcIndex=3, logFilenamePrefix=ce]] from [/opt/sonarqube]: /opt/java/openjdk/bin/java -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/opt/sonarqube/temp -XX:-OmitStackTraceInFastThrow --add-opens=java.base/java.util=ALL-UNNAMED -Xmx512m -Xms128m -XX:+HeapDumpOnOutOfMemoryError -Dhttp.nonProxyHosts=localhost|127.*|[::1] -cp ./lib/common/*:/opt/sonarqube/lib/jdbc/postgresql/postgresql-42.2.17.jar org.sonar.ce.app.CeServer /opt/sonarqube/temp/sq-process3880305865950565845properties
2021.03.17 15:51:01 INFO app[][o.s.a.SchedulerImpl] Process[ce] is up
2021.03.17 15:51:01 INFO app[][o.s.a.SchedulerImpl] SonarQube is up
I'm using the awsvpc network mode. I'm using an Application Load Balancer and, as per the screenshot below, the target groups are healthy, but I still cannot access SonarQube using the load balancer URL:
Error:
Please advise, thanks.
ALB security group screenshot:
Your ALB inbound rule only allows access from the listed security group, which blocks your attempt to reach the load balancer URL.
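A minimal sketch of one possible fix, assuming the listener is on port 80 and sg-0123456789abcdef0 stands in for the ALB's security group (narrow the CIDR if the load balancer should not be publicly reachable):
# Allow inbound HTTP from anywhere (or from a narrower CIDR) on the ALB security group.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 80 \
    --cidr 0.0.0.0/0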

AWS Global Accelerator in front of ALB managed with EKS alb ingress health checks fail

I've got an EKS cluster with the ALB ingress controller and ExternalDNS connected to Route 53. Now some clients want static IPs or an IP range for connecting to our servers, so they can whitelist those IPs in their firewall.
I tried the new AWS Global Accelerator and followed this tutorial https://docs.aws.amazon.com/global-accelerator/latest/dg/getting-started.html, but it fails with:
Listeners in this accelerator have an unhealthy status. To make sure that Global Accelerator can run health checks successfully, ensure that a service is responding on the protocol and port that you specified in the health check configuration.
With further reading I understood that the health checks are the same ones configured on the ALB, and that they might be failing because the Route 53 health-check IPs are not whitelisted; but all inbound traffic is open on ports 80 and 443, so I'm not quite sure how to debug this further, or whether there is another solution for getting an IP range or a static IP for the ALB.
You need to add a health check rule like this one to the Ingress:
- http:
    paths:
      - path: /global-accelerator-healthcheck
        backend:
          serviceName: global-accelerator-healthcheck
          servicePort: use-annotation
Then an annotation:
alb.ingress.kubernetes.io/actions.global-accelerator-healthcheck: '{"Type": "fixed-response", "FixedResponseConfig": {"ContentType": "text/plain", "StatusCode": "200", "MessageBody": "healthy" }}'
Then configure Global Accelerator's health checks to use that endpoint.
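For example, a rough AWS CLI sketch (not from the original answer; the endpoint group ARN is a placeholder):
# Point the accelerator's health checks at the fixed-response path served by the ALB.
# The Global Accelerator API is only available in us-west-2.
aws globalaccelerator update-endpoint-group \
    --region us-west-2 \
    --endpoint-group-arn arn:aws:globalaccelerator::123456789012:accelerator/xxxx/listener/yyyy/endpoint-group/zzzz \
    --health-check-protocol HTTP \
    --health-check-port 80 \
    --health-check-path /global-accelerator-healthcheck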
When it comes to the AWS ALB Ingress controller, always try to think of it as working with an AWS ALB and its target groups.
You can even identify the ALB and its target groups by logging in to the AWS console UI.
To answer your question, try adding the following annotations to your Ingress:
annotations:
  alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
  alb.ingress.kubernetes.io/healthcheck-port: "8161"
  alb.ingress.kubernetes.io/healthcheck-path: /admin
  alb.ingress.kubernetes.io/success-codes: '401'
  alb.ingress.kubernetes.io/backend-protocol: HTTP
Note: If you have different health check settings for different services, remove this block from the K8s Ingress and add a block per K8s Service (see the sketch below).
If more information is required, please refer to: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/guide/ingress/annotations/
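As a sketch of that per-Service variant (the Service name my-backend is hypothetical):
# Attach the health check annotations to the Service instead of the Ingress,
# so each backend can carry its own settings.
kubectl annotate service my-backend \
    alb.ingress.kubernetes.io/healthcheck-protocol=HTTP \
    alb.ingress.kubernetes.io/healthcheck-path=/admin \
    alb.ingress.kubernetes.io/success-codes=401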

GCP destination group instances not being checked

I'm trying to create a TCP/UDP load balancer on GCP to allow HA for my service, but I've noticed that when I create a destination group, all instances in that group are marked as unhealthy and are not being checked by Google (I've looked at the machine logs to confirm this). The firewall is open because this is for testing purposes, so I'm sure that is not the problem.
I've created an HTTP(S) load balancer using a backend with a similar check configuration, and the same machine is marked as healthy, so it is not a problem with that machine (and the logs show that Google really is checking that instance).
Both checks are HTTP on port 80, so I can't see where the problem is or what the difference is between the two kinds of load balancer health checkers.
I've also tried disabling the health check, but the instance is still marked as unhealthy and traffic is not being sent to any of the instances, so the load balancer is not useful at all.
Is any other configuration necessary to make it check the instances?
Thanks and greetings!!
Creating a TCP load balancer
When you're using any of the Google Cloud load balancers, you need not expose your VMs' external ports to the internet; only your load balancer needs to be able to reach them.
The steps to create a TCP load balancer are described here. I find it convenient to use gcloud and run the commands, but you can also use the Cloud Console UI to achieve the same result.
I tried the below steps and it works for me (you can easily modify this to make it work with UDP as well - remember you still need HTTP health checks even when using UDP load balancing):
# Create 2 new instances
gcloud compute instances create vm1 --zone us-central1-f
gcloud compute instances create vm2 --zone us-central1-f
# Make sure you have some service running on port 80 on these VMs after creation.
# Create an address resource to act as the frontend VIP.
gcloud compute addresses create net-lb-ip-1 --region us-central1
# Create a HTTP health check (by default uses port 80).
gcloud compute http-health-checks create hc-1
# Create a target pool associated with the health check you just created.
gcloud compute target-pools create tp-1 --region us-central1 --http-health-check hc-1
# Add the instances to the target pool
gcloud compute target-pools add-instances tp-1 --instances vm1,vm2 --instances-zone us-central1-f
# Create a forwarding rule associated with the frontend VIP address we created earlier
# which will forward the traffic to the target pool.
gcloud compute forwarding-rules create fr-1 --region us-central1 --ports 80 --address net-lb-ip-1 --target-pool tp-1
# Describe the forwarding rule
gcloud compute forwarding-rules describe fr-1 --region us-central1
IPAddress: 1.2.3.4
IPProtocol: TCP
creationTimestamp: '2017-07-19T10:11:12.345-07:00'
description: ''
id: '1234567890'
kind: compute#forwardingRule
loadBalancingScheme: EXTERNAL
name: fr-1
portRange: 80-80
region: https://www.googleapis.com/compute/v1/projects/PROJECT_NAME/regions/us-central1
selfLink: https://www.googleapis.com/compute/v1/projects/PROJECT_NAME/regions/us-central1/forwardingRules/fr-1
target: https://www.googleapis.com/compute/v1/projects/PROJECT_NAME/regions/us-central1/targetPools/tp-1
# Check the health status of the target pool and verify that the
# target pool considers the backend instances to be healthy
gcloud compute target-pools get-health tp-1
---
healthStatus:
- healthState: HEALTHY
  instance: https://www.googleapis.com/compute/v1/projects/PROJECT_NAME/zones/us-central1-f/instances/vm1
  ipAddress: 1.2.3.4
  kind: compute#targetPoolInstanceHealth
---
healthStatus:
- healthState: HEALTHY
  instance: https://www.googleapis.com/compute/v1/projects/PROJECT_NAME/zones/us-central1-f/instances/vm2
  ipAddress: 1.2.3.4
  kind: compute#targetPoolInstanceHealth
HTTP Health Checks are required for non-proxy TCP/UDP load balancers
If you're using a UDP load balancer (which is considered Network Load Balancing in Google Cloud), you will need to spin up a basic HTTP server that can respond to HTTP health checks, in addition to your service listening on a UDP port for incoming traffic.
The same also applies to non-proxy-based TCP load balancers (which are also considered Network Load Balancing in Google Cloud).
This is documented here.
Health checking
Health checks ensure that Compute Engine forwards new connections only
to instances that are up and ready to receive them. Compute Engine
sends health check requests to each instance at the specified
frequency; once an instance exceeds its allowed number of health check
failures, it is no longer considered an eligible instance for
receiving new traffic. Existing connections will not be actively
terminated which allows instances to shut down gracefully and to close
TCP connections.
The health check continues to query unhealthy instances, and returns
an instance to the pool once the specified number of successful checks
is met.
Network load balancing relies on legacy HTTP Health checks for
determining instance health. Even if your service does not use HTTP,
you'll need to at least run a basic web server on each instance that
the health check system can query.
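As a minimal illustration of that requirement (an assumption for this write-up, not part of the quoted documentation): the backend VMs can be created, or recreated, with a startup script that serves plain HTTP on port 80 so the legacy health check has something to query.
# Create the backend VMs with a simple web server listening on port 80
# so the legacy HTTP health check used by Network Load Balancing can succeed.
gcloud compute instances create vm1 vm2 \
    --zone us-central1-f \
    --metadata startup-script='#! /bin/bash
apt-get update
apt-get install -y apache2
systemctl enable --now apache2'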

AWS Application Load balancer, target group health check fails.

I am setting up an Application Load Balancer.
The ALB has 1 listener:
HTTP:80 to the target group
The target group uses port 3000.
I also have an auto scaling group that points to the target group and is set up to create 2 instances.
The cluster is set up with a service that runs 4 tasks.
I set up the service to use the ALB and the HTTP:80 port. The created task has a dynamic host port and container port 3000.
I have checked my security groups: inbound is set up to accept ports 3000 and 80, and outbound allows all traffic.
All the instances in the target-group are unhealthy
I can ssh into the ec2 instances and docker ps -a returns two docker containers.
I logged out and ran curl -i ec2-user#ec2-22-236-243-39.compute-4.amazonaws.com:3000/health-check-target-page I get
Failed to connect to ec2-user#ec2-22-236-243-39.compute-4.amazonaws.com port 3000: Connection refused
I tried same command with port 80 and I get
curl: (56) Recv failure: Connection reset by peer
I'm still learning AWS, so I hope this info helps. Let me know what I'm missing here.
Thanks!