PHP application behind application load balancer failing health check - amazon-web-services

I am trying to deploy a PHP application through AWS CodeDeploy and am currently stuck on the AllowTraffic step in CodeDeploy. The application is on an EC2 instance behind an ALB. In the ALB, I am getting failing health checks. I have the PHP application code sitting in the following directory on the EC2 instance: /var/www/html/src. If I curl the private IP of the EC2 instance followed by the directory where the code sits, I get a 404 Not Found error. Even though the index.php file is in that directory, I am unable to curl it. Currently the security groups are set up so that the ALB security group allows inbound HTTP traffic only, and all traffic from the ALB security group is allowed to reach the EC2 instance. I am able to curl the root of the instance and see Apache's default page.
If I adjust the health check settings on the ALB target group, I get a 403 error when setting the health check path to /, and a 404 error when specifying the path to the directory that holds the PHP application code.
Any advice on how I can get the instance to a healthy state for the ALB would be appreciated.
[Screenshot: target group health check settings]
Application Load balancer security group allows traffic on port 80
EC2 instance security group allows traffic from Application Load Balancer security group.
The PHP application should be accessible on port 80, where Apache is running. The Application Load Balancer has only one listener, set up on port 80, which forwards traffic to the target group.

The health check path in your TG should be a URL path, not the actual location on the EC2 instance. You can try with just /index.php:
/index.php
This assumes that your application is actually working and the only issue is the health check.
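For illustration, here is a minimal Terraform sketch of such a target group; the resource name, variable, and threshold values are placeholders, not part of the original setup:

resource "aws_lb_target_group" "php_app" {
  name     = "php-app-tg"        # hypothetical name
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id          # assumed to be defined elsewhere

  health_check {
    path                = "/index.php"  # URL path, not a filesystem location
    protocol            = "HTTP"
    matcher             = "200"         # HTTP status code(s) considered healthy
    interval            = 30
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }
}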

Related

Expose an endpoint for an ECS Fargate container that is using port 8545, through AWS Route 53, ALB

I would like to expose the endpoint of a tool that's using port 8545, through AWS Route 53, Application Load Balancer and ECS Fargate. I've created a Dockerfile with the following:
FROM trufflesuit/ganache-cli:latest
EXPOSE 8546
CMD ["--fork", "https://Infura_node_URL"]
For the target group, I've been using Protocol HTTP, port 8546;
For Application Load Balancer, I've set HTTP:80 to be redirected to 443;
For ECS task definition, I've set the container port as 8545
When I run the script that connects to this container, an error occurs:
Error: Connection refused or URL couldn't be resolved: https://Infura_node_URL
If I browse the Route 53 URL I've configured, it keeps loading until it eventually times out.
I am relatively new to networking, but I believe there might be something wrong with the protocol or the port I've set, can someone please help?
*If I run this Docker container locally, http://localhost:8546 shows '400 Bad Request', which is the proper response
The problem here is that the Fargate service is not allowing traffic from the load balancer. Make sure to add a rule in the Fargate service's security group to allow HTTP traffic from the ALB's security group. The source in the security group rule will be the ALB's security group ID in this case.
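A minimal Terraform sketch of that rule (resource names are placeholders; adjust the port to whatever your target group actually forwards to):

resource "aws_security_group_rule" "alb_to_fargate" {
  type                     = "ingress"
  from_port                = 8546   # the port the target group routes to in this setup
  to_port                  = 8546
  protocol                 = "tcp"
  security_group_id        = aws_security_group.fargate_service.id  # the service's SG
  source_security_group_id = aws_security_group.alb.id              # the ALB's SG as the source
}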

AWS EC2 instance gives 502 Bad Gateway error

I have a few Elastic Beanstalk environments. It took a while for me to get the first one working, and then I replicated my settings to create the others. However, I am unable to get past a 502 error from the load balancer target health check, and it's the same if I try to load the app.
I checked the instance and load balancer security groups, and the settings match between the working and non-working environments (with the exception that the instance inbound source comes from the respective load balancer security group). The VPC settings are all the same. Among the non-working environments, I have apps in both availability zones 1a and 1b. The working environment is in 1a.
The Elastic Beanstalk app versions are the same. Elastic Beanstalk creates a load balancer per environment, so each environment has a different load balancer (I chose application load balancer).
I can SSH into the EC2 instances for both working and non-working environments.
Curling my working environment health check URL succeeds: curl -i -k https://xx.xx.xx.xxx/health
Curling my non-working environment health check URL fails: curl -i -k https://xx.xx.xx.xxx/health with a response that says "curl: (7) Failed to connect to xx.xx.xx.xxx port 443: Connection refused."
I double-checked the HTTPS settings for the EC2 and load balancer security groups, and they both have the appropriate inbound/outbound settings (as mentioned, the working and non-working environments match). The environments both have a single load balancer listener for port 443 with a self-signed certificate using security policy ELBSecurityPolicy-2016-08, and the target group uses port 443 for both the target port and the health check.
I looked at the Elastic Beanstalk logs and didn't see anything noteworthy.
From the nginx elastic beanstalk logs I see: xx.xx.xx.xx - - [19/Feb/2020:13:23:01 +0000] "GET / HTTP/1.1" 200 1410 "-" "ELB-HealthChecker/2.0"
In the eb-docker logs for the working app, I also see:
2/13/2020, 9:22:13 PM - info: req - /health
GET /health 200 14.449 ms - 56
(the "req - /health" entry is a log line the app prints when the path doesn't match any of the specified app routes, i.e. in the case of a 404).
I don't see this for the non-working environment.
I'm not sure what else to check or how to resolve this.

EC2 status unhealthy in Target Groups

I am using an AWS Application Load Balancer to connect to a target group that has an EC2 instance with Docker installed via cloud-init scripts. I am running an Nginx Docker container inside the EC2 instance.
The only information I get is a request timeout exception.
I connected to the target and checked whether the service is available; I received the nginx default page. Performing a curl -I on the internal IP also gives a response code of 200.
Please help me understand how I can troubleshoot this to find the root cause.
Thanks in advance
The configuration should be:
A security group on the Application Load Balancer (ALB-SG) permitting inbound traffic from, presumably, the whole Internet (0.0.0.0/0) on the appropriate ports (80, 443?)
A security group on the EC2 instance (App-SG) that permits inbound access from ALB-SG on the appropriate ports (80, 443?)
That is, App-SG should specifically reference ALB-SG. (Type in the name, it will resolve to a sg-xxx ID.)
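A rough Terraform equivalent of that configuration, with placeholder names and assuming plain HTTP on port 80, could look like this:

resource "aws_security_group" "alb_sg" {
  name   = "ALB-SG"
  vpc_id = var.vpc_id  # assumed to be defined elsewhere

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "app_sg" {
  name   = "App-SG"
  vpc_id = var.vpc_id

  ingress {
    description     = "HTTP from the ALB only"
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]  # reference ALB-SG by ID
  }
}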

AWS ECS error: Task failed ELB health checks in Target group

I am using a CloudFormation template to build the infrastructure (ECS Fargate cluster).
The template executed successfully and the stack was created successfully. However, the task failed with the following error:
Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:eu-central-1:890543041640:targetgroup/prc-service-devTargetGroup/97e3566c8b307abf)
I am not sure what or where to look at to troubleshoot this issue.
As it is a Fargate cluster, I am not sure how to log in to the container and run some health check queries to debug further.
Can someone please guide me further on this and help me out?
Due to this error, I am not even able to access my web app, as the ALB won't route traffic to an unhealthy target.
What I did
After some googling, I found this post:
https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-unhealthy-checks-ecs/
However, I guess this is related to the EC2 launch type rather than Fargate. But in my case, there is no EC2 instance.
If you feel it would help, I can paste the entire template as well.
please help
This is resolved.
It was an issue with the following points:
The Docker container port mapping to the host port was incorrect.
The ALB health check interval was very short; because of that, the ALB was giving up immediately instead of waiting for the Docker container to come up and run properly.
After making these changes, it worked properly.
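For anyone configuring the same thing in Terraform, a hedged sketch of the relevant knobs (names and values are illustrative, not from the original setup): the target group port must match the port the container listens on, and the health check interval/thresholds should give the container time to start.

resource "aws_lb_target_group" "service" {
  name        = "prc-service-dev-tg"    # hypothetical
  port        = var.container_port      # must match the port the container actually listens on
  protocol    = "HTTP"
  target_type = "ip"                    # Fargate (awsvpc) targets are registered by IP
  vpc_id      = var.vpc_id

  health_check {
    path                = "/"
    interval            = 60   # long enough for the container to boot
    timeout             = 10
    healthy_threshold   = 2
    unhealthy_threshold = 5
  }
}

On the ECS service itself, health_check_grace_period_seconds can also be raised so tasks that are still starting are not killed prematurely.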
There are quite a few different possible reasons for this issue, not only the open ports:
Improper IAM permissions for the ecsServiceRole IAM role
Container instance security group
Elastic Load Balancing load balancer not configured for all Availability Zones
Elastic Load Balancing load balancer health check misconfigured
Unable to update the service servicename: Load balancer container name or port changed in task definition
AWS has a dedicated page addressing the possible causes of this error:
https://docs.aws.amazon.com/en_en/AmazonECS/latest/developerguide/troubleshoot-service-load-balancers.html
Edit: in my case the health check status code of my application was different. The default is 200, but you can also add a range such as 200-499.
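If your app intentionally answers the health check with a non-200 code, the accepted range can be widened via the matcher; a Terraform sketch with placeholder names:

resource "aws_lb_target_group" "example" {
  name     = "example-tg"   # hypothetical
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path    = "/"
    matcher = "200-499"  # treat any non-5xx response as healthy
  }
}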
Let me share my experience.
In my case everything was correct, except the host on which the server listens: it was localhost, which makes the server unreachable from the outside world, and consequently the health check didn't work. It should be 0.0.0.0 (or empty in some libraries).
I got this error message because the security group between the ECS service and the load balancer target group was only allowing HTTP and HTTPS traffic.
Apparently the health check happens over some other port and/or protocol, as updating the security group to allow all traffic on all ports (as suggested at https://docs.aws.amazon.com/AmazonECS/latest/userguide/create-application-load-balancer.html) made the health check work.
I had this exact same problem. I was able to get around the issue by:
navigate to EC2 service
then select Target Group in the side panel
select your target group for your load balancer
select the health check tab
make sure the health check for your EC2 instance is the same as the health check in the target group. This will tell your ELB to route its traffic to this endpoint when conducting its health check. In my case my health check path was /health.
In my case, ECS Fargate was orchestrating the Docker container as a background service, not a web app or API. The service was not listening on any port (e.g. a scheduled cron job or an ActiveMQ message consumer, etc.).
In other words, it is a client and not a server node. So I made it listen on localhost for the health check only...
All I did was set the health check path in the Target Group to /__health (the route used below),
and add the code below in index.ts -
import express from 'express';

const app = express();
const port = process.env.PORT || 8080;

// Health check endpoint
app.get('/__health', (_, res) => res.send({ ok: 'yes' }));

app.listen(port, () => {
  // console.log used here since no logger is configured in this snippet
  console.log(`Health Check: Listening at http://localhost:${port}`);
});
As mentioned by tschumann above, check the security group around the ECS cluster. If using Terraform, allow ingress to all docker ephemeral ports with something like below:
resource "aws_security_group" "ecs_sg" {
name = "ecs_security_group"
vpc_id = "${data.aws_vpc.vpc.id}"
}
resource "aws_security_group_rule" "ingress_docker_ports" {
type = "ingress"
from_port = 32768
to_port = 61000
protocol = "-1"
cidr_blocks = ["${data.aws_vpc.vpc.cidr_block}"]
security_group_id = "${aws_security_group.ecs_sg.id}"
}
Possibly helpful for someone.. our target group health check path was set to /, which for our services pointed to Swagger and worked well. After updating to use Springfox instead of manually generating swagger.json, / now performs a 302 redirect to /swagger-ui.html, which caused the health check to fail. Since this was for a Spring Boot service we simply pointed the health check path in the target group to /health instead (OOTB Spring status page).
The solution in iravinandan's answer is partially correct, but in the last part of your Node.js router simply add status(200) and that's it. Alternatively, you can set a custom success status code under the advanced settings tab at the end of the page.
app.get('/__health', (request, response) => response.status(200).end(""));
Regards
My case was a React application running on FARGATE mode.
The first issue was that the Docker image was built over NodeJS "serving" it with:
CMD npm run start # react-scripts start
Besides not being good practice at all, it requires a lot of resources (4GB & 2vCPU were not enough), and because of that, the checks were failing (this article mentions it as a probable cause).
To solve the previous issue, we modified the image into a multi-stage build, with NodeJS for the build phase + NGINX for serving the content. Locally that worked great, but we hadn't realized that the default port for NGINX is 80, and you cannot use different host and container ports on FARGATE with the awsvpc network mode.
To troubleshoot it, I launched an EC2 instance with the right Security Groups so it could connect to the FARGATE targets on the same port where the Load Balancer was failing to perform the health check. I was able to execute curl commands against other targets, but with this unhealthy target (constantly being recycled) I received an instant Connection refused response. It wasn't a timeout, which told me that the target was not able to handle the request because it was not listening on that port. Then I realized that my container was expecting traffic on port 80 while my application was configured to work on a 3xxx port.
The solution here was to modify the default configuration of NGINX to listen to the port we wanted, re-build the image and re-launch the service.
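For reference, a trimmed Terraform sketch of such a Fargate task definition (family, image and sizes are placeholders); with the awsvpc network mode the host port is implicitly the container port, so the container has to listen on the same port the target group checks:

resource "aws_ecs_task_definition" "web" {
  family                   = "react-web"   # placeholder
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "512"
  memory                   = "1024"

  container_definitions = jsonencode([
    {
      name      = "web"
      image     = "my-registry/react-nginx:latest"  # placeholder image
      essential = true
      portMappings = [
        {
          containerPort = 80   # NGINX listen port; the target group must use the same port
          protocol      = "tcp"
        }
      ]
    }
  ])
}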
In my case, my ECS Fargate service does not need a load balancer, so I removed the "Load Balancer" and "Security Group" settings, and then it worked.
I had the same issue deploying a Java Spring Boot app on ECS running as Fargate. There were 3 issues I had to address to fix the problem; hopefully this helps others in the future.
The container was running on port 8080 (because of Tomcat), so the ELB, the target group and the two security groups (one on the ELB and one on ECS) had to allow 8080 in their inbound rules. The task setup also had to be revised to map the container to 8080.
The port in the target group's health check section (advanced settings) had to be explicitly changed to 8080 instead of the default 80.
I had to create a dummy health check path in the application, because pinging the root of the app at "/" resulted in a 302 response code.
Hope this helps.
I have also faced the same issue while using AWS Fargate.
Here are some possible solutions to try:
1) First, check that the security group attached to the service has the required inbound and outbound rules in place.
2) If you are using a load balancer pointing to a target group, then you must open the Docker container port in the security group and allow inbound traffic only from the ALB security group.
3) Also check the health check endpoint assigned to the target group: make sure it has no dependencies and that it returns a 200 status response (or whatever is specified in the target group).
In my case it was a security group rule that allowed connections only from a certain IP, which was blocking health checks from the LB. I added the VPC's CIDR as another rule to the security group and then it worked.

How can I troubleshoot an AWS Application Load Balancer giving 504, while the EC2 instance behind it gives 200?

I have an EC2 instance with a few applications successfully deployed onto it, listening for connections on ports 3000/3001/3002. I can correctly load a web page from it by connecting to its public DNS or public IP on the given port. I.e. curl http://<ec2-ip-address>:3000 works. So I know that the apps are running, and I know that the port bindings/firewall rules/EC2 security groups are all set up correctly to receive connections from the outside world.
I also have an Application Load Balancer, which is supposed to route traffic to the 3 apps depending on the host name, but it always gives me "504 Gateway Time-out". I've checked all the settings but I can't see what's wrong and I'm not really sure how to troubleshoot it from here.
The ALB has a single HTTPS/443 listener, with a cert that's valid for mydomain.com, app1.mydomain.com, app2.mydomain.com, app3.mydomain.com.
The listener has 3 rules, plus the default rule:
Host == app1.mydomain.com => app1-target-group
Host == app2.mydomain.com => app2-target-group
Host == app3.mydomain.com => app3-target-group
Default action (last resort) => default-target-group
Each target group contains only the single EC2 instance, over HTTP, with the following ports:
app1-target-group: 3000
app2-target-group: 3001
app3-target-group: 3002
default-target-group: 3000
Given that I can access the app directly, I'm sure it must be a problem with the way I've configured the ALB/listener/target groups. But the 504 doesn't give me much to go on.
I've tried to turn on access logs to an S3 bucket, but it doesn't seem to be writing anything there. There's a single object called ELBAccessLogTestFile, and no actual logs in the bucket.
EDIT: Some more information... I actually have nginx installed on the EC2 instance, which is where I was previously doing the SSL termination and hostname-to-port mapping/routing. If I change the default-target-group above to point to port 443 over HTTPS, then it works!
So for some reason, routing traffic
- from the ALB to the EC2 instance over HTTPS on port 443 -> OK!
- from the ALB to the EC2 instance over HTTP on port 3000 -> Broken!
But again, I can hit the instance directly on HTTP/3000 from my laptop.
Communication between resources in the same security group is not open by default. Security group membership alone does not provide special access. You still need to open the ports in the security group to allow other resources in the security group to access those ports. You can specify the security group ID in the rule's source field if you don't want to open it up beyond the resources in the security group.
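As an illustration of that last point, a Terraform sketch (hypothetical resource names) of a rule on the instance's security group that opens the app ports only to the ALB's security group:

resource "aws_security_group_rule" "alb_to_instance_apps" {
  type                     = "ingress"
  from_port                = 3000
  to_port                  = 3002                            # covers all three apps
  protocol                 = "tcp"
  security_group_id        = aws_security_group.instance.id  # the EC2 instance's SG
  source_security_group_id = aws_security_group.alb.id       # or the shared SG's own ID, to allow intra-group traffic
}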