How can I find out why my health checks are failing? - amazon-web-services

My instances keep failing their ELB health checks and I can't find any information on why that's happening. I go to the target group in the console and under 'targets' the only information I get is that the health check status is 'unhealthy' and the 'health status details' just say 'health checks failed'. How can I find the real reason my health checks are failing? Here's my Terraform code as well that includes my load balancer, auto scaling group, listener and target group
main.tf
resource "aws_lb" "jira-alb" {
name = "jira-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.jira_clb_sg.id]
subnets = [var.public_subnet_ids[0], var.public_subnet_ids[1]]
enable_deletion_protection = false
access_logs {
bucket = aws_s3_bucket.this.id
enabled = true
}
tags = {
Environment = "production"
}
}
resource "aws_lb_target_group" "jira" {
name = "jira-tg"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
health_check {
enabled = true
healthy_threshold = 10
unhealthy_threshold = 5
interval = 30
timeout = 5
path = "/index.html"
}
stickiness {
type = "lb_cookie"
cookie_duration = 1 ## CANT BE 0.. RANGES FROM 1-604800
}
}
resource "aws_lb_listener" "jira-listener" {
port = 443
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-TLS-1-2-2017-01"
load_balancer_arn = aws_lb.jira-alb.arn
certificate_arn = data.aws_acm_certificate.this.arn ##TODO Change to a variable
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.jira.arn
}
}
resource "aws_autoscaling_group" "this" {
vpc_zone_identifier = var.subnet_ids
health_check_grace_period = 300
health_check_type = "ELB"
force_delete = true
desired_capacity = 2
max_size = 2
min_size = 2
target_group_arns = [aws_lb_target_group.jira.arn]
timeouts {
delete = "15m"
}
launch_template {
id = aws_launch_template.this.id
# version = "$Latest"
version = aws_launch_template.this.latest_version
}
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 50
}
}
}
I was expecting my health checks to pass and my instances to stay running, but they keep failing and getting re-deployed
Also here are the security groups for my load balancer and my auto-scaling group
security_groups.tf
resource "aws_security_group" "jira_clb_sg" {
description = "Allow-Veracode-approved-IPs from external to elb"
vpc_id = var.vpc_id
tags = {
Name = "public-elb-sg-for-jira"
Project = "Jira Module"
ManagedBy = "terraform"
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = var.veracode_ips
}
egress {
from_port = 0
to_port = 0
protocol = -1
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "jira_sg" {
description = "Allow-Traffic-From-CLB"
vpc_id = var.vpc_id
tags = {
Name = "allow-jira-public-clb-sg"
Project = "Jira Module"
ManagedBy = "terraform"
}
ingress {
from_port = 0
to_port = 0
protocol = -1
security_groups = [aws_security_group.jira_clb_sg.id]
}
egress {
from_port = 0
to_port = 0
protocol = -1
cidr_blocks = ["0.0.0.0/0"]
}
}
My load balancer lets in traffic from port 443 and my auto scaling group allows traffic on any port from the load balancer security group

Your health check is on port 80, your security groups only open port 443.
As described in the Official documentation
"You must ensure that your load balancer can communicate with registered targets on both the listener port and the health check port. Whenever you add a listener to your load balancer or update the health check port for a target group used by the load balancer to route requests, you must verify that the security groups associated with the load balancer allow traffic on the new port in both directions"

Related

User_data not working in Launch configuration. Is it a problem with my security group or my Launch configuration?

I am currently learning Terraform and I need help with regard to the below code. I want to create a simple architecture of an autoscaling group of EC2 instances behind an Application load balancer. The setup gets completed but when I try to access the application endpoint, it gets timed out. When I tried to access the EC2 instances, I was unable to (because EC2 instances were in a security group allowing access from the ALB security group only). I changed the instance security group ingress values and ran the user_data script manually following which I reverted the changes to the instance security group to complete my setup.
My question is why is my setup not working via the below code? Is it because the access is being restricted by the load balancer security group or is my launch configuration block incorrect?
data "aws_ami" "amazon-linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-kernel-5.10-hvm-2.0.20220426.0-x86_64-gp2"]
}
}
data "aws_availability_zones" "available" {
state = "available"
}
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "3.14.0"
name = "main-vpc"
cidr = "10.0.0.0/16"
azs = data.aws_availability_zones.available.names
public_subnets = ["10.0.4.0/24","10.0.5.0/24","10.0.6.0/24"]
enable_dns_hostnames = true
enable_dns_support = true
}
resource "aws_launch_configuration" "TestLC" {
name_prefix = "Lab-Instance-"
image_id = data.aws_ami.amazon-linux.id
instance_type = "t2.nano"
key_name = "CloudformationKeyPair"
user_data = file("./user_data.sh")
security_groups = [aws_security_group.TestInstanceSG.id]
lifecycle {
create_before_destroy = true
}
}
resource "aws_autoscaling_group" "TestASG" {
min_size = 1
max_size = 3
desired_capacity = 2
launch_configuration = aws_launch_configuration.TestLC.name
vpc_zone_identifier = module.vpc.public_subnets
}
resource "aws_lb_listener" "TestListener"{
load_balancer_arn = aws_lb.TestLB.arn
port = "80"
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.TestTG.arn
}
}
resource "aws_lb" "TestLB" {
name = "Lab-App-Load-Balancer"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.TestLoadBalanceSG.id]
subnets = module.vpc.public_subnets
}
resource "aws_lb_target_group" "TestTG" {
name = "LabTargetGroup"
port = "80"
protocol = "HTTP"
vpc_id = module.vpc.vpc_id
}
resource "aws_autoscaling_attachment" "TestAutoScalingAttachment" {
autoscaling_group_name = aws_autoscaling_group.TestASG.id
lb_target_group_arn = aws_lb_target_group.TestTG.arn
}
resource "aws_security_group" "TestInstanceSG" {
name = "LAB-Instance-SecurityGroup"
ingress{
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [aws_security_group.TestLoadBalanceSG.id]
}
ingress{
from_port = 22
to_port = 22
protocol = "tcp"
security_groups = [aws_security_group.TestLoadBalanceSG.id]
}
egress{
from_port = 0
to_port = 0
protocol = "-1"
security_groups = [aws_security_group.TestLoadBalanceSG.id]
}
vpc_id = module.vpc.vpc_id
}
resource "aws_security_group" "TestLoadBalanceSG" {
name = "LAB-LoadBalancer-SecurityGroup"
ingress{
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress{
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress{
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
vpc_id = module.vpc.vpc_id
}

Can't connect through tableau on EC2 using https

I am running tableau server 2021-1-2 on EC2 instance.
I can connect using the default public ip on port 80, also on port 8050 for the Tableau TSM UI. And the same using the hostname I defined. The only issue I have is despite following several guidelines I can't connect using https.
I setup the ports on the security-group, the load-balancer, the certificate, i waited for hours as I saw that the ssl certificate could take more than half of an hour and nothing.
I can connect using:
http://my_domain.domain
But not:
https://my_domain.domain
I receive the following error in the browser: Can't connect to the server https://my_domain.domain.
I run curl -i https://my_domain.domain
It returns:
curl: (7) Failed to connect to my_domain.domainport 443: Connection refused
The security group of my instance has the following ports (u can see it in tf too):
Here you have my tf setup.
I did the EC2 setup with:
resource "aws_instance" "tableau" {
ami = var.ami
instance_type = var.instance_type
associate_public_ip_address = true
key_name = var.key_name
subnet_id = compact(split(",", var.public_subnets))[0]
vpc_security_group_ids = [aws_security_group.tableau-sg.id]
root_block_device{
volume_size = var.volume_size
}
tags = {
Name = var.namespace
}
}
I created the load balancer setup using:
resource "aws_lb" "tableau-lb" {
name = "${var.namespace}-alb"
load_balancer_type = "application"
internal = false
subnets = compact(split(",", var.public_subnets))
security_groups = [aws_security_group.tableau-sg.id]
ip_address_type = "ipv4"
enable_cross_zone_load_balancing = true
lifecycle {
create_before_destroy = true
}
idle_timeout = 300
}
resource "aws_alb_listener" "https" {
depends_on = [aws_alb_target_group.target-group]
load_balancer_arn = aws_lb.tableau-lb.arn
protocol = "HTTPS"
port = "443"
ssl_policy = "my_ssl_policy"
certificate_arn = "arn:xxxx"
default_action {
target_group_arn = aws_alb_target_group.target-group.arn
type = "forward"
}
lifecycle {
ignore_changes = [
default_action.0.target_group_arn,
]
}
}
resource "aws_alb_target_group" "target-group" {
name = "${var.namespace}-group"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "instance"
health_check {
healthy_threshold = var.health_check_healthy_threshold
unhealthy_threshold = var.health_check_unhealthy_threshold
timeout = var.health_check_timeout
interval = var.health_check_interval
path = var.path
}
tags = {
Name = var.namespace
}
lifecycle {
create_before_destroy = false
}
depends_on = [aws_lb.tableau-lb]
}
resource "aws_lb_target_group_attachment" "tableau-attachment" {
target_group_arn = aws_alb_target_group.target-group.arn
target_id = aws_instance.tableau.id
port = 80
}
The security group:
resource "aws_security_group" "tableau-sg" {
name_prefix = "${var.namespace}-sg"
tags = {
Name = var.namespace
}
vpc_id = var.vpc_id
# HTTP from the load balancer
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# HTTP from the load balancer
ingress {
from_port = 8850
to_port = 8850
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# HTTP from the load balancer
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# 443 secure access from anywhere
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Outbound internet access
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
}
Also setup a hostname domain using:
resource "aws_route53_record" "tableau-record-dns" {
zone_id = var.route_53_zone_id
name = "example.hostname"
type = "A"
ttl = "300"
records = [aws_instance.tableau.public_ip]
}
resource "aws_route53_record" "tableau-record-dns-https" {
zone_id = var.route_53_zone_id
name = "asdf.example.hostname"
type = "CNAME"
ttl = "300"
records = ["asdf.acm-validations.aws."]
}
Finally solved the issue, it was related to the record A. I was assignin an ip there an its impossible to redirect to an specific ip with the loadbalancer there. I redirect traffic to an ELB and worked fine

Configure an internal Network Load Balancer in terraform

I have one external network load balancer (listening on port 80) which forwards traffic to ServiceA instance (on port 9000). I'd like to configure an internal network load balancer that will get requests from ServiceA instances and forward them to ServiceB instance. However, I have a problem with configuring an internal NLB in terrafrom. Here's what I have at the moment:
resource "aws_security_group" "allow-all-traffic-for-internal-nlb" {
name = "int-nlb"
description = "Allow inbound and outbound traffic for internal NLB"
vpc_id = "${aws_vpc.default.id}"
ingress {
from_port = 81
protocol = "tcp"
to_port = 81
cidr_blocks = ["10.61.110.0/24"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_lb" "serviceB_lb" {
name = "serviceB-internal-lb"
internal = true
load_balancer_type = "network"
subnets = ["${aws_subnet.sub.id}"]
}
resource "aws_lb_listener" "serviceB-internal-lb-listener" {
load_balancer_arn = "${aws_lb.serviceB_lb.arn}"
port = 81
protocol = "TCP"
default_action {
target_group_arn = "${aws_lb_target_group.serviceB-internal-lb-tg.arn}"
type = "forward"
}
}
#create a target group for the load balancer and set up a health check
resource "aws_lb_target_group" "serviceB-internal-lb-tg" {
name = "serviceB-int-lb-tg"
port = 81
protocol = "TCP"
vpc_id = "${aws_vpc.default.id}"
target_type = "instance"
health_check {
protocol = "HTTP"
port = "8181"
path = "/"
}
}
#attach a load balancer to the target group
resource "aws_lb_target_group_attachment" "attach-serviceB-tg-to-internal-nlb" {
target_group_arn = "${aws_lb_target_group.serviceB-internal-lb-tg.arn}"
port = 8181
target_id = "${aws_instance.serviceB-1a.id}"
}
# Create Security Groups
resource "aws_security_group_rule" "serviceB_from_serviceB-lb" {
type = "ingress"
from_port = 81
to_port = 81
protocol = "tcp"
source_security_group_id = "${aws_security_group.allow-all-traffic-for-internal-nlb.id}"
security_group_id = "${aws_security_group.serviceB-sg.id}"
}
resource "aws_security_group_rule" "serviceB_nlb_to_serviceB" {
type = "egress"
from_port = 81
to_port = 81
protocol = "tcp"
source_security_group_id = "${aws_security_group.serviceB-sg.id}"
security_group_id = "${aws_security_group.allow-all-traffic-for-internal-nlb.id}"
}
####
resource "aws_security_group" "serviceB-sg" {
name = "${var.environment}-serviceB-sg"
description = "${var.environment} serviceB security group"
vpc_id = "${aws_vpc.default.id}"
ingress {
from_port = 8181
to_port = 8181
protocol = "tcp"
cidr_blocks = ["10.61.110.0/24"]
}
}
The internal load balancer is listening on port 81, and the ServiceB instance is running on port 8181.
Both external and internal NLBs and two services are located in one subnet.
When I check the health status for the target group of the internal load balancer, I get a health check failure.
What can cause this to happen?

Terraform Load Balancer Health Check fails for instance in asg

I have launched a private instance within asg
In public I have created a application loadbalancer to connect to the private instance
Issue:Target Group is showing as health check failed for asg instance
Question: How to fix this and make health check to pass
Kindly support me to solve this issue.
Because of this Timeout occurs when accessed using browser
**alb.tf**
resource "aws_lb" "ops_manager_app_lb" {
name = "ops-manager-app-lb"
internal = false
security_groups = [ aws_security_group.ops_lb_sg.id ]
subnets = [ var.PUB_SUBNET_NAT, var.PUB_SUBNET_2 ]
}
resource "aws_lb_target_group" "opsmanager_target_group_8080" {
depends_on = [ aws_lb.ops_manager_app_lb ]
name = "opsmanager-target-group-8080"
port = 8080
protocol = "HTTP"
vpc_id = var.AWS_VPC
health_check {
path = "/"
port = 8080
protocol = "HTTP"
healthy_threshold = 3
unhealthy_threshold = 3
matcher = "200-499"
}
}
resource "aws_lb_listener" "ops_alb_listener_8080" {
load_balancer_arn = aws_lb.ops_manager_app_lb.arn
port = "8080"
protocol = "HTTP"
#certificate_arn = "${var.elk_cert_arn}"
default_action {
target_group_arn = aws_lb_target_group.opsmanager_target_group_8080.arn
type = "forward"
}
}
**sg.tf**
resource "aws_security_group" "ops_lb_sg" {
name = "opsmanager_app_lb"
description = "Security Group for OpsManager ALB"
vpc_id = var.AWS_VPC
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
cidr_blocks = [ var.VPC_CIDR ]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
### OpsManager Application Server Security Group ###
resource "aws_security_group" "application_opsmanager_sg" {
name = "application_opsmanager_sg"
description = "Security Group for OpsManager Application Instance"
vpc_id = var.AWS_VPC
ingress {
description = "TCP port for HTTP service"
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = [ aws_security_group.ops_lb_sg.id ]
#cidr_blocks = [var.VPC_CIDR]
}
}
**main.tf**
resource "aws_launch_configuration" "lc_opsmanager" {
name = "ops_manager_launch"
image_id = var.AMIS
instance_type = var.INSTANCE_TYPE["OPS_APP"]
iam_instance_profile = data.aws_iam_instance_profile.application_instance_profile.name
key_name = var.KEY_NAME
security_groups = [data.aws_security_group.application_sg.id, aws_security_group.ops_lb_sg.id ]
}
resource "aws_autoscaling_group" "asg_opsmanager" {
name = "asg-ops-manager"
max_size = 2
min_size = 1
desired_capacity = 1
#availability_zones = [ data.aws_availability_zone.az_primary.name ]
vpc_zone_identifier = [var.PRIV_SUBNET_OPS]
health_check_type = "EC2"
health_check_grace_period = 300
launch_configuration = aws_launch_configuration.lc_opsmanager.id
target_group_arns = [ aws_lb_target_group.opsmanager_target_group_8080.arn ]
tag {
key = "Name"
value = "ops_manager_application"
propagate_at_launch = true
}
}
There can be many issues with your architecture,
but one that is definitively responsible for
blocking access to your ALB is incorrect
security group.
Namely, the ALB's uses ops_lb_sg which does not allow
internet traffic. Instead it allows connections from only
var.VPC_CIDR. To allow internet connection's it should be:
cidr_blocks = [ "0.0.0.0/0" ]
or CIDR range of your home/work network.

Security Group and Subnet belong to different networks

I am working on deploying IaC to AWS. All is well except for a perplexing issuing I am having with my deployment of EC2 instances through an ASG. When I run Terraform apply I get the following error message:
"Security Group sg-xxxxx and Subnet subnet-xxxxx belong to different networks. Launching EC2 instance failed."
Terraform is using trying to use a default subnet instead of the subnet I defined to be used with the vpc I created. I am not using an ELB at the moment only using an LC and ASG. Below are the code snippet's. Any insight would be helpful!
/******************************************************************
Subnet Definitions
*******************************************************************/
//Define the public subnet for availability zone A.
resource "aws_subnet" "Subnet_A_Public" {
vpc_id = "${aws_vpc.terraform-vpc.id}"
cidr_block = "${var.public_subnet_a}"
availability_zone = "${var.availability_zone_a}"
tags {
Name = "Subnet A - Public"
}
}
//Define the public subnet for availability zone B.
resource "aws_subnet" "Subnet_B_Public" {
vpc_id = "${aws_vpc.terraform-vpc.id}"
cidr_block = "${var.public_subnet_b}"
availability_zone = "${var.availability_zone_b}"
tags {
Name = "Subnet B - Public"
}
}
/*********************************************************************
Security Group (SG) Definitions
**********************************************************************/
//Define the public security group.
resource "aws_security_group" "tf-public-sg" {
name = "TF-Public-SG"
description = "Allow incoming HTTP/HTTPS connections and SSH access from the Internet."
vpc_id = "${aws_vpc.terraform-vpc.id}"
//Accept tcp port 80 (HTTP) from the Internet.
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
//Accept tcp port 443 (HTTPS) from the Internet.
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
//Accept tcp port 22 (SSH) from the Internet.
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
//Accept all ICMP inbound from the Internet.
ingress {
from_port = -1
to_port = -1
protocol = "icmp"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
tags {
Name = "Terraform Public SG"
}
}
/**************************************************************************
PUBLIC ASG & LC
***************************************************************************/
resource "aws_launch_configuration" "terraform-public-lc" {
image_id = "${var.ami}"
instance_type = "${var.instance_type}"
security_groups = ["${aws_security_group.tf-public-sg.id}"]
lifecycle {
create_before_destroy = true
}
}
resource "aws_autoscaling_group" "tf-public-asg" {
launch_configuration = "${aws_launch_configuration.terraform-public-lc.id}"
availability_zones = ["${var.availability_zone_a}", "${var.availability_zone_b}"]
name = "tf-public-asg"
min_size = "${var.asg_min_pubic}"
max_size = "${var.asg_max_public}"
desired_capacity = "${var.asg_desired_capacity_public}"
tags {
key = "Name"
value = "tf-public-asg"
//value = "${var.public_instance_name}-${count.index}"
propagate_at_launch = true
}
}
/************************************************************************
END PUBLIC ASG & LC
*************************************************************************/