I'm confused with configuring LoadBalancer(NLB) in AWS. When configuring the LB as below (it's the Terraform file), I never specified HTTPS protocol. However, after the LB gets spinned up, I can only reach my targets by https://LB_ARN:80 and nothing is shown when I hit http://LB_ARN:80. I am quite confused of the reason, and also, more confusing part is that using https://LB_ARN:80 as DNS, my Browser(Chrome) tells me the site is not secure (though it is HTTPS). Any help please ?
resource "aws_lb" "boundary" {
name = "boundary-nlb"
load_balancer_type = "network"
internal = false
subnets = data.terraform_remote_state.network.outputs.tokyo_vpc_main.public_subnet_ids
tags = merge(local.common_tags, {
Name = "boundary-${terraform.workspace}-controller-nlb"
})
}
resource "aws_lb_target_group" "boundary" {
name = "boundary-nlb"
port = 9200
protocol = "TCP"
vpc_id = data.terraform_remote_state.network.outputs.tokyo_vpc_main.vpc_id
stickiness {
enabled = false
type = "source_ip"
}
tags = merge(local.common_tags, {
Name = "boundary-${terraform.workspace}-controller-nlb-tg"
})
}
resource "aws_lb_target_group_attachment" "boundary" {
count = var.num_controllers
target_group_arn = aws_lb_target_group.boundary.arn
target_id = aws_instance.controller[count.index].id
port = 9200
}
resource "aws_lb_listener" "boundary" {
load_balancer_arn = aws_lb.boundary.arn
port = "80"
protocol = "TCP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.boundary.arn
}
}
resource "aws_security_group" "boundary_lb" {
vpc_id = data.terraform_remote_state.network.outputs.tokyo_vpc_main.vpc_id
tags = merge(local.common_tags, {
Name = "boundary-${terraform.workspace}-controller-nlb-sg"
})
}
resource "aws_security_group_rule" "allow_9200" {
type = "ingress"
from_port = 9200
to_port = 9200
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
security_group_id = aws_security_group.boundary_lb.id
}
This appears to me as misconfiguration of your backend servers. Specifically, they seem to server HTTPS connections on port 80.
Since you are using NLB with TCP protocol, any HTTPS connection is forwarded to your backend servers. Meaning, there is no SSL termination on your NLB. So even though you haven't specified HTTPS in your NLB settings, HTTPS connections are forwarded on top of TCP to your backend instances. The backend instance handle the HTTPS with maybe self-signed SSL certificate, not NLB, on the wrong port. This would explain warnings from browser.
I would recommend checking configuration of your backend services and making sure that are serving HTTP on port 80, not HTTPS as it seems to be the case now.
Related
background:
I used terraform to build an AWS autoscaling group with a few instances spread across availability zones and linked by a load balancer. Everything is created properly, but the load balancer has no valid targets because they're nothing listening on port 80.
Fine, I thought. I'll install NGINX and throw up a basic config.
expected behavior
instances should be able to reach yum repos
actual behavior
I'm unable to ping anything or run any of the package manager commands, getting the following error
Could not retrieve mirrorlist https://amazonlinux-2-repos-us-east-2.s3.dualstack.us-east-2.amazonaws.com/2/core/latest/x86_64/mirror.list error was
12: Timeout on https://amazonlinux-2-repos-us-east-2.s3.dualstack.us-east-2.amazonaws.com/2/core/latest/x86_64/mirror.list: (28, 'Failed to connect to amazonlinux-2-repos-us-east-2.s3.dualstack.us-east-2.amazonaws.com port 443 after 2700 ms: Connection timed out')
One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
yum --disablerepo=<repoid> ...
4. Disable the repository permanently, so yum won't use it by default. Yum
will then just ignore the repository until you permanently enable it
again or use --enablerepo for temporary usage:
yum-config-manager --disable <repoid>
or
subscription-manager repos --disable=<repoid>
5. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn2-core/2/x86_64
troubleshooting steps taken
I'm new to Terraform, and I'm still having issues doing the automated provisioning of user_data, so I SSH'd into the instance. The instance is set up in a public subnet with an auto-provisioned public IP. Below is the code for the security groups.
resource "aws_security_group" "elb_webtrafic_sg" {
name = "elb-webtraffic-sg"
description = "Allow inbound web trafic to load balancer"
vpc_id = aws_vpc.main_vpc.id
ingress {
description = "HTTPS trafic from vpc"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTP trafic from vpc"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "allow SSH"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "all traffic out"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "elb-webtraffic-sg"
}
}
resource "aws_security_group" "instance_sg" {
name = "instance-sg"
description = "Allow traffic from load balancer to instances"
vpc_id = aws_vpc.main_vpc.id
ingress {
description = "web traffic from load balancer"
security_groups = [ aws_security_group.elb_webtrafic_sg.id ]
from_port = 80
to_port = 80
protocol = "tcp"
}
ingress {
description = "web traffic from load balancer"
security_groups = [ aws_security_group.elb_webtrafic_sg.id ]
from_port = 443
to_port = 443
protocol = "tcp"
}
ingress {
description = "ssh traffic from anywhere"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "all traffic to load balancer"
security_groups = [ aws_security_group.elb_webtrafic_sg.id ]
from_port = 0
to_port = 0
protocol = "-1"
}
tags = {
Name = "instance-sg"
}
}
#this is a workaround for the cyclical security group id call
#I would like to figure out a way for this to destroy this first
#it currently takes longer to destroy than to set up
#terraform hangs because of the dependancy each SG has on each other,
#but will eventually struggle down to this rule and delete it, clearing the deadlock
resource "aws_security_group_rule" "elb_egress_to_webservers" {
security_group_id = aws_security_group.elb_webtrafic_sg.id
type = "egress"
source_security_group_id = aws_security_group.instance_sg.id
from_port = 80
to_port = 80
protocol = "tcp"
}
resource "aws_security_group_rule" "elb_tls_egress_to_webservers" {
security_group_id = aws_security_group.elb_webtrafic_sg.id
type = "egress"
source_security_group_id = aws_security_group.instance_sg.id
from_port = 443
to_port = 443
protocol = "tcp"
}
Since I was able to SSH into the machine, I tried to set up the web instance security group to allow direct connection from the internet to the instance. Same errors: cannot ping outside web addresses, same error on YUM commands.
I can ping the default gateway in each subnet. 10.0.0.1, 10.0.1.1, 10.0.2.1.
Here is the routing configuration I currently have setup.
resource "aws_vpc" "main_vpc" {
cidr_block = "10.0.0.0/16"
tags = {
Name = "production-vpc"
}
}
resource "aws_key_pair" "aws_key" {
key_name = "Tanchwa_pc_aws"
public_key = file(var.public_key_path)
}
#internet gateway
resource "aws_internet_gateway" "gw" {
vpc_id = aws_vpc.main_vpc.id
tags = {
Name = "internet-gw"
}
}
resource "aws_route_table" "route_table" {
vpc_id = aws_vpc.main_vpc.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.gw.id
}
tags = {
Name = "production-route-table"
}
}
resource "aws_subnet" "public_us_east_2a" {
vpc_id = aws_vpc.main_vpc.id
cidr_block = "10.0.0.0/24"
availability_zone = "us-east-2a"
tags = {
Name = "Public-Subnet us-east-2a"
}
}
resource "aws_subnet" "public_us_east_2b" {
vpc_id = aws_vpc.main_vpc.id
cidr_block = "10.0.1.0/24"
availability_zone = "us-east-2b"
tags = {
Name = "Public-Subnet us-east-2b"
}
}
resource "aws_subnet" "public_us_east_2c" {
vpc_id = aws_vpc.main_vpc.id
cidr_block = "10.0.2.0/24"
availability_zone = "us-east-2c"
tags = {
Name = "Public-Subnet us-east-2c"
}
}
resource "aws_route_table_association" "a" {
subnet_id = aws_subnet.public_us_east_2a.id
route_table_id = aws_route_table.route_table.id
}
resource "aws_route_table_association" "b" {
subnet_id = aws_subnet.public_us_east_2b.id
route_table_id = aws_route_table.route_table.id
}
resource "aws_route_table_association" "c" {
subnet_id = aws_subnet.public_us_east_2c.id
route_table_id = aws_route_table.route_table.id
}
Using terraform I'm provisioning infra in AWS for my K3S cluster. I have provisioned an NLB with two listeners on port 80 and 443, with appropriate self-signed certs. This works. I can access HTTP services in my cluster via the nlb.
resource "tls_private_key" "agents" {
algorithm = "RSA"
}
resource "tls_self_signed_cert" "agents" {
key_algorithm = "RSA"
private_key_pem = tls_private_key.agents.private_key_pem
validity_period_hours = 24
subject {
common_name = "my hostname"
organization = "My org"
}
allowed_uses = [
"key_encipherment",
"digital_signature",
"server_auth"
]
}
resource "aws_acm_certificate" "agents" {
private_key = tls_private_key.agents.private_key_pem
certificate_body = tls_self_signed_cert.agents.cert_pem
}
resource "aws_lb" "agents" {
name = "basic-load-balancer"
load_balancer_type = "network"
subnet_mapping {
subnet_id = aws_subnet.agents.id
allocation_id = aws_eip.agents.id
}
}
resource "aws_lb_listener" "agents_80" {
load_balancer_arn = aws_lb.agents.arn
protocol = "TCP"
port = 80
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.agents_80.arn
}
}
resource "aws_lb_listener" "agents_443" {
load_balancer_arn = aws_lb.agents.arn
protocol = "TLS"
port = 443
certificate_arn = aws_acm_certificate.agents.arn
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.agents_443.arn
}
}
resource "aws_lb_target_group" "agents_80" {
port = 30000
protocol = "TCP"
vpc_id = var.vpc.id
depends_on = [
aws_lb.agents
]
}
resource "aws_lb_target_group" "agents_443" {
port = 30001
protocol = "TCP"
vpc_id = var.vpc.id
depends_on = [
aws_lb.agents
]
}
resource "aws_autoscaling_attachment" "agents_80" {
autoscaling_group_name = aws_autoscaling_group.agents.name
alb_target_group_arn = aws_lb_target_group.agents_80.arn
}
resource "aws_autoscaling_attachment" "agents_443" {
autoscaling_group_name = aws_autoscaling_group.agents.name
alb_target_group_arn = aws_lb_target_group.agents_443.arn
}
That's a cutdown version of my code.
I have configured my ingress controller to listen for HTTP and HTTPS on NodePorts 30000 and 30001 respectively. This works too.
The thing that doesn't work is that the NLB is terminating TLS, but I need it to passthrough. I'm doing this so that I can access Kubernetes Dashboard (among other apps), but the dashboard requires https to sign-in, something I can't provide if tls is terminated at the nlb.
I need help configuring the nlb for passthrough. I have searched and searched and can't find any examples. If anyone knows how to configure this it would be good to get some tf code, or even just an idea of the appropriate way of achieving it in AWS so that I can implement it myself in tf.
Do you need TLS passthrough, or just TLS communication between the NLB and the server? Or do you just need to configure your server to be aware that the initial connection was TLS?
For TLS passthrough you would install an SSL certificate on the server, and delete the certificate from the load balancer. You would change the protocol of the port 443 listener on the load balancer from "TLS" to "TCP". This is not a very typical setup on AWS, and you can't use the free AWS ACM SSL certificates in this configuration, you would have to use something like Let's Encrypt on the server.
For TLS communication between the NLB and the server, you would install a certificate on the server, a self-signed cert is fine for this, and then just change the target group settings on the load balancer to point to the secure ports on the server.
If you just want to make the server aware that the initial connection protocol was TLS, you would configure the server to use the x-forwarded-proto header passed by the load balancer to determine if the connection is secure.
I have created a Custom VPC using the Terraform code below.
I have applied the Terraform module below, which:
Adds Public and Private Subnets
Configures IGW
Adds NAT Gateway
Adds SGs for both Web and DB Servers
After applying this, I am not able to ping/ssh from public ec2 instance to the private ec2 instance.
Not sure what is missing.
# Custom VPC
resource "aws_vpc" "MyVPC" {
cidr_block = "10.0.0.0/16"
instance_tenancy = "default" # For Prod use "dedicated"
tags = {
Name = "MyVPC"
}
}
# Creates "Main Route Table", "NACL" & "default Security Group"
# Create Public Subnet, Associate with our VPC, Auto assign Public IP
resource "aws_subnet" "PublicSubNet" {
vpc_id = aws_vpc.MyVPC.id # Our VPC
availability_zone = "eu-west-2a" # AZ within London, 1 Subnet = 1 AZ
cidr_block = "10.0.1.0/24" # Check using this later > "${cidrsubnet(data.aws_vpc.MyVPC.cidr_block, 4, 1)}"
map_public_ip_on_launch = "true" # Auto assign Public IP for Public Subnet
tags = {
Name = "PublicSubNet"
}
}
# Create Private Subnet, Associate with our VPC
resource "aws_subnet" "PrivateSubNet" {
vpc_id = aws_vpc.MyVPC.id # Our VPC
availability_zone = "eu-west-2b" # AZ within London region, 1 Subnet = 1 AZ
cidr_block = "10.0.2.0/24" # Check using this later > "${cidrsubnet(data.aws_vpc.MyVPC.cidr_block, 4, 1)}"
tags = {
Name = "PrivateSubNet"
}
}
# Only 1 IGW per VPC
resource "aws_internet_gateway" "MyIGW" {
vpc_id = aws_vpc.MyVPC.id
tags = {
Name = "MyIGW"
}
}
# New Public route table, so we can keep "default main" route table as Private. Route out to MyIGW
resource "aws_route_table" "MyPublicRouteTable" {
vpc_id = aws_vpc.MyVPC.id # Our VPC
route { # Route out IPV4
cidr_block = "0.0.0.0/0" # IPV4 Route Out for all
# ipv6_cidr_block = "::/0" The parameter destinationCidrBlock cannot be used with the parameter destinationIpv6CidrBlock # IPV6 Route Out for all
gateway_id = aws_internet_gateway.MyIGW.id # Target : Internet Gateway created earlier
}
route { # Route out IPV6
ipv6_cidr_block = "::/0" # IPV6 Route Out for all
gateway_id = aws_internet_gateway.MyIGW.id # Target : Internet Gateway created earlier
}
tags = {
Name = "MyPublicRouteTable"
}
}
# Associate "PublicSubNet" with the public route table created above, removes it from default main route table
resource "aws_route_table_association" "PublicSubNetnPublicRouteTable" {
subnet_id = aws_subnet.PublicSubNet.id
route_table_id = aws_route_table.MyPublicRouteTable.id
}
# Create new security group "WebDMZ" for WebServer
resource "aws_security_group" "WebDMZ" {
name = "WebDMZ"
description = "Allows SSH & HTTP requests"
vpc_id = aws_vpc.MyVPC.id # Our VPC : SGs cannot span VPC
ingress {
description = "Allows SSH requests for VPC: IPV4"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # SSH restricted to my laptop public IP <My PUBLIC IP>/32
}
ingress {
description = "Allows HTTP requests for VPC: IPV4"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # You can use Load Balancer
}
ingress {
description = "Allows HTTP requests for VPC: IPV6"
from_port = 80
to_port = 80
protocol = "tcp"
ipv6_cidr_blocks = ["::/0"]
}
egress {
description = "Allows SSH requests for VPC: IPV4"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # SSH restricted to my laptop public IP <My PUBLIC IP>/32
}
egress {
description = "Allows HTTP requests for VPC: IPV4"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "Allows HTTP requests for VPC: IPV6"
from_port = 80
to_port = 80
protocol = "tcp"
ipv6_cidr_blocks = ["::/0"]
}
}
# Create new EC2 instance (WebServer01) in Public Subnet
# Get ami id from Console
resource "aws_instance" "WebServer01" {
ami = "ami-01a6e31ac994bbc09"
instance_type = "t2.micro"
subnet_id = aws_subnet.PublicSubNet.id
key_name = "MyEC2KeyPair" # To connect using key pair
security_groups = [aws_security_group.WebDMZ.id] # Assign WebDMZ security group created above
# vpc_security_group_ids = [aws_security_group.WebDMZ.id]
tags = {
Name = "WebServer01"
}
}
# Create new security group "MyDBSG" for WebServer
resource "aws_security_group" "MyDBSG" {
name = "MyDBSG"
description = "Allows Public WebServer to Communicate with Private DB Server"
vpc_id = aws_vpc.MyVPC.id # Our VPC : SGs cannot span VPC
ingress {
description = "Allows ICMP requests: IPV4" # For ping,communication, error reporting etc
from_port = -1
to_port = -1
protocol = "icmp"
cidr_blocks = ["10.0.1.0/24"] # Public Subnet CIDR block, can be "WebDMZ" security group id too as below
security_groups = [aws_security_group.WebDMZ.id] # Tried this as above was not working, but still doesn't work
}
ingress {
description = "Allows SSH requests: IPV4" # You can SSH from WebServer01 to DBServer, using DBServer private ip address and copying private key to WebServer
from_port = 22 # ssh ec2-user#Private Ip Address -i MyPvKey.pem Private Key pasted in MyPvKey.pem
to_port = 22 # Not a good practise to use store private key on WebServer, instead use Bastion Host (Hardened Image, Secure) to connect to Private DB
protocol = "tcp"
cidr_blocks = ["10.0.1.0/24"]
}
ingress {
description = "Allows HTTP requests: IPV4"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["10.0.1.0/24"]
}
ingress {
description = "Allows HTTPS requests : IPV4"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["10.0.1.0/24"]
}
ingress {
description = "Allows MySQL/Aurora requests"
from_port = 3306
to_port = 3306
protocol = "tcp"
cidr_blocks = ["10.0.1.0/24"]
}
egress {
description = "Allows ICMP requests: IPV4" # For ping,communication, error reporting etc
from_port = -1
to_port = -1
protocol = "icmp"
cidr_blocks = ["10.0.1.0/24"] # Public Subnet CIDR block, can be "WebDMZ" security group id too
}
egress {
description = "Allows SSH requests: IPV4" # You can SSH from WebServer01 to DBServer, using DBServer private ip address and copying private key to WebServer
from_port = 22 # ssh ec2-user#Private Ip Address -i MyPvtKey.pem Private Key pasted in MyPvKey.pem chmod 400 MyPvtKey.pem
to_port = 22 # Not a good practise to use store private key on WebServer, instead use Bastion Host (Hardened Image, Secure) to connect to Private DB
protocol = "tcp"
cidr_blocks = ["10.0.1.0/24"]
}
egress {
description = "Allows HTTP requests: IPV4"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["10.0.1.0/24"]
}
egress {
description = "Allows HTTPS requests : IPV4"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["10.0.1.0/24"]
}
egress {
description = "Allows MySQL/Aurora requests"
from_port = 3306
to_port = 3306
protocol = "tcp"
cidr_blocks = ["10.0.1.0/24"]
}
}
# Create new EC2 instance (DBServer) in Private Subnet, Associate "MyDBSG" Security Group
resource "aws_instance" "DBServer" {
ami = "ami-01a6e31ac994bbc09"
instance_type = "t2.micro"
subnet_id = aws_subnet.PrivateSubNet.id
key_name = "MyEC2KeyPair" # To connect using key pair
security_groups = [aws_security_group.MyDBSG.id] # THIS WAS GIVING ERROR WHEN ASSOCIATING
# vpc_security_group_ids = [aws_security_group.MyDBSG.id]
tags = {
Name = "DBServer"
}
}
# Elastic IP required for NAT Gateway
resource "aws_eip" "nateip" {
vpc = true
tags = {
Name = "NATEIP"
}
}
# DBServer in private subnet cannot access internet, so add "NAT Gateway" in Public Subnet
# Add Target as NAT Gateway in default main route table. So there is route out to Internet.
# Now you can do yum update on DBServer
resource "aws_nat_gateway" "NATGW" { # Create NAT Gateway in each AZ so in case of failure it can use other
allocation_id = aws_eip.nateip.id # Elastic IP allocation
subnet_id = aws_subnet.PublicSubNet.id # Public Subnet
tags = {
Name = "NATGW"
}
}
# Main Route Table add NATGW as Target
resource "aws_default_route_table" "DefaultRouteTable" {
default_route_table_id = aws_vpc.MyVPC.default_route_table_id
route {
cidr_block = "0.0.0.0/0" # IPV4 Route Out for all
nat_gateway_id = aws_nat_gateway.NATGW.id # Target : NAT Gateway created above
}
tags = {
Name = "DefaultRouteTable"
}
}
Why is ping timing out from WebServer01 to DBServer?
There are no specific NACLs and the default NACLs are wide open, so they should not be relevant here.
For this to work the security group on DBServer needs to allow egress to the security group of DBServer or to a CIDR that includes it.
aws_instance.DBServer uses aws_security_group.MyDBSG.
aws_instance.WebServer01 uses aws_security_group.WebDMZ.
The egress rules on aws_security_group.WebDMZ are as follows:
egress {
description = "Allows SSH requests for VPC: IPV4"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # SSH restricted to my laptop public IP <My PUBLIC IP>/32
}
egress {
description = "Allows HTTP requests for VPC: IPV4"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "Allows HTTP requests for VPC: IPV6"
from_port = 80
to_port = 80
protocol = "tcp"
ipv6_cidr_blocks = ["::/0"]
}
Those egress rules mean:
Allow SSH to all IPv4 hosts
Allow HTTP to all IPv4 and IPv6 hosts
ICMP isn't listed, so the ICMP echo request will be dropped before it leaves aws_security_group.WebDMZ. This should be visible as a REJECT in the VPC FlowLog for the ENI of aws_instance.WebServer01.
Adding this egress rule to aws_security_group.WebDMZ should fix that:
egress {
description = "Allows ICMP requests: IPV4" # For ping,communication, error reporting etc
from_port = -1
to_port = -1
protocol = "icmp"
cidr_blocks = ["10.0.2.0/24"]
}
DBServer may NOT respond to ICMP, so you may still see timeouts after making this change. Referencing the VPC FlowLog will help determine the difference. If you see ACCEPTs in the VPC FlowLog for the ICMP flows, then the issue is that DBServer doesn't respond to ICMP.
Nothing in aws_security_group.WebDMZ prevents SSH, so the problem with that must be elsewhere.
The ingress rules on aws_security_group.MyDBSG are as follows.
ingress {
description = "Allows ICMP requests: IPV4" # For ping,communication, error reporting etc
from_port = -1
to_port = -1
protocol = "icmp"
cidr_blocks = ["10.0.1.0/24"] # Public Subnet CIDR block, can be "WebDMZ" security group id too as below
security_groups = [aws_security_group.WebDMZ.id] # Tried this as above was not working, but still doesn't work
}
ingress {
description = "Allows SSH requests: IPV4" # You can SSH from WebServer01 to DBServer, using DBServer private ip address and copying private key to WebServer
from_port = 22 # ssh ec2-user#Private Ip Address -i MyPvKey.pem Private Key pasted in MyPvKey.pem
to_port = 22 # Not a good practise to use store private key on WebServer, instead use Bastion Host (Hardened Image, Secure) to connect to Private DB
protocol = "tcp"
cidr_blocks = ["10.0.1.0/24"]
}
Those egress rules mean:
Allow all ICMP from the Public Subnet (10.0.1.0/24)
Allow SSH from the Public Subnet (10.0.1.0/24)
SSH should be working. DBServer likely does not accept SSH connections.
Assuming your DBServer is 10.0.2.123, then if SSH is not running it would look like this from WebServer01 running ssh -v 10.0.2.123:
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 47: Applying options for *
debug1: Connecting to 10.0.2.123 [10.0.2.123] port 22.
debug1: connect to address 10.0.2.123 port 22: Operation timed out
ssh: connect to host 10.0.2.123 port 22: Operation timed out
Referencing the VPC FlowLog will help determine the difference. If you see ACCEPTs in the VPC FlowLog for the SSH flows, then the issue is that DBServer doesn't accept SSH connections.
Enabling VPC Flow Logs
Since this has changed a few times over the years, I will point you to AWS' own maintained, step by step guide for creating a flow log that publishes to CloudWatch Logs.
I recommend CloudWatch over S3 at the moment since querying using CloudWatch Logs Insights is pretty easy compared to setting up and querying S3 with Athena. You just pick the CloudWatch Log Stream you used for your Flow Log and search for the IPs or ports of interest.
This example CloudWatch Logs Insights query will get the most recent 20 rejects on eni-0123456789abcdef0 (not a real ENI, use the actual ENI ID you are debugging):
fields #timestamp,#message
| sort #timestamp desc
| filter #message like 'eni-0123456789abcdef0'
| filter #message like 'REJECT'
| limit 20
In the VPC Flow Log a missing egress rule shows up as a REJECT on the source ENI.
In the VPC Flow Log a missing ingress rule shows up as a REJECT on the destination ENI.
Security Groups are stateful
Stateless packet filters require you to handle arcane things like the fact that most (not all) OSes use ports 32767-65535 for reply traffic in TCP flows. That's a pain, and it makes NACLs (which a stateless) a huge pain.
A stateful firewall like Security Groups tracks connections (the state in stateful) automatically, so you just need to allow the service port to the destination's SG or IP CIDR block in the source SG's egress rules and from the source SG or IP CIDR block in the destination SG's ingress rules.
Even though Security Groups (SGs) are stateful, they are default deny. This includes the outbound initiated traffic from a source. If a source SG does not allow traffic outbound to a destination, it's not allowed even if the destination has an SG that allows it. This is a common misconception. SG rules are not transitive, they need to be made on both sides.
AWS's Protecting Your Instance with Security Groups video explains this very well and visually.
Aside: Style recommendation
You should use aws_security_group_rule resources instead of inline egress and ingress rules.
NOTE on Security Groups and Security Group Rules: Terraform currently provides both a standalone Security Group Rule resource (a single ingress or egress rule), and a Security Group resource with ingress and egress rules defined in-line. At this time you cannot use a Security Group with in-line rules in conjunction with any Security Group Rule resources. Doing so will cause a conflict of rule settings and will overwrite rules.
Take this old-style aws_security_group with inline rules:
resource "aws_security_group" "WebDMZ" {
name = "WebDMZ"
description = "Allows SSH & HTTP requests"
vpc_id = aws_vpc.MyVPC.id
ingress {
description = "Allows HTTP requests for VPC: IPV4"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # You can use Load Balancer
}
egress {
description = "Allows SSH requests for VPC: IPV4"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
And replace it with this modern-style aws_security_group with aws_security_group_rule resources for each rule:
resource "aws_security_group" "WebDMZ" {
name = "WebDMZ"
description = "Allows SSH & HTTP requests"
vpc_id = aws_vpc.MyVPC.id
}
resource "aws_security_group_rule" "WebDMZ_HTTP_in" {
security_group_id = aws_security_group.WebDMZ.id
type = "ingress"
description = "Allows HTTP requests for VPC: IPV4"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "WebDMZ_SSH_out" {
security_group_id = aws_security_group.WebDMZ.id
type = "egress"
description = "Allows SSH requests for VPC: IPV4"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
Allowing all outbound rules from "WebDMZ" security group as pasted below resolves the issue.
egress { # Allow allow traffic outbound, THIS WAS THE REASON YOU WAS NOT ABLE TO PING FROM WebServer to DBServer
description = "Allows All Traffic Outbound from Web Server"
from_port = 0
to_port = 0
protocol = -1
cidr_blocks = ["0.0.0.0/0"]
}
I am trying to provision an ECS cluster using Terraform along with an ALB. The targets come up as Unhealthy. The error code is 502 in the console Health checks failed with these codes: [502]
I checked through the AWS Troubleshooting guide and nothing helped there.
EDIT: I have no services/tasks running on the EC2 containers. Its a vanilla ECS cluster.
Here is my relevant code for the ALB:
# Target Group declaration
resource "aws_alb_target_group" "lb_target_group_somm" {
name = "${var.alb_name}-default"
port = 80
protocol = "HTTP"
vpc_id = "${var.vpc_id}"
deregistration_delay = "${var.deregistration_delay}"
health_check {
path = "/"
port = 80
protocol = "HTTP"
}
lifecycle {
create_before_destroy = true
}
tags = {
Environment = "${var.environment}"
}
depends_on = ["aws_alb.alb"]
}
# ALB Listener with default forward rule
resource "aws_alb_listener" "https_listener" {
load_balancer_arn = "${aws_alb.alb.id}"
port = "80"
protocol = "HTTP"
default_action {
target_group_arn = "${aws_alb_target_group.lb_target_group_somm.arn}"
type = "forward"
}
}
# The ALB has a security group with ingress rules on TCP port 80 and egress rules to anywhere.
# There is a security group rule for the EC2 instances that allows ingress traffic to the ECS cluster from the ALB:
resource "aws_security_group_rule" "alb_to_ecs" {
type = "ingress"
/*from_port = 32768 */
from_port = 80
to_port = 65535
protocol = "TCP"
source_security_group_id = "${module.alb.alb_security_group_id}"
security_group_id = "${module.ecs_cluster.ecs_instance_security_group_id}"
}
Has anyone hit this error and know how to debug/fix this ?
It looks like you're trying to be register the ECS cluster instances with the ALB target group. This isn't how you're meant to send traffic to an ECS service via an ALB.
Instead you should have your service join the tasks to the target group. This will mean that if you are using host networking then only the instances with the task deployed will be registered. If you are using bridge networking then it will add the ephemeral ports used by your task to your target group (including allowing for there to be multiple targets on a single instance). And if you are using awsvpc networking then it will register the ENIs of every task that the service spins up.
To do this you should use the load_balancer block in the aws_ecs_service resource. An example might look something like this:
resource "aws_ecs_service" "mongo" {
name = "mongodb"
cluster = "${aws_ecs_cluster.foo.id}"
task_definition = "${aws_ecs_task_definition.mongo.arn}"
desired_count = 3
iam_role = "${aws_iam_role.foo.arn}"
load_balancer {
target_group_arn = "${aws_lb_target_group.lb_target_group_somm.arn}"
container_name = "mongo"
container_port = 8080
}
}
If you were using bridge networking this would mean that the tasks are accessible on the ephemeral port range on the instances so your security group rule would need to look like this:
resource "aws_security_group_rule" "alb_to_ecs" {
type = "ingress"
from_port = 32768 # ephemeral port range for bridge networking tasks
to_port = 60999 # cat /proc/sys/net/ipv4/ip_local_port_range
protocol = "TCP"
source_security_group_id = "${module.alb.alb_security_group_id}"
security_group_id = "${module.ecs_cluster.ecs_instance_security_group_id}"
}
it looks like the http://ecsInstanceIp:80 is not returning HTTP 200 OK. I would check that first. It would be easy to check if the instance is public. It wont be the case most of the times. Otherwise I would create an EC2 instance and make a curl request to confirm that.
You may also check the container logs to see if its logging the health check response.
Hope this helps. good luck.
I am trying to use API Gateway’s VPC links to route traffic to an internal API on HTTPS.
But, VPC links forces me to change my API’s load balancer from "application" to "network".
I understand that the network load balancer is on layer 4 and as such will not know about HTTPS.
I am used to using the layer 7 application load balancer. As such, I am not sure how I should configure or indeed use the network load balancer in terraform.
Below is my attempt at configuring the network load balancer in Terraform.
The health check fails and I'm not sure what I am doing wrong.
resource "aws_ecs_service" “app” {
name = "${var.env}-${var.subenv}-${var.appname}"
cluster = "${aws_ecs_cluster.cluster.id}"
task_definition = "${aws_ecs_task_definition.app.arn}"
desired_count = "${var.desired_app_count}"
deployment_minimum_healthy_percent = 50
deployment_maximum_percent = 200
iam_role = "arn:aws:iam::${var.account}:role/ecsServiceRole"
load_balancer {
target_group_arn = "${aws_lb_target_group.app-lb-tg.arn}"
container_name = "${var.env}-${var.subenv}-${var.appname}"
container_port = 9000
}
depends_on = [
"aws_lb.app-lb"
]
}
resource "aws_lb" “app-lb" {
name = "${var.env}-${var.subenv}-${var.appname}"
internal = false
load_balancer_type = "network"
subnets = "${var.subnet_ids}"
idle_timeout = 600
tags {
Owner = ""
Env = "${var.env}"
}
}
resource "aws_lb_listener" “app-lb-listener" {
load_balancer_arn = "${aws_lb.app-lb.arn}"
port = 443
protocol = "TCP"
default_action {
type = "forward"
target_group_arn = "${aws_lb_target_group.app-lb-tg.arn}"
}
}
resource "aws_lb_target_group" “app-lb-tg" {
name = "${var.env}-${var.subenv}-${var.appname}"
port = 443
stickiness = []
health_check {
path = "/actuator/health"
}
protocol = "TCP"
vpc_id = "${var.vpc_id}"
}
For reference, this is how I previously configured my application load balancer before attempting to switch to a network load balancer:
resource "aws_ecs_service" "app" {
name = "${var.env}-${var.subenv}-${var.appname}"
cluster = "${aws_ecs_cluster.cluster.id}"
task_definition = "${aws_ecs_task_definition.app.arn}"
desired_count = "${var.desired_app_count}"
deployment_minimum_healthy_percent = 50
deployment_maximum_percent = 200
iam_role = "arn:aws:iam::${var.account}:role/ecsServiceRole"
load_balancer {
target_group_arn = "${aws_lb_target_group.app-alb-tg.arn}"
container_name = "${var.env}-${var.subenv}-${var.appname}"
container_port = 9000
}
depends_on = [
"aws_alb.app-alb"]
}
resource "aws_alb" "app-alb" {
name = "${var.env}-${var.subenv}-${var.appname}"
subnets = "${var.subnet_ids}"
security_groups = [
"${var.vpc_default_sg}",
"${aws_security_group.app_internal.id}"]
internal = false
idle_timeout = 600
tags {
Owner = ""
Env = "${var.env}"
}
}
resource "aws_lb_listener" "app-alb-listener" {
load_balancer_arn = "${aws_alb.app-alb.arn}"
port = "443"
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-2015-05"
certificate_arn = "${var.certificate_arn}"
default_action {
type = "forward"
target_group_arn = "${aws_lb_target_group.app-alb-tg.arn}"
}
}
resource "aws_lb_target_group" "app-alb-tg" {
name = "${var.env}-${var.subenv}-${var.appname}"
port = 80
health_check {
path = "/actuator/health"
}
protocol = "HTTP"
vpc_id = "${var.vpc_id}"
}
A network load balancer automatically performs passive health checks on non UDP traffic that flows through it so if that's enough then you can just remove the active health check configuration.
If you want to enable active health checks then you can either use TCP health checks (the default) which will just check that the port is open or you can specify the HTTP/HTTPS protocol and specify a path. Ideally the AWS API would error when you try to specify a path for the health check but don't set the protocol to HTTP or HTTPS but apparently that's not the case right now.
With Terraform this would look something like this:
resource "aws_lb_target_group" "app-alb-tg" {
name = "${var.env}-${var.subenv}-${var.appname}"
port = 443
protocol = "TCP"
vpc_id = "${var.vpc_id}"
health_check {
path = "/actuator/health"
protocol = "HTTPS"
}
}
Remember that active health checks which will check that the port is open on the target from the network load balancer's perspective (not just the source traffic). This means that your target will need to allow traffic from the subnets that your NLBs reside in as well as security groups or CIDR ranges etc that your source traffic originates in.