I have a Terraform codebase that deploys a private EKS cluster, a bastion host and other AWS services. I have also added a few security groups in Terraform. One of them, called bastionSG, allows inbound traffic from my home IP to the bastion host so that I can SSH onto that node, and that works fine.
However, I am initially unable to run kubectl from my bastion host, which is the node I use for my Kubernetes development against the EKS cluster. The reason is that my EKS cluster is private and only allows communication from nodes in the same VPC, so I need to add a security group rule that allows communication from my bastion host to the cluster control plane. That is where my security group bastionSG comes in.
So my routine now is: once Terraform has deployed everything, I find the automatically generated EKS security group and add my bastionSG to it as an inbound rule through the AWS Console (UI), as shown in the image below.
I would like to NOT have to do this through the UI, as I am already using Terraform to deploy my entire infrastructure.
I know I can query an existing security group like this:
data "aws_security_group" "selectedSG" {
id = var.security_group_id
}
In this case, let's say selectedSG is the security group created by EKS once Terraform has completed the apply process. I would like to add bastionSG to it as an inbound rule without overwriting the rules that EKS has added automatically.
UPDATE: EKS node group
resource "aws_eks_node_group" "flmd_node_group" {
cluster_name = var.cluster_name
node_group_name = var.node_group_name
node_role_arn = var.node_pool_role_arn
subnet_ids = [var.flmd_private_subnet_id]
instance_types = ["t2.small"]
scaling_config {
desired_size = 3
max_size = 3
min_size = 3
}
update_config {
max_unavailable = 1
}
remote_access {
ec2_ssh_key = "MyPemFile"
source_security_group_ids = [
var.allow_tls_id,
var.allow_http_id,
var.allow_ssh_id,
var.bastionSG_id
]
}
tags = {
"Name" = "flmd-eks-node"
}
}
As shown above, the EKS node group has the bastionSG security group in it, which I expected to allow the connection from my bastion host to the EKS control plane.
EKS Cluster
resource "aws_eks_cluster" "flmd_cluster" {
name = var.cluster_name
role_arn = var.role_arn
vpc_config {
subnet_ids =[var.flmd_private_subnet_id, var.flmd_public_subnet_id, var.flmd_public_subnet_2_id]
endpoint_private_access = true
endpoint_public_access = false
security_group_ids = [ var.bastionSG_id]
}
}
bastionSG_id is an output of the security group created below, which is passed into the code above as a variable.
BastionSG security group
resource "aws_security_group" "bastionSG" {
name = "Home to bastion"
description = "Allow SSH - Home to Bastion"
vpc_id = var.vpc_id
ingress {
description = "Home to bastion"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [<MY HOME IP address>]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
tags = {
Name = "Home to bastion"
}
}
Let's start by creating a public security group.
################################################################################
# Create the Security Group
################################################################################
resource "aws_security_group" "public" {
vpc_id = local.vpc_id
name = format("${var.name}-${var.public_security_group_suffix}-SG")
description = format("${var.name}-${var.public_security_group_suffix}-SG")
dynamic "ingress" {
for_each = var.public_security_group_ingress
content {
cidr_blocks = lookup(ingress.value, "cidr_blocks", [])
ipv6_cidr_blocks = lookup(ingress.value, "ipv6_cidr_blocks", [])
from_port = lookup(ingress.value, "from_port", 0)
to_port = lookup(ingress.value, "to_port", 0)
protocol = lookup(ingress.value, "protocol", "-1")
}
}
dynamic "egress" {
for_each = var.public_security_group_egress
content {
cidr_blocks = lookup(egress.value, "cidr_blocks", [])
ipv6_cidr_blocks = lookup(egress.value, "ipv6_cidr_blocks", [])
from_port = lookup(egress.value, "from_port", 0)
to_port = lookup(egress.value, "to_port", 0)
protocol = lookup(egress.value, "protocol", "-1")
}
}
tags = merge(
{
"Name" = format(
"${var.name}-${var.public_security_group_suffix}-SG",
)
},
var.tags,
)
}
Next, create a private security group that allows inbound traffic from the public security group and outbound traffic to the ElastiCache and RDS security groups.
resource "aws_security_group" "private" {
vpc_id = local.vpc_id
name = format("${var.name}-${var.private_security_group_suffix}-SG")
description = format("${var.name}-${var.private_security_group_suffix}-SG")
ingress {
security_groups = [aws_security_group.public.id]
from_port = 0
to_port = 0
protocol = "-1"
}
dynamic "ingress" {
for_each = var.private_security_group_ingress
content {
cidr_blocks = lookup(ingress.value, "cidr_blocks", [])
ipv6_cidr_blocks = lookup(ingress.value, "ipv6_cidr_blocks", [])
from_port = lookup(ingress.value, "from_port", 0)
to_port = lookup(ingress.value, "to_port", 0)
protocol = lookup(ingress.value, "protocol", "-1")
}
}
dynamic "egress" {
for_each = var.private_security_group_egress
content {
cidr_blocks = lookup(egress.value, "cidr_blocks", [])
ipv6_cidr_blocks = lookup(egress.value, "ipv6_cidr_blocks", [])
from_port = lookup(egress.value, "from_port", 0)
to_port = lookup(egress.value, "to_port", 0)
protocol = lookup(egress.value, "protocol", "-1")
}
}
egress {
security_groups = [aws_security_group.elsaticache_private.id] # it communicates via network interfaces
from_port = 6379 # redis port
to_port = 6379
protocol = "tcp"
}
egress {
security_groups = [aws_security_group.rds_mysql_private.id]
from_port = 3306
to_port = 3306
protocol = "tcp"
}
tags = merge(
{
"Name" = format(
"${var.name}-${var.private_security_group_suffix}-SG"
)
},
var.tags,
)
depends_on = [aws_security_group.elsaticache_private, aws_security_group.rds_mysql_private]
}
The ElastiCache security group itself only declares an egress rule; the ingress rule from the private security group is added as a separate aws_security_group_rule resource, which resolves the circular dependency between the two groups. The same goes for the RDS security group.
resource "aws_security_group" "elsaticache_private" {
vpc_id = local.vpc_id
name = format("${var.name}-${var.private_security_group_suffix}-elasticache-SG")
description = format("${var.name}-${var.private_security_group_suffix}-elasticache-SG")
egress {
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
from_port = 0
to_port = 0
protocol = "-1"
}
tags = merge(
{
"Name" = format(
"${var.name}-${var.private_security_group_suffix}-elasticache-SG",
)
},
var.tags,
)
}
resource "aws_security_group_rule" "elsaticache_private_rule" {
type = "ingress"
from_port = 6379 # redis port
to_port = 6379
protocol = "tcp"
source_security_group_id = aws_security_group.private.id
security_group_id = aws_security_group.elsaticache_private.id
depends_on = [aws_security_group.private]
}
resource "aws_security_group" "rds_mysql_private" {
vpc_id = local.vpc_id
name = format("${var.name}-${var.private_security_group_suffix}-rds-mysql-SG")
description = format("${var.name}-${var.private_security_group_suffix}-rds-mysql-SG")
egress {
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
from_port = 0
to_port = 0
protocol = "-1"
}
tags = merge(
{
"Name" = format(
"${var.name}-${var.private_security_group_suffix}-rds-mysql-SG",
)
},
var.tags,
)
}
resource "aws_security_group_rule" "rds_mysql_private_rule" {
type = "ingress"
from_port = 3306 # mysql / aurora port
to_port = 3306
protocol = "tcp"
source_security_group_id = aws_security_group.private.id
security_group_id = aws_security_group.rds_mysql_private.id
depends_on = [aws_security_group.private]
}
There was a simpler solution.
Query AWS with a Terraform data source to get the ID of the security group, then use that ID to create an aws_security_group_rule resource in Terraform with the required inbound rule.
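A minimal sketch of that approach, under some assumptions: it uses the aws_eks_cluster data source rather than aws_security_group, since the cluster exposes the ID of its automatically created security group. The names "flmd" and "bastion_to_eks_api" are made up for illustration, and port 443 is assumed because that is where the EKS API server listens.

```hcl
# Look up the cluster by name; var.cluster_name is the variable used in the question.
data "aws_eks_cluster" "flmd" {
  name = var.cluster_name
}

# Add one inbound rule to the EKS-managed security group.
# A standalone rule resource only manages this rule and leaves
# the rules that EKS created untouched.
resource "aws_security_group_rule" "bastion_to_eks_api" {
  description              = "Bastion host to EKS API server"
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = data.aws_eks_cluster.flmd.vpc_config[0].cluster_security_group_id
  source_security_group_id = var.bastionSG_id
}
```

If the rule lives in the same configuration as the cluster resource, the data source is probably unnecessary: referencing aws_eks_cluster.flmd_cluster.vpc_config[0].cluster_security_group_id directly should give the same ID and lets Terraform order the two resources correctly.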
My instances keep failing their ELB health checks and I can't find any information on why that's happening. When I go to the target group in the console, the only information under 'targets' is that the health check status is 'unhealthy', and the 'health status details' just say 'health checks failed'. How can I find the real reason my health checks are failing? Here's my Terraform code, which includes my load balancer, auto scaling group, listener and target group.
main.tf
resource "aws_lb" "jira-alb" {
name = "jira-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.jira_clb_sg.id]
subnets = [var.public_subnet_ids[0], var.public_subnet_ids[1]]
enable_deletion_protection = false
access_logs {
bucket = aws_s3_bucket.this.id
enabled = true
}
tags = {
Environment = "production"
}
}
resource "aws_lb_target_group" "jira" {
name = "jira-tg"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
health_check {
enabled = true
healthy_threshold = 10
unhealthy_threshold = 5
interval = 30
timeout = 5
path = "/index.html"
}
stickiness {
type = "lb_cookie"
cookie_duration = 1 ## CANT BE 0.. RANGES FROM 1-604800
}
}
resource "aws_lb_listener" "jira-listener" {
port = 443
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-TLS-1-2-2017-01"
load_balancer_arn = aws_lb.jira-alb.arn
certificate_arn = data.aws_acm_certificate.this.arn ##TODO Change to a variable
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.jira.arn
}
}
resource "aws_autoscaling_group" "this" {
vpc_zone_identifier = var.subnet_ids
health_check_grace_period = 300
health_check_type = "ELB"
force_delete = true
desired_capacity = 2
max_size = 2
min_size = 2
target_group_arns = [aws_lb_target_group.jira.arn]
timeouts {
delete = "15m"
}
launch_template {
id = aws_launch_template.this.id
# version = "$Latest"
version = aws_launch_template.this.latest_version
}
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 50
}
}
}
I was expecting my health checks to pass and my instances to stay running, but they keep failing and getting re-deployed.
Here are the security groups for my load balancer and my auto scaling group as well.
security_groups.tf
resource "aws_security_group" "jira_clb_sg" {
description = "Allow-Veracode-approved-IPs from external to elb"
vpc_id = var.vpc_id
tags = {
Name = "public-elb-sg-for-jira"
Project = "Jira Module"
ManagedBy = "terraform"
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = var.veracode_ips
}
egress {
from_port = 0
to_port = 0
protocol = -1
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "jira_sg" {
description = "Allow-Traffic-From-CLB"
vpc_id = var.vpc_id
tags = {
Name = "allow-jira-public-clb-sg"
Project = "Jira Module"
ManagedBy = "terraform"
}
ingress {
from_port = 0
to_port = 0
protocol = -1
security_groups = [aws_security_group.jira_clb_sg.id]
}
egress {
from_port = 0
to_port = 0
protocol = -1
cidr_blocks = ["0.0.0.0/0"]
}
}
My load balancer lets in traffic on port 443, and my auto scaling group allows traffic on any port from the load balancer security group.
Your health check is on port 80, but your security groups only open port 443.
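One way to make the health check path explicit is to name the target group port in the instance security group. The sketch below is a variation of the jira_sg resource from the question, not an additional resource. It assumes that jira_sg is the group attached to the instances through the launch template, which the question does not show.

```hcl
resource "aws_security_group" "jira_sg" {
  description = "Allow-Traffic-From-CLB"
  vpc_id      = var.vpc_id

  # Forwarded traffic and health checks both arrive from the load
  # balancer on the target group port (80 in the question).
  ingress {
    description     = "HTTP and health checks from the load balancer"
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.jira_clb_sg.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

It is also worth confirming that something on the instance actually answers GET /index.html on port 80 with a success status code, since that is what the target group in the question checks.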
As described in the official documentation:
"You must ensure that your load balancer can communicate with registered targets on both the listener port and the health check port. Whenever you add a listener to your load balancer or update the health check port for a target group used by the load balancer to route requests, you must verify that the security groups associated with the load balancer allow traffic on the new port in both directions"
I am running Tableau Server 2021-1-2 on an EC2 instance.
I can connect using the default public IP on port 80, and also on port 8850 for the Tableau TSM UI. The same works using the hostname I defined. The only issue is that, despite following several guides, I can't connect using HTTPS.
I set up the ports on the security group, the load balancer and the certificate. I waited for hours, as I read that the SSL certificate could take more than half an hour, and nothing changed.
I can connect using:
http://my_domain.domain
But not:
https://my_domain.domain
I receive the following error in the browser: Can't connect to the server https://my_domain.domain.
I ran curl -i https://my_domain.domain
It returns:
curl: (7) Failed to connect to my_domain.domain port 443: Connection refused
The ports opened by my instance's security group are shown in the Terraform code below.
Here is my Terraform setup.
I did the EC2 setup with:
resource "aws_instance" "tableau" {
ami = var.ami
instance_type = var.instance_type
associate_public_ip_address = true
key_name = var.key_name
subnet_id = compact(split(",", var.public_subnets))[0]
vpc_security_group_ids = [aws_security_group.tableau-sg.id]
root_block_device{
volume_size = var.volume_size
}
tags = {
Name = var.namespace
}
}
I created the load balancer setup using:
resource "aws_lb" "tableau-lb" {
name = "${var.namespace}-alb"
load_balancer_type = "application"
internal = false
subnets = compact(split(",", var.public_subnets))
security_groups = [aws_security_group.tableau-sg.id]
ip_address_type = "ipv4"
enable_cross_zone_load_balancing = true
lifecycle {
create_before_destroy = true
}
idle_timeout = 300
}
resource "aws_alb_listener" "https" {
depends_on = [aws_alb_target_group.target-group]
load_balancer_arn = aws_lb.tableau-lb.arn
protocol = "HTTPS"
port = "443"
ssl_policy = "my_ssl_policy"
certificate_arn = "arn:xxxx"
default_action {
target_group_arn = aws_alb_target_group.target-group.arn
type = "forward"
}
lifecycle {
ignore_changes = [
default_action.0.target_group_arn,
]
}
}
resource "aws_alb_target_group" "target-group" {
name = "${var.namespace}-group"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "instance"
health_check {
healthy_threshold = var.health_check_healthy_threshold
unhealthy_threshold = var.health_check_unhealthy_threshold
timeout = var.health_check_timeout
interval = var.health_check_interval
path = var.path
}
tags = {
Name = var.namespace
}
lifecycle {
create_before_destroy = false
}
depends_on = [aws_lb.tableau-lb]
}
resource "aws_lb_target_group_attachment" "tableau-attachment" {
target_group_arn = aws_alb_target_group.target-group.arn
target_id = aws_instance.tableau.id
port = 80
}
The security group:
resource "aws_security_group" "tableau-sg" {
name_prefix = "${var.namespace}-sg"
tags = {
Name = var.namespace
}
vpc_id = var.vpc_id
# SSH access from anywhere
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Tableau TSM UI access from anywhere
ingress {
from_port = 8850
to_port = 8850
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# HTTP from the load balancer
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# 443 secure access from anywhere
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Outbound internet access
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
}
I also set up a DNS hostname using:
resource "aws_route53_record" "tableau-record-dns" {
zone_id = var.route_53_zone_id
name = "example.hostname"
type = "A"
ttl = "300"
records = [aws_instance.tableau.public_ip]
}
resource "aws_route53_record" "tableau-record-dns-https" {
zone_id = var.route_53_zone_id
name = "asdf.example.hostname"
type = "CNAME"
ttl = "300"
records = ["asdf.acm-validations.aws."]
}
I finally solved the issue. It was caused by the A record: I had pointed it at the instance's IP address, so requests to the hostname went straight to the instance and never reached the load balancer, where HTTPS is handled. Once I pointed the record at the ELB instead, it worked fine.
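For illustration, the corrected record might look like the sketch below, assuming a Route 53 alias record is used to point the hostname at the load balancer. It reuses the resource names from the question. An alias record takes its target from the alias block, so ttl and records are dropped.

```hcl
resource "aws_route53_record" "tableau-record-dns" {
  zone_id = var.route_53_zone_id
  name    = "example.hostname"
  type    = "A"

  # Resolve the hostname to the load balancer instead of the instance IP.
  alias {
    name                   = aws_lb.tableau-lb.dns_name
    zone_id                = aws_lb.tableau-lb.zone_id
    evaluate_target_health = true
  }
}
```

With this in place, requests to https://example.hostname reach the HTTPS listener on port 443, which forwards them to the instance on port 80.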
I have one external network load balancer (listening on port 80) which forwards traffic to a ServiceA instance (on port 9000). I'd like to configure an internal network load balancer that receives requests from ServiceA instances and forwards them to a ServiceB instance. However, I have a problem configuring the internal NLB in Terraform. Here's what I have at the moment:
resource "aws_security_group" "allow-all-traffic-for-internal-nlb" {
name = "int-nlb"
description = "Allow inbound and outbound traffic for internal NLB"
vpc_id = "${aws_vpc.default.id}"
ingress {
from_port = 81
protocol = "tcp"
to_port = 81
cidr_blocks = ["10.61.110.0/24"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_lb" "serviceB_lb" {
name = "serviceB-internal-lb"
internal = true
load_balancer_type = "network"
subnets = ["${aws_subnet.sub.id}"]
}
resource "aws_lb_listener" "serviceB-internal-lb-listener" {
load_balancer_arn = "${aws_lb.serviceB_lb.arn}"
port = 81
protocol = "TCP"
default_action {
target_group_arn = "${aws_lb_target_group.serviceB-internal-lb-tg.arn}"
type = "forward"
}
}
#create a target group for the load balancer and set up a health check
resource "aws_lb_target_group" "serviceB-internal-lb-tg" {
name = "serviceB-int-lb-tg"
port = 81
protocol = "TCP"
vpc_id = "${aws_vpc.default.id}"
target_type = "instance"
health_check {
protocol = "HTTP"
port = "8181"
path = "/"
}
}
#attach the ServiceB instance to the target group
resource "aws_lb_target_group_attachment" "attach-serviceB-tg-to-internal-nlb" {
target_group_arn = "${aws_lb_target_group.serviceB-internal-lb-tg.arn}"
port = 8181
target_id = "${aws_instance.serviceB-1a.id}"
}
# Create Security Groups
resource "aws_security_group_rule" "serviceB_from_serviceB-lb" {
type = "ingress"
from_port = 81
to_port = 81
protocol = "tcp"
source_security_group_id = "${aws_security_group.allow-all-traffic-for-internal-nlb.id}"
security_group_id = "${aws_security_group.serviceB-sg.id}"
}
resource "aws_security_group_rule" "serviceB_nlb_to_serviceB" {
type = "egress"
from_port = 81
to_port = 81
protocol = "tcp"
source_security_group_id = "${aws_security_group.serviceB-sg.id}"
security_group_id = "${aws_security_group.allow-all-traffic-for-internal-nlb.id}"
}
####
resource "aws_security_group" "serviceB-sg" {
name = "${var.environment}-serviceB-sg"
description = "${var.environment} serviceB security group"
vpc_id = "${aws_vpc.default.id}"
ingress {
from_port = 8181
to_port = 8181
protocol = "tcp"
cidr_blocks = ["10.61.110.0/24"]
}
}
The internal load balancer is listening on port 81, and the ServiceB instance is running on port 8181.
Both external and internal NLBs and two services are located in one subnet.
When I check the health status for the target group of the internal load balancer, I get a health check failure.
What can cause this to happen?
I have a terraform plan (below) that creates a couple of nodes in a private VPC on AWS. Everything seems to work well, but I can't ssh or ping between the nodes in the VPC.
What am I missing from the following configuration to allow the 2 nodes in the private network to be able to talk to each other?
provider "aws" {
region = "${var.aws_region}"
access_key = "${var.aws_access_key}"
secret_key = "${var.aws_secret_key}"
}
# Create a VPC to launch our instances into
resource "aws_vpc" "default" {
cidr_block = "10.0.0.0/16"
tags {
Name = "SolrCluster1"
}
}
# Create an internet gateway to give our subnet access to the outside world
resource "aws_internet_gateway" "default" {
vpc_id = "${aws_vpc.default.id}"
tags {
Name = "SolrCluster1"
}
}
# Grant the VPC internet access on its main route table
resource "aws_route" "internet_access" {
route_table_id = "${aws_vpc.default.main_route_table_id}"
destination_cidr_block = "0.0.0.0/0"
gateway_id = "${aws_internet_gateway.default.id}"
}
# Create a subnet to launch our instances into
resource "aws_subnet" "private" {
vpc_id = "${aws_vpc.default.id}"
cidr_block = "10.0.1.0/24"
# if true, instances launched into this subnet should be assigned a public IP
map_public_ip_on_launch = true
# availability_zone =
tags {
Name = "SolrCluster1"
}
}
# Security Group to Access the instances over SSH, and 8983
resource "aws_security_group" "main_security_group" {
name = "SolrCluster1"
description = "Allow access to the servers via port 22"
vpc_id = "${aws_vpc.default.id}"
// allow traffic from the SG itself for tcp
ingress {
from_port = 1
to_port = 65535
protocol = "tcp"
self = true
}
// allow traffic from the SG itself for udp
ingress {
from_port = 1
to_port = 65535
protocol = "udp"
self = true
}
// allow SSH traffic from anywhere TODO: Button this up a bit?
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
// allow ICMP
ingress {
from_port = -1
to_port = -1
protocol = "icmp"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_instance" "solr" {
ami = "ami-408c7f28"
instance_type = "t1.micro"
# The name of our SSH keypair we created above.
# key_name = "${aws_key_pair.auth.id}"
key_name = "${var.key_name}"
vpc_security_group_ids = ["${aws_security_group.main_security_group.id}"]
# Launch the instances into our subnet
subnet_id = "${aws_subnet.private.id}"
# The connection block tells our provisioner how to communicate with the
# resource (instance)
connection {
# The default username for our AMI
user = "ubuntu"
# Authenticate with the private key read from var.private_key_path.
private_key = "${file(var.private_key_path)}"
}
/* provisioner "remote-exec" { */
/* inline = [ */
/* "sudo apt-get -y update", */
/* "sudo apt-get -y --force-yes install nginx", */
/* "sudo service nginx start" */
/* ] */
/* } */
tags {
Name = "SolrDev${count.index}"
}
count = 2
}
It turned out I had left out the egress rules for my security group:
// allow traffic to the SG itself for tcp
egress {
from_port = 1
to_port = 65535
protocol = "tcp"
self = true
}
// allow traffic to the SG itself for udp
egress {
from_port = 1
to_port = 65535
protocol = "udp"
self = true
}
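The two rules above cover TCP and UDP only, so SSH between the nodes should work, but an outbound ping is ICMP and would presumably still be blocked. A hedged alternative, assuming it is acceptable for members of the group to reach each other on every protocol, is a single egress block inside the same aws_security_group resource:

```hcl
  // allow all protocols, ICMP included, to other members of this SG
  egress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }
```

With protocol set to "-1", the provider requires from_port and to_port to be 0. This block would replace the separate tcp and udp egress blocks rather than sit alongside them.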