Cannot access AWS EC2 public IP created through Terraform

I am trying to run one of the first basic examples from the book Terraform Up and Running. My main.tf is almost identical to the one in the link apart from the version:
provider "aws" {
region = "us-east-2"
}
resource "aws_instance" "example" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
vpc_security_group_ids = [aws_security_group.instance.id]
user_data = <<-EOF
#!/bin/bash
echo "Hello, World" > index.html
nohup busybox httpd -f -p 8080 &
EOF
tags = {
Name = "terraform-example"
}
}
resource "aws_security_group" "instance" {
name = var.security_group_name
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
variable "security_group_name" {
description = "The name of the security group"
type = string
default = "terraform-example-instance"
}
output "public_ip" {
value = aws_instance.example.public_ip
description = "The public IP of the Instance"
}
I ran terraform apply and everything seems to be created successfully. However, when I try to run curl http://<EC2_INSTANCE_PUBLIC_IP>:8080, the command hangs.
I created my AWS account right before I ran the example, so it uses the default network configuration.
The routing table has an entry pointing to the Internet Gateway of the VPC:
Destination | Target      | Status | Propagated
0.0.0.0/0   | igw-<igwId> | active | No
The network ACL has the default settings:
Rule number | Type        | Protocol | Port range | Source    | Allow/Deny
100         | All traffic | All      | All        | 0.0.0.0/0 | Allow
*           | All traffic | All      | All        | 0.0.0.0/0 | Deny
My Terraform version is v0.14.10.
Any ideas on how to access the instance's server through the EC2's public IP?

There is nothing wrong with the TF program. I verified it using my sandbox account and it works as expected. It takes 1-2 minutes for the script to start serving requests, so maybe you are testing it too soon.
So whatever difficulties you are having are not due to the script itself. Either what you posted on SO is not your actual code, or you have somehow modified your VPC configuration in a way that makes the instance inaccessible.
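If it is just a matter of waiting, one optional convenience is an output that prints the full test URL, so you can re-run the check once the instance has had a minute or two to boot. A small sketch:
output "curl_command" {
  value       = "curl http://${aws_instance.example.public_ip}:8080"
  description = "Command to test the web server once user_data has finished"
}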

Related

AWS CodeDeploy & ECS - Routing traffic to N services in an AWS ECS cluster that use CodeDeploy's Blue/Green deployment

I have an AWS ECS cluster that runs N public-facing services; each service runs a different dockerized application and is deployed using a Blue/Green deployment managed by AWS CodeDeploy.
HTTPS traffic is routed to my services using an application load balancer and host-based routing rules (each service has a unique subdomain on a common root domain: one.example.com, two.example.com, etc.).
All of my infrastructure is managed through Terraform. Below is the code from a module I created to configure the required target groups and listener rules for CodeDeploy to perform a Blue/Green deployment. Each service has its own instance of this module.
# Blue/Green networking configuration for ECS service
resource "aws_lb_target_group" "blue" {
  name                 = "${var.env}-${var.service_name}-blue"
  port                 = 80
  protocol             = "HTTP"
  vpc_id               = var.vpc_id
  deregistration_delay = 60

  health_check {
    path     = var.health_check_path
    protocol = "HTTP"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_lb_target_group" "green" {
  name                 = "${var.env}-${var.service_name}-green"
  port                 = 80
  protocol             = "HTTP"
  vpc_id               = var.vpc_id
  deregistration_delay = 60

  health_check {
    path     = var.health_check_path
    protocol = "HTTP"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_lb_listener_rule" "service_host_rule" {
  listener_arn = var.https_listener_arn

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue.arn
  }

  condition {
    host_header {
      values = [var.service_domain_name]
    }
  }
}
As you can see, the listener rule I've created for the service's subdomain currently only forwards traffic to the blue target group.
# deployment.tf in ECS service module
resource "aws_codedeploy_deployment_group" "this" {
  app_name               = var.code_deploy.app_name
  deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"
  deployment_group_name  = var.ecs_service_name
  service_role_arn       = var.code_deploy.service_role.arn

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE"]
  }

  dynamic "blue_green_deployment_config" {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 10
    }
  }

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  ecs_service {
    cluster_name = var.cluster_name
    service_name = var.ecs_service_name
  }

  load_balancer_info {
    target_group_pair_info {
      target_group {
        # The Blue target group created in the above module
        name = var.alb_target_groups.blue.name
      }

      target_group {
        # The Green target group created in the above module
        name = var.alb_target_groups.green.name
      }

      prod_traffic_route {
        listener_arns = [var.alb_listener_arn]
      }
    }
  }
}
In my CI/CD pipeline this Terraform is run, creating the target groups, listener rule, and deployment group. Once the Terraform has run, the deployment is started through the AWS CLI (aws deploy create-deployment ...).
This works fine for the first deployment of the service; once it's completed, all traffic is routed to the green target group and the tasks running in the blue group are terminated.
Then when deploying a new version for the second time, CodeDeploy returns an error message:
The ELB could not be updated due to the following error: Primary taskset target group must be behind listener arn:aws:elasticloadbalancing:xxxxxx:xxxxxx:listener-rule/app/xxxx/xxxx/xxxx/xxxx.
I believe this is because CodeDeploy looks for the blue target group, expecting current traffic to be routed to it, but because of the state the last deploy left the service in, the next deploy fails. Is there something wrong in my configuration, or am I missing a manual step that needs to be completed once a deploy finishes?
Edit:
To give more clarity on how my infrastructure is set up:
infrastructure/
|- modules/
| |- ecs-service/
| | |- modules/
| | | |- blue-green-config/
| | | | |- main.tf
| | |- main.tf
| | |- deployment.tf
The ECS service module creates its own instance of a Blue/Green configuration module.
Some details are still missing (e.g. output names), but what I would suggest is that, instead of using variables for the target group names in deployment.tf, you use the outputs of the child module. This way, you tell Terraform to create an implicit dependency between the resources created by the child module and the resources created in the root module. For example:
module "ecs_service" {
source = "./modules/blue-green-config"
... rest of the variables go here ...
}
In deployment.tf there would be the following change (everything that stays the same has been removed):
# deployment.tf in ECS service module
resource "aws_codedeploy_deployment_group" "this" {
  # ...

  load_balancer_info {
    target_group_pair_info {
      target_group {
        name = module.ecs_service.blue_target_group_name # <---- example, since the output names are not in the question
      }

      target_group {
        name = module.ecs_service.green_target_group_name # <---- example, since the output names are not in the question
      }

      prod_traffic_route {
        listener_arns = [var.alb_listener_arn]
      }
    }
  }
}
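For this to work, the blue-green-config module has to expose the target group names as outputs. A minimal sketch, assuming the output names used above (they are not in the question, so adjust them to whatever you actually call them):
# outputs.tf in the blue-green-config module
output "blue_target_group_name" {
  value = aws_lb_target_group.blue.name
}

output "green_target_group_name" {
  value = aws_lb_target_group.green.name
}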
Besides those two implicit dependencies, I would strongly suggest creating outputs for listener_arns as well as for the ECS service and cluster name and using those instead of variables. One thing to reconsider as well is:
dynamic "blue_green_deployment_config" {
deployment_ready_option {
action_on_timeout = "CONTINUE_DEPLOYMENT"
}
terminate_blue_instances_on_deployment_success {
action = "TERMINATE"
termination_wait_time_in_minutes = 10
}
}
because it seems that this is a dynamic block without a for_each expression or a content block, which renders the dynamic keyword useless.
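For reference, a dynamic block normally takes a for_each expression and wraps its body in a content block; if the configuration should always be present, you can simply drop the dynamic keyword and use a plain blue_green_deployment_config block. A sketch of the dynamic form, where the enable_blue_green variable is made up purely for illustration:
dynamic "blue_green_deployment_config" {
  for_each = var.enable_blue_green ? [1] : []

  content {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 10
    }
  }
}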

Terraform wants to replace EC2 instances when I simply want to add a rule to a security group

I have an ec2 instance defined in terraform along with some security rules.
These are the security rules:
resource "aws_security_group" "ec2_web" {
name = "${var.project_name}_${var.env}_ec2_web"
description = "ec2 instances that serve to the load balancer"
vpc_id = aws_vpc.main.id
}
resource "aws_security_group_rule" "ec2_web_http" {
type = "egress"
from_port = 80
to_port = 80
protocol = "tcp"
# cidr_blocks = ["0.0.0.0/0"]
security_group_id = aws_security_group.ec2_web.id
source_security_group_id = aws_security_group.elb.id
}
resource "aws_security_group_rule" "ec2_web_ssh" {
type = "ingress"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["${var.ip_address}/32"]
security_group_id = aws_security_group.ec2_web.id
}
I'm trying to simply add another security rule:
resource "aws_security_group_rule" "ec2_web_ssh_test" {
type = "ingress"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["${var.ip_address}/32"]
security_group_id = aws_security_group.ec2_web.id
}
And Terraform wants to completely replace the security group, which cascades into completely replacing the EC2 instance.
I'm modifying the .tf file and then running:
terraform apply
EDIT:
The security group itself seems completely unrelated. When I do "plan", I get the output:
  # aws_instance.ec2 must be replaced
-/+ resource "aws_instance" "ec2" {
        ...
      ~ security_groups = [ # forces replacement
          + "sg-0befd5d21eee052ad",
        ]
The ec2 instance is created with:
resource "aws_instance" "ec2" {
ami = "ami-0b5eea76982371e91"
instance_type = "t3.small"
key_name = "${var.project_name}"
depends_on = [aws_internet_gateway.main]
user_data = <<EOF
#!/bin/bash
sudo amazon-linux-extras install -y php8.1 mariadb10.5
sudo yum install -y httpd mariadb php8.1 php8.1-cli
sudo systemctl start httpd
sudo systemctl enable httpd
echo 'yup' | sudo tee /var/www/html/index.html
echo '<?php echo phpinfo();' | sudo tee /var/www/html/phpinfo.php
EOF
tags = {
Name = "${var.project_name}_${var.env}_ec2"
}
root_block_device {
volume_size = 8 # GB
volume_type = "gp3"
}
security_groups = [aws_security_group.ec2_web.id]
# vpc_security_group_ids = [aws_security_group.main.id]
subnet_id = aws_subnet.main1.id
}
If I comment out
# security_groups = [aws_security_group.bake.name]
I do not get any errors.
This happens because security_groups can only be used with EC2-Classic (legacy instances) and the default VPC. For everything else you must use vpc_security_group_ids.
In your case you are using a custom VPC called main, thus you must use vpc_security_group_ids, not security_groups.
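A minimal sketch of the change on aws_instance.ec2 (only the relevant attributes are shown; everything else stays as in the question):
resource "aws_instance" "ec2" {
  # ... ami, instance_type, user_data, etc. unchanged ...

  vpc_security_group_ids = [aws_security_group.ec2_web.id]
  subnet_id              = aws_subnet.main1.id
}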
Before applying, please run terraform apply -refresh-only. Then your main problem is that you cannot define the same rule under a different Terraform ID.
When you apply the new changes for ec2_web_ssh_test, the AWS provider will complain:
│Error: [WARN] A duplicate Security Group rule was found on (sg-xxxxxx). This may be
│ a side effect of a now-fixed Terraform issue causing two security groups with
│ identical attributes but different source_security_group_ids to overwrite each
│ other in the state
Then you will get this error from the AWS API:
Error: InvalidPermission.Duplicate: the specified rule "peer: xxx.xxx.xxx.xxx/32, TCP, from port: 22, to port: 22, ALLOW" already exists
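In other words, a rule with exactly the same type, ports, protocol, and source already exists, so the new resource has to differ in at least one of those attributes (or be dropped). A sketch of a genuinely different rule, where the second CIDR variable is made up purely for illustration:
resource "aws_security_group_rule" "ec2_web_ssh_test" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["${var.other_ip_address}/32"] # hypothetical second source address
  security_group_id = aws_security_group.ec2_web.id
}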
Terraform compares the current state of your configuration with the desired state, which contains the new rule you are adding. Here the current state is not the same as the desired state with the new rule.
Hence, with the two states differing, Terraform is trying to destroy the EC2 instance and build a new one that reflects the newly added rule.
This can be avoided by using the terraform import command, which imports the existing resources into your Terraform state, after which you can make changes to them.

ALB Health checks Targets Unhealthy

I am trying to provision an ECS cluster using Terraform along with an ALB. The targets come up as Unhealthy. The error code is 502, and the console shows: Health checks failed with these codes: [502]
I checked through the AWS Troubleshooting guide and nothing helped there.
EDIT: I have no services/tasks running on the EC2 container instances. It's a vanilla ECS cluster.
Here is my relevant code for the ALB:
# Target Group declaration
resource "aws_alb_target_group" "lb_target_group_somm" {
  name                 = "${var.alb_name}-default"
  port                 = 80
  protocol             = "HTTP"
  vpc_id               = "${var.vpc_id}"
  deregistration_delay = "${var.deregistration_delay}"

  health_check {
    path     = "/"
    port     = 80
    protocol = "HTTP"
  }

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Environment = "${var.environment}"
  }

  depends_on = ["aws_alb.alb"]
}

# ALB Listener with default forward rule
resource "aws_alb_listener" "https_listener" {
  load_balancer_arn = "${aws_alb.alb.id}"
  port              = "80"
  protocol          = "HTTP"

  default_action {
    target_group_arn = "${aws_alb_target_group.lb_target_group_somm.arn}"
    type             = "forward"
  }
}

# The ALB has a security group with ingress rules on TCP port 80 and egress rules to anywhere.
# There is a security group rule for the EC2 instances that allows ingress traffic to the ECS cluster from the ALB:
resource "aws_security_group_rule" "alb_to_ecs" {
  type = "ingress"
  /* from_port = 32768 */
  from_port                = 80
  to_port                  = 65535
  protocol                 = "TCP"
  source_security_group_id = "${module.alb.alb_security_group_id}"
  security_group_id        = "${module.ecs_cluster.ecs_instance_security_group_id}"
}
Has anyone hit this error, and does anyone know how to debug/fix it?
It looks like you're trying to register the ECS cluster instances with the ALB target group. That isn't how you're meant to send traffic to an ECS service via an ALB.
Instead you should have your service join the tasks to the target group. This will mean that if you are using host networking then only the instances with the task deployed will be registered. If you are using bridge networking then it will add the ephemeral ports used by your task to your target group (including allowing for there to be multiple targets on a single instance). And if you are using awsvpc networking then it will register the ENIs of every task that the service spins up.
To do this you should use the load_balancer block in the aws_ecs_service resource. An example might look something like this:
resource "aws_ecs_service" "mongo" {
name = "mongodb"
cluster = "${aws_ecs_cluster.foo.id}"
task_definition = "${aws_ecs_task_definition.mongo.arn}"
desired_count = 3
iam_role = "${aws_iam_role.foo.arn}"
load_balancer {
target_group_arn = "${aws_lb_target_group.lb_target_group_somm.arn}"
container_name = "mongo"
container_port = 8080
}
}
If you were using bridge networking this would mean that the tasks are accessible on the ephemeral port range on the instances so your security group rule would need to look like this:
resource "aws_security_group_rule" "alb_to_ecs" {
type = "ingress"
from_port = 32768 # ephemeral port range for bridge networking tasks
to_port = 60999 # cat /proc/sys/net/ipv4/ip_local_port_range
protocol = "TCP"
source_security_group_id = "${module.alb.alb_security_group_id}"
security_group_id = "${module.ecs_cluster.ecs_instance_security_group_id}"
}
It looks like http://<ecsInstanceIp>:80 is not returning HTTP 200 OK; I would check that first. That is easy to check if the instance is public, which most of the time it won't be. Otherwise I would create an EC2 instance and make a curl request from it to confirm that.
You may also check the container logs to see if it's logging the health check response.
Hope this helps. Good luck.

Unable to connect to Terraform-created AWS instance via SSH

I am trying to use Terraform to spin up a VPC and a single instance and then connect via SSH, but I'm unable to. I'm aware I don't have any keys here, but I'm trying to simply connect via the web terminal and it still says:
There was a problem setting up the instance connection. The connection has been closed because the server is taking too long to respond. This is usually caused by network problems, such as a spotty wireless signal, or slow network speeds. Please check your network connection and try again or contact your system administrator.
Is anyone able to look at my code and see what I'm doing wrong?
provider "aws" {
region = "us-east-2"
}
resource "aws_vpc" "vpc" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "test"
}
}
resource "aws_internet_gateway" "gateway" {
vpc_id = "${aws_vpc.vpc.id}"
tags = {
Name = "test"
}
}
resource "aws_subnet" "subnet" {
vpc_id = "${aws_vpc.vpc.id}"
cidr_block = "${aws_vpc.vpc.cidr_block}"
availability_zone = "us-east-2a"
map_public_ip_on_launch = true
tags = {
Name = "test"
}
}
resource "aws_route_table" "table" {
vpc_id = "${aws_vpc.vpc.id}"
route {
cidr_block = "0.0.0.0/0"
gateway_id = "${aws_internet_gateway.gateway.id}"
}
tags = {
Name = "test"
}
}
resource "aws_route_table_association" "public" {
subnet_id = "${aws_subnet.subnet.id}"
route_table_id = "${aws_route_table.table.id}"
}
resource "aws_instance" "node" {
#ami = "ami-0d5d9d301c853a04a" # Ubuntu 18.04
ami = "ami-0d03add87774b12c5" # Ubuntu 16.04
instance_type = "t2.micro"
subnet_id = "${aws_subnet.subnet.id}"
}
UPDATE1: I've added key_name = "mykey", which I had previously created. I am unable to ping the public IP, and upon trying to SSH with the key I get the following:
$ ssh -v -i ~/.ssh/mykey ubuntu@1.2.3.4
OpenSSH_7.9p1, LibreSSL 2.7.3
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 48: Applying options for *
debug1: Connecting to 1.2.3.4 [1.2.3.4] port 22.
where mykey and 1.2.3.4 have been changed for posting.
UPDATE2: Looking at the security group, I don't see anything that stands out. The ACL for this has the following:
Rule # | Type        | Protocol | Port range | Source    | Allow/Deny
100    | ALL Traffic | ALL      | ALL        | 0.0.0.0/0 | ALLOW
*      | ALL Traffic | ALL      | ALL        | 0.0.0.0/0 | DENY
Is this a problem? It seems that no one sees an issue with the Terraform code, so if anyone can confirm this is not a problem with the code, then I think this can be closed out and moved to a different board, since it would not be a code issue, correct?
The web console uses SSH to connect, so you still need to set up an SSH key. The only way to connect without an SSH key configured, and port 22 open in the security group, is to use AWS Systems Manager Session Manager, but that requires the SSM agent running on the EC2 instance and appropriate IAM roles assigned to the instance.
You have not supplied a key_name to indicate which SSH keypair to use.
If you don't have an existing aws_key_pair then you will also need to create one.
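A minimal sketch of what that could look like on top of the code in the question (the key name, public key path, and the extra security group are illustrative assumptions):
resource "aws_key_pair" "deployer" {
  key_name   = "mykey"
  public_key = "${file("~/.ssh/mykey.pub")}"
}

resource "aws_security_group" "ssh" {
  vpc_id = "${aws_vpc.vpc.id}"

  # Inbound SSH from anywhere
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "node" {
  ami                    = "ami-0d03add87774b12c5" # Ubuntu 16.04
  instance_type          = "t2.micro"
  subnet_id              = "${aws_subnet.subnet.id}"
  key_name               = "${aws_key_pair.deployer.key_name}"
  vpc_security_group_ids = ["${aws_security_group.ssh.id}"]
}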

Cannot SSH into Terraform deployment

I'm following this Terraform tutorial found at gruntwork.io. I can use $ terraform apply to spin up a virtual machine, which shows in the AWS console with a public-facing IP address and everything. Unfortunately the instance seems to be attached to a previously defined security group and doesn't seem to be responding to ssh or curl as I might expect.
I've modified the security group so that the proper ports are open, and modified the tutorial's main.tf file in an attempt to add a user account that I can use to at least see what's running on the VM.
The results of terraform apply can be seen here
When I try to SSH into the instance with the test user and the associated private key, I get a response of permission denied (this also happens if I try logging in as the default user, ubuntu). What am I misunderstanding, such that the security groups aren't being defined properly and the user isn't being added to the instance properly? The tutorial was written for Terraform 0.7, and I'm running 0.11.10, but I can't imagine that something so basic would change.
The modified main.tf file is as follows:
# ---------------------------------------------------------------------------------------------------------------------
# DEPLOY A SINGLE EC2 INSTANCE
# This template runs a simple "Hello, World" web server on a single EC2 Instance
# ---------------------------------------------------------------------------------------------------------------------

# ------------------------------------------------------------------------------
# CONFIGURE OUR AWS CONNECTION
# ------------------------------------------------------------------------------
provider "aws" {
  region = "us-east-1"
}

# ---------------------------------------------------------------------------------------------------------------------
# DEPLOY A SINGLE EC2 INSTANCE
# ---------------------------------------------------------------------------------------------------------------------
resource "aws_instance" "example" {
  # Ubuntu Server 14.04 LTS (HVM), SSD Volume Type in us-east-1
  ami                    = "ami-2d39803a"
  instance_type          = "t2.micro"
  vpc_security_group_ids = ["${aws_security_group.instance.id}"]

  user_data = <<-EOF
              #!/bin/bash
              echo "Hello, World" > index.html
              nohup busybox httpd -f -p "${var.server_port}" &
              EOF

  tags {
    Name = "terraform-example"
  }
}

# ---------------------------------------------------------------------------------------------------------------------
# CREATE THE SECURITY GROUP THAT'S APPLIED TO THE EC2 INSTANCE
# ---------------------------------------------------------------------------------------------------------------------
resource "aws_security_group" "instance" {
  name = "terraform-example-instance"

  # Inbound HTTP from anywhere
  ingress {
    from_port   = "${var.server_port}"
    to_port     = "${var.server_port}"
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Inbound SSH from anywhere
  ingress {
    from_port   = "22"
    to_port     = "22"
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

variable "server_port" {
  description = "The port the server will use for HTTP requests"
  default     = 8080
}

# ---------------------------------------------------------------------------------------------------------------------
# Try to add a user to the spun up machine that we can ssh into the account of
# ---------------------------------------------------------------------------------------------------------------------
resource "aws_iam_user" "user" {
  name = "test-user"
  path = "/"
}

resource "aws_iam_user_ssh_key" "user" {
  username   = "${aws_iam_user.user.name}"
  encoding   = "SSH"
  public_key = <public_key>
}
You did not specify any key pair name while creating the EC2 instance.
To SSH in as the ubuntu user, you need to specify a key pair name.
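A minimal sketch, assuming an EC2 key pair named deployer-key already exists in the region (the name is illustrative):
resource "aws_instance" "example" {
  ami                    = "ami-2d39803a"
  instance_type          = "t2.micro"
  vpc_security_group_ids = ["${aws_security_group.instance.id}"]
  key_name               = "deployer-key"

  # ... user_data and tags unchanged ...
}
With that in place you can connect with ssh -i <private_key> ubuntu@<public_ip>.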