AWS: ECS Service with Elastic IP - amazon-web-services

I've created a cluster, VPC, subnet and a Fargate service using the first run wizard of ECS on AWS console and uploaded the image on ECR and deployed successfully.
Now I need the service to access a remote database, so I need to add its IP to the firewall's whitelist. I allocated an Elastic IP, created a NAT Gateway and updated the route table following this tutorial.
I stopped the task and tried to run it again, but the new task could not pull the image from ECR. It failed with the following error message:
CannotPullContainerError: Error response from daemon: Get https://account-id.dkr.ecr.sa-east-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
My setup:
VPC with CIDR 10.0.0.0/16 (automatically created on ECS wizard)
Subnet with the following route table:
Destination | Target
----------------|-------------
10.0.0.0/16 | local
0.0.0.0/0 | nat-<nat-id>
NAT Gateway, placed in the VPC and subnet created by the ECS wizard and using the Elastic IP I allocated.
Currently, I'm allowing all traffic in both inbound and outbound rules:
Type | Protocol | Port range | Source | Description - optional
-----|----------|------------|---------|------------------------
All | All | All |0.0.0.0/0| -
What am I missing? Is this the only way I can accomplish what I want? Is there a simpler way to achieve it? I found in Stack Overflow another way to associate an Elastic IP by using Application Load Balancer or Network Load Balancer. Is it a better approach?

The ECS wizard creates a VPC with two public subnets, 10.0.0.0/24 and 10.0.1.0/24. They both use a single route table (RT) which points to an internet gateway (IGW). However, from your question it appears that you've modified it to use the NAT gateway instead.
Sadly, this will not work, as you've already experienced: a NAT gateway must itself sit in a subnet whose default route points to the IGW, and after your change the NAT's own subnet sends internet traffic back to the NAT. To rectify the issue, you could create a third subnet (or more if you need HA). The new subnet will be private, with no direct route to the IGW. Instead it will have a new RT which routes internet traffic to the NAT. Your Fargate tasks would be launched in the private subnet(s).
The new RT of the new subnet(s) would be:
Destination | Target
----------------|-------------
10.0.0.0/16 | local
0.0.0.0/0 | nat-<nat-id>
The RT of the two original public subnets should be changed back to route internet traffic to the IGW, as it did originally:
Destination | Target
----------------|-------------
10.0.0.0/16 | local
0.0.0.0/0 | IGW
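As a rough illustration of the public/private distinction above, the following Python sketch classifies a subnet by where its 0.0.0.0/0 route points. The data layout, the function names and the `igw-example` / `nat-example` IDs are assumptions made for this example; this is not an AWS API.

```python
from typing import Dict, List, Optional

Route = Dict[str, str]  # {"destination": "...", "target": "..."}


def default_route_target(routes: List[Route]) -> Optional[str]:
    """Return the target of the 0.0.0.0/0 route, or None if there is none."""
    for route in routes:
        if route["destination"] == "0.0.0.0/0":
            return route["target"]
    return None


def classify_subnet(routes: List[Route]) -> str:
    """Label a subnet by the target of its default route."""
    target = default_route_target(routes)
    if target is None:
        return "isolated"   # no route to the internet at all
    if target.startswith("igw-"):
        return "public"     # NAT gateways and internet-facing LBs go here
    if target.startswith("nat-"):
        return "private"    # Fargate tasks that need a fixed egress IP go here
    return "other"


# Hypothetical route tables mirroring the two tables above.
public_rt = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "igw-example"},
]
private_rt = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "nat-example"},
]

print(classify_subnet(public_rt), classify_subnet(private_rt))
```

If the subnet that hosts the NAT gateway comes back as "private", that matches the broken setup described in the question.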

To explain what is happening: you are getting the CannotPullContainerError because there is no route to the internet. Traffic to ECR goes over the internet by default.
Your Fargate service is running in a private subnet, which does not have a direct route to the internet. To get internet access, the route table of the private subnet where the Fargate task runs should send internet-bound traffic to the NAT gateway (you have done this already). Therefore:
Destination | Target
----------------|-------------
10.0.0.0/16 | local
0.0.0.0/0 | Natgateway
The NAT gateway simply forwards the traffic to the internet gateway. The NAT gateway is deployed in a public subnet and reaches the internet via the internet gateway. Therefore the subnet where the NAT gateway is deployed should have the following routes:
Destination | Target
----------------|-------------
10.0.0.0/16 | local
0.0.0.0/0 | InternetGateway
Note:
You can also reach ECR privately, without going through the internet, by creating VPC endpoints for ECR.
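For reference, here is a small sketch of which endpoint service names such a setup would typically involve. It assumes Fargate platform version 1.4.0 or later, where image pulls use both the ECR API and the ECR Docker registry endpoints, plus an S3 gateway endpoint because image layers are stored in S3. The function name and the `include_logs` flag are made up for this example, and the region is taken from the error message in the question.

```python
from typing import Dict, List


def ecr_endpoint_services(region: str, include_logs: bool = False) -> List[Dict[str, str]]:
    """List the VPC endpoint services usually needed to pull from ECR privately."""
    services = [
        {"service": f"com.amazonaws.{region}.ecr.api", "type": "Interface"},
        {"service": f"com.amazonaws.{region}.ecr.dkr", "type": "Interface"},
        # ECR stores image layers in S3, so a gateway endpoint is needed too.
        {"service": f"com.amazonaws.{region}.s3", "type": "Gateway"},
    ]
    if include_logs:
        # Only relevant if the task uses the awslogs log driver.
        services.append({"service": f"com.amazonaws.{region}.logs", "type": "Interface"})
    return services


for entry in ecr_endpoint_services("sa-east-1"):
    print(entry["type"], entry["service"])
```

Endpoints only cover the image pull. Reaching the remote database from a fixed IP still needs the NAT gateway with the Elastic IP.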

Related

AWS Lambda Timeout when making Https request with NAT Gateway in place

I have a containerized Lambda deployed that uses the latest image from ECR. I have also already setup the networking with the following:
Subnets | Route Table
----------|--------------
private-1 | private-route
private-2 | private-route
public-0 | public-route
Subnet CIDRs
private-1: 172.31.32.0/20
private-2: 172.31.48.0/20
Private Route Table
Destination | Target
----------------|-------------
172.31.0.0/16 | local
0.0.0.0/0 | nat-xxx
Public Route Table
Destination | Target
----------------|-------------
172.31.0.0/16 | local
0.0.0.0/0 | igw-xxx
The NAT Gateway is associated with the public-1 subnet.
Security Group
All Traffic Inbound and Outbound is allowed for now (for debugging).
Lambda Configuration
Subnets | Security Groups
--------------|----------------
private 1 & 2 | sg-xxx
My problem is that after I set up this configuration I was able to get access. I then added configuration to talk to a peered VPC for a database connection, and that also worked. But immediately afterwards it stopped working. So my confusion is: why is it sporadic? I'll randomly change security groups and redo the networking, and it works once, then stops. This is the error I constantly see now, taken from the first few lines of the Lambda invocation:
"errorMessage": "HTTPSConnectionPool(host='maps.googleapis.com', port=443): Max retries exceeded with url: /maps/api/geocode/json?

AWS Client VPN with a Fixed IP

In order to give our developers access to IP-restricted internal and partner applications, I'm setting up AWS Client VPN. I've managed to get everything running, even with internet access. As expected, the public IP keeps changing.
I've created a NAT Gateway, assigned an Elastic IP and changed the route of the subnet to use the NAT Gateway instead of the Internet Gateway to reach the internet (0.0.0.0/0).
The problem now is that clients can't reach the internet at all once connected to the VPN. What part am I missing to get internet access working again while using the NAT Gateway with the static IP?
The Setup is absolutely basic. 1 new VPC, 1 Subnet, 1 Client VPN Endpoint, 1 Security Group.
Your setup is very common and there's probably just a simple mistake. The pattern you are following is the private/public subnet, even though these terms are not used that much in AWS.
When you have a subnet that is configured to use a NAT Gateway (as the 0.0.0.0/0 route in its route table), that subnet can be referred to as a "private subnet", as there will be no direct access to it from the internet.
But the NAT Gateway itself needs to be placed in a "public subnet", i.e. in a subnet where the default route 0.0.0.0/0 goes to an Internet Gateway. (Not in the scope of your question, but the same mistake is commonly made with load balancers. If you have an LB that should serve users on the internet, the LB needs to be deployed to a public subnet, even if your servers are in a private subnet.)
So to summarize:
| Subnet # | Type | Default Route (route table) | What to place here? |
|----------|---------|-------------------------------|---------------------|
| 1 | Public | 0.0.0.0/0 -> Internet Gateway | NAT Gateway |
| 2 | Private | 0.0.0.0/0 -> NAT Gateway | Users' applications |
You need to have a route from your NAT Gateway to your Internet Gateway, otherwise whoever is routing traffic to the NAT Gateway is not really reaching the Internet.
Please, have a look at the pattern at https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Scenario2.html#VPC_Scenario2_Routing for accessing the Internet.
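To make the "NAT Gateway must route to an Internet Gateway" point concrete, here is a minimal Python sketch that follows the default route hop by hop. It assumes a simplified model in which each subnet maps to a `{destination: target}` dictionary and each NAT gateway maps to the subnet it lives in; all IDs and the function name are hypothetical.

```python
from typing import Dict, List


def trace_egress(
    subnet_id: str,
    route_tables: Dict[str, Dict[str, str]],
    nat_subnets: Dict[str, str],
    max_hops: int = 4,
) -> List[str]:
    """Follow 0.0.0.0/0 from a subnet until an internet gateway is reached."""
    path = [subnet_id]
    current = subnet_id
    for _ in range(max_hops):
        target = route_tables[current].get("0.0.0.0/0")
        if target is None:
            raise ValueError(f"{current} has no default route")
        path.append(target)
        if target.startswith("igw-"):
            return path
        if not target.startswith("nat-"):
            raise ValueError(f"unexpected default route target {target}")
        nat_subnet = nat_subnets[target]
        if nat_subnet == current:
            # The NAT lives in a subnet that routes back to the NAT itself.
            raise ValueError(f"{target} is in {current}, which has no route to an IGW")
        current = nat_subnet
        path.append(current)
    raise ValueError("no internet gateway reached")


# Fixed layout: the NAT gateway lives in the public subnet.
route_tables = {
    "subnet-public": {"0.0.0.0/0": "igw-example"},
    "subnet-private": {"0.0.0.0/0": "nat-example"},
}
nat_subnets = {"nat-example": "subnet-public"}
print(trace_egress("subnet-private", route_tables, nat_subnets))
```

With the single-subnet layout from the question (the NAT gateway placed in the same subnet whose default route points at it), the function raises instead of returning a path.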

Request Timeout ( HTTP 408) : loadbalancer backed by ecs-fargate with nginx image

I am trying to create following infrastructure using terraform.
LoadBalancer -> ECS-Service -> Fargate (nginx images, count=2)
After applying the terraform plan, I can see that the target group shows two healthy targets. But when I try to access the load balancer DNS name from a browser, I get a request timeout. Ping is also not working for the LB DNS name.
The load balancer is a non-internal application load balancer with a security group allowing all traffic on port 80 from all IPv4 addresses.
Need help.
Did you configure the LB in a public subnet? It seems like it's in a private subnet. Did you try to access the application from inside the AWS network to verify that the LB works within the VPC?
curl lb_dns
or
nslookup lb_dns
from any EC2 machine within the VPC. If that works, it means the LB is only reachable within the private subnet; move the LB to a public subnet and it should work.
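A complementary check, sketched in Python with only the standard library: resolve the LB DNS name and see whether the returned addresses are private or public. The function names are made up for this example, and the addresses in the demo are placeholders, not real load balancer IPs.

```python
import ipaddress
import socket
from typing import Dict, Iterable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses (needs network access)."""
    infos = socket.getaddrinfo(hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def describe_addresses(addresses: Iterable[str]) -> Dict[str, str]:
    """Label each address as 'private' or 'public'."""
    return {
        addr: "private" if ipaddress.ip_address(addr).is_private else "public"
        for addr in addresses
    }


# In practice: describe_addresses(resolve_ipv4("<your LB DNS name>"))
print(describe_addresses(["10.0.1.25", "8.8.8.8"]))
```

This only reveals the load balancer's scheme. An internal LB resolves to private addresses, while an internet-facing LB placed in subnets without a route to an internet gateway still resolves to public addresses and simply times out, so the subnet route tables need checking as well.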

Access AWS Network Load Balancer from VPC

I have a VPC with current subnets:
public-subnet has access to Internet Gateway:
routing public:
172.31.0.0/16 -> local
0.0.0.0/0 -> igw
private-subnet has NAT Gateway:
routing private:
172.31.0.0/16 -> local
0.0.0.0/0 -> nat
internal-subnet have access to Internet Gateway:
routing internal:
172.31.0.0/16 -> local
0.0.0.0/0 -> nat
My Lambda function is deployed in the private subnet. The network load balancer and the ECS task are deployed in internal-subnet. I want to make requests from my Lambda in private-subnet to the network load balancer.
I tried different approaches but always get timeouts. I also tried to make requests directly to the task's private IP, but still get timeouts.
How to configure access from one subnet to load balancer in another subnet inside VPC?

Private subnet not talking to public - health check failing

I am trying to achieve the following architecture depicted in this blog
I have a Fargate Service using an ENI (with private IP of 10.0.241.85) running in a private subnet (let's call 'subnet-1'). The ENI also has an Elastic IP as it fails to pull the image from ECR if not. I don't think this will matter though? The container in my service is exposing ports 3000/4000. I then have my ALB & NAT gateway in a public subnet (let's call this 'subnet-2'). The ALB forwards traffic on ports 80/443 to the necessary target group. The target group has 2 registered tasks targeting the private IP on the ENI (1 on port 3000 & the other on 4000). To the best of my knowledge, this should allow traffic in, correct?
For traffic out, subnet-1 has a default route (0.0.0.0/0) to the NAT gateway in subnet-2, this should allow traffic out, correct?
All services are in the same VPC & the same availability zone (where applicable)
I have 2 security groups utilised by these services:
test
api
[Screenshots: inbound and outbound rules of the test and api security groups]
We leverage ephemeral ports for communication between the 2 security groups
NOTE: I removed the destination here but, yeah, the destination is test security group
| Service | Security Groups |
|---------|-----------------|
| ENI | test, api |
| ALB | api |
[Screenshots: subnet-1 route table and subnet-2 route table]
NOTE: The covered route is just a peering connection out so nothing to do with this
From what I know, the 2 subnets should be able to communicate using the private IPs of the services within them which is what I have done here.
The health check fails with the generic message of:
Task failed ELB health checks
I have also looked to this blog for a bit more help but to no avail.
Any help would be greatly appreciated :)
If your task is listening on ports 3000 & 4000, your security group (test I guess, based on your comments) will need to permit these ports. As configured now, I don't see ports 3000 and 4000 as allowed.
Couple of other notes - an elastic IP on your ENI in a private subnet won't do anything as private subnets cannot have direct access to the internet. If you're having problems connecting to ECR without it there must be some other problem.
Also, your SG rules are permitting very large CIDR blocks like 0.0.0.0/0. A more secure configuration would only permit the particular security group that needs access. In this case, you would want the ports for your app (sounds like 3000 and 4000) from the SG ID of your load balancer.
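As a sketch of that last suggestion, the rules could be expressed as below. The dictionaries follow the `IpPermissions` shape accepted by the EC2 AuthorizeSecurityGroupIngress API (for example boto3's `authorize_security_group_ingress`); the helper name and the `sg-alb-example` ID are placeholders for this example.

```python
from typing import Dict, List


def ingress_from_security_group(
    ports: List[int], source_group_id: str, protocol: str = "tcp"
) -> List[Dict]:
    """Build one ingress permission per port, allowing only the given source SG."""
    return [
        {
            "IpProtocol": protocol,
            "FromPort": port,
            "ToPort": port,
            # Reference the load balancer's security group instead of a CIDR block.
            "UserIdGroupPairs": [{"GroupId": source_group_id}],
        }
        for port in ports
    ]


# Ports 3000 and 4000 come from the question; the group ID is hypothetical.
permissions = ingress_from_security_group([3000, 4000], "sg-alb-example")
for permission in permissions:
    print(permission)
```

The resulting list would be passed as `IpPermissions` for the task's security group, so only traffic originating from the load balancer's group reaches the application ports.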