I need to create a VPC + RDS Postgres DB + RDS Proxy with private subnets. I'm setting up everything with CloudFormation via the Serverless Framework.
I'm able to configure everything except the RDS Proxy: when I run serverless deploy, the deployment hangs while creating the target group and eventually fails with a timeout error.
This is what I see in the AWS Console, even if I do the full process manually:
When I run aws rds describe-db-proxy-targets --db-proxy-name so-proxy-rds-db-proxy, this is what I get:
{
"Targets": [
{
"Endpoint": "so-proxy-rds-db.csy8ozys6dtv.us-west-2.rds.amazonaws.com",
"RdsResourceId": "so-proxy-rds-db",
"Port": 5432,
"Type": "RDS_INSTANCE",
"Role": "READ_WRITE",
"TargetHealth": {
"State": "UNAVAILABLE",
"Reason": "AUTH_FAILURE",
"Description": "Proxy does not have any registered credentials"
}
}
]
}
However, I'm able to see the credentials in Secrets Manager. This is the reproducible configuration:
service: so-proxy-rds
frameworkVersion: "=2.49.0"
variablesResolutionMode: 20210326
configValidationMode: error
provider:
name: aws
runtime: nodejs14.x
region: us-west-2
versionFunctions: false
memorySize: 1024
timeout: 30
resources:
Resources:
VPC:
Type: AWS::EC2::VPC
Properties:
CidrBlock: 10.0.0.0/16
EnableDnsSupport: true
EnableDnsHostnames: true
Tags:
- Key: Name
Value: vpc
PrivateSubnetA:
Type: AWS::EC2::Subnet
Properties:
VpcId:
Ref: VPC
CidrBlock: 10.0.2.0/24
AvailabilityZone:
Fn::Select:
- 0
- Fn::GetAZs: ""
Tags:
- Key: Name
Value: private-A
PrivateSubnetB:
Type: AWS::EC2::Subnet
Properties:
VpcId:
Ref: VPC
CidrBlock: 10.0.3.0/24
AvailabilityZone:
Fn::Select:
- 1
- Fn::GetAZs: ""
Tags:
- Key: Name
Value: private-B
PrivateRouteTable:
Type: AWS::EC2::RouteTable
Properties:
VpcId:
Ref: VPC
Tags:
- Key: Name
Value: private
PrivateSubnetARouteTableAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
SubnetId:
Ref: PrivateSubnetA
RouteTableId:
Ref: PrivateRouteTable
PrivateSubnetBRouteTableAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
SubnetId:
Ref: PrivateSubnetB
RouteTableId:
Ref: PrivateRouteTable
OpenSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: 'Open firewall'
GroupName: ${self:service}-open
SecurityGroupEgress:
- CidrIp: 0.0.0.0/0
IpProtocol: "-1"
SecurityGroupIngress:
- CidrIp: 0.0.0.0/0
IpProtocol: "-1"
VpcId:
Ref: VPC
DBSubnetGroup:
Type: AWS::RDS::DBSubnetGroup
Properties:
DBSubnetGroupDescription: DB subnet group
SubnetIds:
- Ref: PrivateSubnetA
- Ref: PrivateSubnetB
PostgresDB:
Type: AWS::RDS::DBInstance
Properties:
DBInstanceIdentifier: db
DBName: test_db
AllocatedStorage: 20
DBInstanceClass: db.t2.micro
DBSubnetGroupName:
Ref: DBSubnetGroup
Engine: postgres
EngineVersion: 11.12 # only up to version 11 supports proxy
MasterUsername: test_user
MasterUserPassword: test_pass
PubliclyAccessible: false
VPCSecurityGroups:
- Ref: OpenSecurityGroup
DeletionPolicy: Delete
ProxySecretValues:
Type: 'AWS::SecretsManager::Secret'
Properties:
Name: proxy-secrets
SecretString: '{"username":"test_user","password":"test_pass"}'
DBProxy:
Type: AWS::RDS::DBProxy
Properties:
DBProxyName: db-proxy
EngineFamily: POSTGRESQL
RoleArn:
Fn::GetAtt:
- DBProxyRole
- Arn
Auth:
- AuthScheme: SECRETS
IAMAuth: DISABLED
SecretArn:
Ref: ProxySecretValues
VpcSubnetIds:
- Ref: PrivateSubnetA
- Ref: PrivateSubnetB
VpcSecurityGroupIds:
- Ref: OpenSecurityGroup
DBProxyTargetGroup:
Type: "AWS::RDS::DBProxyTargetGroup"
Properties:
DBInstanceIdentifiers:
- Ref: PostgresDB
DBProxyName:
Ref: DBProxy
TargetGroupName: default
DBProxyRole:
Type: "AWS::IAM::Role"
Properties:
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service:
- "rds.amazonaws.com"
Action:
- "sts:AssumeRole"
ManagedPolicyArns:
- Ref: DBProxyPolicy
RoleName: db-proxy-role
DBProxyPolicy:
Type: "AWS::IAM::ManagedPolicy"
Properties:
ManagedPolicyName: db-proxy-policy
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- "secretsmanager:*"
Resource: '*'
- Effect: Allow
Action:
- "kms:*"
Resource: '*'
Any help is greatly appreciated, thanks.
I don't see the role for the RoleArn entry declared in your "DBProxy" resource.
See the documentation:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-rds-dbproxy.html?icmpid=docs_cfn_console_designer
Take note of the "RoleArn" in the CF snippet below:
"Resources": {
"TestDBProxy": {
"Type": "AWS::RDS::DBProxy",
"Properties": {
"DebugLogging": true,
"DBProxyName": {
"Ref": "ProxyName"
},
"EngineFamily": "MYSQL",
"IdleClientTimeout": 120,
"RequireTLS": true,
"RoleArn": {
"Ref": "BootstrapSecretReaderRoleArn"
},
"Auth": [
{
"AuthScheme": "SECRETS",
"SecretArn": {
"Ref": "BootstrapProxySecretArn"
},
"IAMAuth": "DISABLED"
}
],
"VpcSubnetIds": {
"Fn::Split": [
",",
{
"Ref": "SubnetIds"
}
]
}
}
}
}
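For reference, the role behind that RoleArn only needs to let RDS Proxy read the auth secret (plus kms:Decrypt if the secret is encrypted with a customer-managed KMS key). A least-privilege sketch in the style of the question's template (the logical name DBProxySecretsPolicy is illustrative):
DBProxySecretsPolicy:
  Type: AWS::IAM::ManagedPolicy
  Properties:
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Action:
            - secretsmanager:GetSecretValue
          # Ref on a Secrets Manager secret resolves to its ARN
          Resource: !Ref ProxySecretValues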
I'm unable to replicate the exact error you had, but I got the same error message and managed to resolve it by going to the CloudWatch service and searching for the 'rds/proxy' log group.
There you will be able to get a more granular error message.
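For example, with AWS CLI v2 (the log group is typically named /aws/rds/proxy/<proxy-name>; db-proxy is the name used in the question's template):
# list the proxy log groups, then follow recent events for the proxy
aws logs describe-log-groups --log-group-name-prefix /aws/rds/proxy
aws logs tail /aws/rds/proxy/db-proxy --since 1h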
The output of aws rds describe-db-proxy-targets --db-proxy-name so-proxy-rds-db-proxy was very misleading.
In my case, I never gave RDS permission to assume my proxy role. It can be quickly fixed with:
Statement:
- Effect: Allow
Principal:
Service:
- "rds.amazonaws.com"
Action:
- "sts:AssumeRole"
Hope this helps.
This is basically the same issue as in this question, but the answers there didn't get me to a solution.
My configuration is: 1 VPC, 1 subnet, 1 security group. My Lambda runs in the VPC/subnet/security group and tries to add a message to an SQS queue, but gets a timeout. I've double-checked the permissions granted to the Lambda, the policy on the VPC endpoint, and the policy on the SQS queue, opened the rules on the security group, and ensured the network ACLs are open.
I successfully went through this tutorial, which sets up the VPC/etc+EC2 with CloudFormation, then demonstrates sending a message to SQS from EC2.
To reproduce my problem, I started with the CloudFormation from that tutorial and added the following to it:
- the VPC endpoint (rather than creating it through the console like in the tutorial)
- a Lambda (plus IAM role+policy) in the same VPC that tries to send a message to the SQS queue
The resulting cloudformation template is below.
I can reproduce the problem like this:
1. Create the CloudFormation stack from the template below. (Note that I had to make one small change to the template in the tutorial to get it to work in us-west-2.)
2. SSH to the EC2 instance and run the command to send an SQS message (see step 5 from the tutorial). This succeeds.
3. In the console, go to the Lambda, paste the URL of the SQS queue into the code, deploy, and run the Lambda. It times out.
4. In the console, edit the Lambda configuration to set VPC=None, then rerun the Lambda. It succeeds.
So the SQS queue is accessible by the lambda outside the VPC, and by EC2 inside the VPC/subnet/sg, but not the lambda inside the VPC/subnet/sg.
Any idea what could be missing?
CloudFormation (from the tutorial + my additions):
# Copied from this tutorial: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-sending-messages-from-vpc.html
AWSTemplateFormatVersion: 2010-09-09
Description: CloudFormation Template for SQS VPC Endpoints Tutorial
Parameters:
KeyName:
Description: Name of an existing EC2 KeyPair to enable SSH access to the instance
Type: 'AWS::EC2::KeyPair::KeyName'
ConstraintDescription: must be the name of an existing EC2 KeyPair.
SSHLocation:
Description: The IP address range that can be used to SSH to the EC2 instance
Type: String
MinLength: '9'
MaxLength: '18'
Default: 0.0.0.0/0
AllowedPattern: '(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})'
ConstraintDescription: must be a valid IP CIDR range of the form x.x.x.x/x.
Conditions:
IsT3Supported: !Equals [!Ref 'AWS::Region', eu-north-1]
Mappings:
RegionMap:
us-east-1:
AMI: ami-428aa838
us-east-2:
AMI: ami-710e2414
us-west-1:
AMI: ami-4a787a2a
us-west-2:
AMI: ami-7f43f307
ap-northeast-1:
AMI: ami-c2680fa4
ap-northeast-2:
AMI: ami-3e04a450
ap-southeast-1:
AMI: ami-4f89f533
ap-southeast-2:
AMI: ami-38708c5a
ap-south-1:
AMI: ami-3b2f7954
ca-central-1:
AMI: ami-7549cc11
eu-central-1:
AMI: ami-1b2bb774
eu-west-1:
AMI: ami-db1688a2
eu-west-2:
AMI: ami-6d263d09
eu-north-1:
AMI: ami-87fe70f9
eu-west-3:
AMI: ami-5ce55321
sa-east-1:
AMI: ami-f1337e9d
Resources:
VPC:
Type: 'AWS::EC2::VPC'
Properties:
CidrBlock: 10.0.0.0/16
EnableDnsSupport: 'true'
EnableDnsHostnames: 'true'
Tags:
- Key: Name
Value: SQS-VPCE-Tutorial-VPC
Subnet:
Type: 'AWS::EC2::Subnet'
Properties:
VpcId: !Ref VPC
# I had to add (uncomment) this line to avoid using us-west-2d, which doesn't support the instance type
# AvailabilityZone: us-west-2a
CidrBlock: 10.0.0.0/24
Tags:
- Key: Name
Value: SQS-VPCE-Tutorial-Subnet
InternetGateway:
Type: 'AWS::EC2::InternetGateway'
Properties:
Tags:
- Key: Name
Value: SQS-VPCE-Tutorial-InternetGateway
VPCGatewayAttachment:
Type: 'AWS::EC2::VPCGatewayAttachment'
Properties:
VpcId: !Ref VPC
InternetGatewayId: !Ref InternetGateway
RouteTable:
Type: 'AWS::EC2::RouteTable'
Properties:
VpcId: !Ref VPC
Tags:
- Key: Name
Value: SQS-VPCE-Tutorial-RouteTable
SubnetRouteTableAssociation:
Type: 'AWS::EC2::SubnetRouteTableAssociation'
Properties:
RouteTableId: !Ref RouteTable
SubnetId: !Ref Subnet
InternetGatewayRoute:
Type: 'AWS::EC2::Route'
Properties:
RouteTableId: !Ref RouteTable
GatewayId: !Ref InternetGateway
DestinationCidrBlock: 0.0.0.0/0
SecurityGroup:
Type: 'AWS::EC2::SecurityGroup'
Properties:
GroupName: SQS VPCE Tutorial Security Group
GroupDescription: Security group for SQS VPC endpoint tutorial
VpcId: !Ref VPC
SecurityGroupIngress:
- IpProtocol: '-1'
CidrIp: 10.0.0.0/16
- IpProtocol: tcp
FromPort: '22'
ToPort: '22'
CidrIp: !Ref SSHLocation
SecurityGroupEgress:
- IpProtocol: '-1'
CidrIp: 10.0.0.0/16
Tags:
- Key: Name
Value: SQS-VPCE-Tutorial-SecurityGroup
EC2Instance:
Type: 'AWS::EC2::Instance'
Properties:
KeyName: !Ref KeyName
InstanceType: !If [IsT3Supported, t3.micro, t2.micro]
ImageId: !FindInMap
- RegionMap
- !Ref 'AWS::Region'
- AMI
NetworkInterfaces:
- AssociatePublicIpAddress: 'true'
DeviceIndex: '0'
GroupSet:
- !Ref SecurityGroup
SubnetId: !Ref Subnet
IamInstanceProfile: !Ref EC2InstanceProfile
Tags:
- Key: Name
Value: SQS-VPCE-Tutorial-EC2Instance
EC2InstanceProfile:
Type: 'AWS::IAM::InstanceProfile'
Properties:
Roles:
- !Ref EC2InstanceRole
InstanceProfileName: !Sub 'EC2InstanceProfile-${AWS::Region}'
EC2InstanceRole:
Type: 'AWS::IAM::Role'
Properties:
RoleName: !Sub 'SQS-VPCE-Tutorial-EC2InstanceRole-${AWS::Region}'
AssumeRolePolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
Principal:
Service: ec2.amazonaws.com
Action: 'sts:AssumeRole'
ManagedPolicyArns:
- 'arn:aws:iam::aws:policy/AmazonSQSFullAccess'
CFQueue:
Type: 'AWS::SQS::Queue'
Properties:
VisibilityTimeout: 60
# Stuff I added starting here:
VPCEndpointForSQS:
Type: 'AWS::EC2::VPCEndpoint'
Properties:
VpcEndpointType: 'Interface'
PolicyDocument:
Statement:
- Action: '*'
Effect: Allow
Resource: '*'
Principal: '*'
ServiceName: !Sub 'com.amazonaws.${AWS::Region}.sqs'
VpcId: !Ref VPC
SubnetIds:
- !Ref Subnet
PrivateDnsEnabled: true
SecurityGroupIds:
- !Ref SecurityGroup
LambdaRole:
Type: 'AWS::IAM::Role'
Properties:
RoleName: !Sub 'SQS-VPCE-Tutorial-LambdaRole-${AWS::Region}'
ManagedPolicyArns:
- 'arn:aws:iam::aws:policy/AmazonSQSFullAccess'
- 'arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole'
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service:
- lambda.amazonaws.com
Action:
- 'sts:AssumeRole'
LambdaPolicy:
Type: 'AWS::IAM::Policy'
Properties:
PolicyName: !Sub 'SQS-VPCE-Tutorial-LambdaPolicy-${AWS::Region}'
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- 'logs:CreateLogGroup'
Resource: '*'
- Effect: Allow
Action:
- logs:CreateLogStream
- logs:PutLogEvents
Resource: '*'
Roles:
- !Ref LambdaRole
LambdaFunction:
Type: 'AWS::Lambda::Function'
Properties:
FunctionName: 'SQS-VPCE-Tutorial-Lambda'
Role: !GetAtt LambdaRole.Arn
Runtime: 'python3.9'
Handler: 'index.lambda_handler'
Timeout: 20
VpcConfig:
SecurityGroupIds:
- !Ref SecurityGroup
SubnetIds:
- !Ref Subnet
Code:
ZipFile: |
import json
import boto3
from botocore.exceptions import ClientError
sqs = boto3.resource('sqs')
queue = sqs.Queue('<INSERT SQS QUEUE URL HERE>')
def lambda_handler(event, context):
print("before")
queue.send_message(MessageBody='Hello from Amazon SQS.')
print("after")
Of course, as soon as I posted this, I found this answer: there is a bug in boto3 that prevents it from using the VPC endpoint for SQS by default. I tried the solution there and it solved the problem!
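As a sketch of that workaround applied to the Lambda above (assuming the us-west-2 regional endpoint; the queue URL placeholder is unchanged), the fix is to pin the SQS endpoint explicitly:
import boto3

# Pin the regional SQS endpoint so requests resolve through the VPC
# endpoint's private DNS instead of boto3's legacy default endpoint.
sqs = boto3.resource('sqs', endpoint_url='https://sqs.us-west-2.amazonaws.com')
queue = sqs.Queue('<INSERT SQS QUEUE URL HERE>')

def lambda_handler(event, context):
    queue.send_message(MessageBody='Hello from Amazon SQS.')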
I'm experiencing a very annoying problem. I created a CI/CD pipeline using AWS CodePipeline and CloudFormation.
This is the template.yml used by CloudFormation to create a ScheduledTask on ECS.
AWSTemplateFormatVersion: "2010-09-09"
Description: Template for deploying a ECR image on ECS
Resources:
VPC:
Type: "AWS::EC2::VPC"
Properties:
CidrBlock: "10.0.0.0/16"
EnableDnsSupport: true
EnableDnsHostnames: true
InstanceTenancy: default
Subnet1:
Type: AWS::EC2::Subnet
Properties:
VpcId: !Ref VPC
AvailabilityZone: !Select [0, !GetAZs ""]
CidrBlock: !Sub "10.0.0.0/20"
MapPublicIpOnLaunch: true
Subnet2:
Type: AWS::EC2::Subnet
Properties:
VpcId: !Ref VPC
AvailabilityZone: !Select [1, !GetAZs ""]
CidrBlock: !Sub "10.0.32.0/20"
MapPublicIpOnLaunch: true
InternetGateway:
Type: "AWS::EC2::InternetGateway"
VPCGatewayAttachment:
Type: "AWS::EC2::VPCGatewayAttachment"
Properties:
InternetGatewayId: !Ref InternetGateway
VpcId: !Ref VPC
RouteTable:
Type: "AWS::EC2::RouteTable"
Properties:
VpcId: !Ref VPC
RouteTableAssociation1:
Type: "AWS::EC2::SubnetRouteTableAssociation"
Properties:
SubnetId: !Ref Subnet1
RouteTableId: !Ref RouteTable
RouteTableAssociation2:
Type: "AWS::EC2::SubnetRouteTableAssociation"
Properties:
SubnetId: !Ref Subnet2
RouteTableId: !Ref RouteTable
InternetRoute:
Type: "AWS::EC2::Route"
DependsOn: VPCGatewayAttachment
Properties:
GatewayId: !Ref InternetGateway
RouteTableId: !Ref RouteTable
DestinationCidrBlock: "0.0.0.0/0"
ECSCluster:
Type: AWS::ECS::Cluster
Properties:
ClusterName: "SLAComputation"
LoadBalancer:
Type: AWS::ElasticLoadBalancingV2::LoadBalancer
Properties:
Name: ecs-services
Subnets:
- !Ref "Subnet1"
- !Ref "Subnet2"
SecurityGroups:
- !Ref LoadBalancerSecurityGroup
LoadBalancerListener:
Type: AWS::ElasticLoadBalancingV2::Listener
Properties:
LoadBalancerArn: !Ref LoadBalancer
Protocol: HTTP
Port: 80
DefaultActions:
- Type: forward
TargetGroupArn: !Ref DefaultTargetGroup
LoadBalancerSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Security group for loadbalancer to services on ECS
VpcId: !Ref "VPC"
SecurityGroupIngress:
- CidrIp: 0.0.0.0/0
IpProtocol: -1
DefaultTargetGroup:
Type: AWS::ElasticLoadBalancingV2::TargetGroup
Properties:
Name: default
VpcId: !Ref "VPC"
Protocol: "HTTP"
Port: "80"
CloudWatchLogsGroup:
Type: AWS::Logs::LogGroup
Properties:
LogGroupName: "sla_computation"
RetentionInDays: 1
ContainerSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
VpcId: !Ref "VPC"
GroupDescription: for ecs containers
SecurityGroupIngress:
- SourceSecurityGroupId: !Ref "LoadBalancerSecurityGroup"
IpProtocol: -1
Task:
Type: AWS::ECS::TaskDefinition
Properties:
Family: apis
Cpu: 1024
Memory: 2048
NetworkMode: awsvpc
RequiresCompatibilities:
- FARGATE
ExecutionRoleArn: !Ref ECSTaskExecutionRole
ContainerDefinitions:
- Name: ass001
Image: !Sub 649905970782.dkr.ecr.eu-west-1.amazonaws.com/ass001:latest
Cpu: 1024
Memory: 2048
HealthCheck:
Command: [ "CMD-SHELL", "exit 0" ]
Interval: 30
Retries: 5
Timeout: 10
StartPeriod: 30
PortMappings:
- ContainerPort: 8080
Protocol: tcp
LogConfiguration:
LogDriver: awslogs
Options:
awslogs-group: "sla_computation"
awslogs-region: !Ref AWS::Region
awslogs-stream-prefix: "ass001"
TargetGroup:
Type: AWS::ElasticLoadBalancingV2::TargetGroup
Properties:
Name: employee-tg
VpcId: !Ref VPC
Port: 80
Protocol: HTTP
Matcher:
HttpCode: 200-299
TargetType: ip
ListenerRule:
Type: AWS::ElasticLoadBalancingV2::ListenerRule
Properties:
ListenerArn: !Ref LoadBalancerListener
Priority: 2
Conditions:
- Field: path-pattern
Values:
- /*
Actions:
- TargetGroupArn: !Ref TargetGroup
Type: forward
ECSTaskExecutionRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service: [ecs-tasks.amazonaws.com]
Action: ["sts:AssumeRole"]
Path: /
Policies:
- PolicyName: AmazonECSTaskExecutionRolePolicy
PolicyDocument:
Statement:
- Effect: Allow
Action:
# ECS Tasks to download images from ECR
- "ecr:GetAuthorizationToken"
- "ecr:BatchCheckLayerAvailability"
- "ecr:GetDownloadUrlForLayer"
- "ecr:BatchGetImage"
# ECS tasks to upload logs to CloudWatch
- "logs:CreateLogStream"
- "logs:PutLogEvents"
Resource: "*"
TaskSchedule:
Type: AWS::Events::Rule
Properties:
Description: SLA rule ass001
Name: ass001
ScheduleExpression: cron(0/5 * * * ? *)
State: ENABLED
Targets:
- Arn:
!GetAtt ECSCluster.Arn
Id: dump-data-ecs-task
RoleArn:
!GetAtt ECSTaskExecutionRole.Arn
EcsParameters:
TaskDefinitionArn:
!Ref Task
TaskCount: 1
LaunchType: FARGATE
PlatformVersion: LATEST
NetworkConfiguration:
AwsVpcConfiguration:
AssignPublicIp: ENABLED
SecurityGroups:
- sg-07db5ae6616a8c5fc
Subnets:
- subnet-031d0787ad492c1c4
TaskSchedule2: # a second rule needs its own logical ID; duplicate keys are invalid YAML
Type: AWS::Events::Rule
Properties:
Description: SLA rule ass002
Name: ass002
ScheduleExpression: cron(0/5 * * * ? *)
State: ENABLED
Targets:
- Arn:
!GetAtt ECSCluster.Arn
Id: dump-data-ecs-task
RoleArn:
!GetAtt ECSTaskExecutionRole.Arn
EcsParameters:
TaskDefinitionArn:
!Ref Task
TaskCount: 1
LaunchType: FARGATE
PlatformVersion: LATEST
NetworkConfiguration:
AwsVpcConfiguration:
AssignPublicIp: ENABLED
SecurityGroups:
- sg-07db5ae6616a8c5fc
Subnets:
- subnet-031d0787ad492c1c4
Outputs:
ApiEndpoint:
Description: Employee API Endpoint
Value: !Join ["", ["http://", !GetAtt LoadBalancer.DNSName, "/employees"]]
Export:
Name: "EmployeeApiEndpoint"
The ScheduledTask is created successfully, but it never actually runs. Very strange. The strangest thing is that the ScheduledTask starts working when I click "Edit" in the AWS console and save, without making any change.
The main issue I see is that you are using the wrong role for your scheduled rule. It can't be !GetAtt ECSTaskExecutionRole.Arn. Instead you should create a new role (or edit an existing one) that has the AmazonEC2ContainerServiceEventsRole AWS managed policy.
It works after you edit in the console, because the AWS console will probably create the correct role in the background and use it instead of yours.
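A minimal sketch of such a role (the logical name EventsInvokeEcsRole is illustrative; the managed policy ARN is the standard one for this use case):
EventsInvokeEcsRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            # EventBridge, not ecs-tasks, must be able to assume this role
            Service: events.amazonaws.com
          Action: sts:AssumeRole
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceEventsRole
Each rule's target would then reference it with RoleArn: !GetAtt EventsInvokeEcsRole.Arn.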
I am new to AWS CloudFormation, and I am reusing 2 templates. The first one works totally fine: it creates a network stack for AWS Fargate (see template #1 below). But the second one, which is supposed to create the services, is failing: instead it is trying to delete most of the elements of the network stack (see template #2 below).
I can see in the "Changes Preview" how it marks for removal almost everything that I created before with the network stack template (see image #3 below).
Can somebody advise what is wrong with the second template? Thank you.
1) Network Stack
AWSTemplateFormatVersion: '2010-09-09'
Description: Create a network stack with a public vpc, fargate cluster and load balancer as a parent stack.
Mappings:
SubnetConfig:
VPC:
CIDR: '10.0.0.0/16'
PublicOne:
CIDR: '10.0.0.0/24'
PublicTwo:
CIDR: '10.0.1.0/24'
Resources:
FargateVpc:
Type: AWS::EC2::VPC
Properties:
EnableDnsSupport: true
EnableDnsHostnames: true
CidrBlock: !FindInMap ['SubnetConfig', 'VPC', 'CIDR']
PublicSubnetOne:
Type: AWS::EC2::Subnet
Properties:
AvailabilityZone:
Fn::Select:
- 0
- Fn::GetAZs: {Ref: 'AWS::Region'}
VpcId: !Ref FargateVpc
CidrBlock: !FindInMap ['SubnetConfig', 'PublicOne', 'CIDR']
MapPublicIpOnLaunch: true
PublicSubnetTwo:
Type: AWS::EC2::Subnet
Properties:
AvailabilityZone:
Fn::Select:
- 1
- Fn::GetAZs: {Ref: 'AWS::Region'}
VpcId: !Ref FargateVpc
CidrBlock: !FindInMap ['SubnetConfig', 'PublicTwo', 'CIDR']
MapPublicIpOnLaunch: true
InternetGateway:
Type: AWS::EC2::InternetGateway
GatewayAttachment:
Type: AWS::EC2::VPCGatewayAttachment
Properties:
VpcId: !Ref FargateVpc
InternetGatewayId: !Ref InternetGateway
PublicRouteTable:
Type: AWS::EC2::RouteTable
Properties:
VpcId: !Ref FargateVpc
PublicRoute:
Type: AWS::EC2::Route
Properties:
RouteTableId: !Ref PublicRouteTable
DestinationCidrBlock: '0.0.0.0/0'
GatewayId: !Ref InternetGateway
PublicSubnetOneRouteTableAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
SubnetId: !Ref PublicSubnetOne
RouteTableId: !Ref PublicRouteTable
PublicSubnetTwoRouteTableAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
SubnetId: !Ref PublicSubnetTwo
RouteTableId: !Ref PublicRouteTable
# ECS Cluster
ECSCluster:
Type: AWS::ECS::Cluster
# ECS Roles
# ECS Roles
# This role is used by the ECS tasks themselves.
ECSTaskExecutionRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service: [ecs-tasks.amazonaws.com]
Action: ['sts:AssumeRole']
Path: /
Policies:
- PolicyName: AmazonECSTaskExecutionRolePolicy
PolicyDocument:
Statement:
- Effect: Allow
Action:
# Allow the ECS Tasks to download images from ECR
- 'ecr:GetAuthorizationToken'
- 'ecr:BatchCheckLayerAvailability'
- 'ecr:GetDownloadUrlForLayer'
- 'ecr:BatchGetImage'
# Allow the ECS tasks to upload logs to CloudWatch
- 'logs:CreateLogStream'
- 'logs:PutLogEvents'
Resource: '*'
# This is an IAM role which authorizes ECS to manage resources on our
# account on our behalf, such as updating our load balancer with the
# details of where our containers are, so that traffic can reach your
# containers.
ECSRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service: [ecs.amazonaws.com]
Action: ['sts:AssumeRole']
Path: /
Policies:
- PolicyName: ecs-service
PolicyDocument:
Statement:
- Effect: Allow
Action:
# Rules which allow ECS to attach network interfaces to instances
# on our behalf in order for awsvpc networking mode to work right
- 'ec2:AttachNetworkInterface'
- 'ec2:CreateNetworkInterface'
- 'ec2:CreateNetworkInterfacePermission'
- 'ec2:DeleteNetworkInterface'
- 'ec2:DeleteNetworkInterfacePermission'
- 'ec2:Describe*'
- 'ec2:DetachNetworkInterface'
# Rules which allow ECS to update load balancers on our behalf
# with the information about how to send traffic to our containers
- 'elasticloadbalancing:DeregisterInstancesFromLoadBalancer'
- 'elasticloadbalancing:DeregisterTargets'
- 'elasticloadbalancing:Describe*'
- 'elasticloadbalancing:RegisterInstancesWithLoadBalancer'
- 'elasticloadbalancing:RegisterTargets'
Resource: '*'
# Load Balancer Security group
PublicLoadBalancerSG:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Access to the public facing load balancer from entire internet range
VpcId: !Ref FargateVpc
SecurityGroupIngress:
- CidrIp: 0.0.0.0/0
IpProtocol: -1
# Fargate Container Security Group
FargateContainerSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Access to fargate containers
VpcId: !Ref FargateVpc
EcsSecurityGroupIngressFromPublicALB:
Type: AWS::EC2::SecurityGroupIngress
Properties:
Description: Ingress from the public ALB
GroupId: !Ref FargateContainerSecurityGroup
IpProtocol: -1
SourceSecurityGroupId: !Ref PublicLoadBalancerSG
EcsSecurityGroupIngressFromSelf:
Type: AWS::EC2::SecurityGroupIngress
Properties:
Description: Ingress from other containers in the same security group
GroupId: !Ref FargateContainerSecurityGroup
IpProtocol: -1
SourceSecurityGroupId: !Ref FargateContainerSecurityGroup
# Load Balancer
PublicLoadBalancer:
Type: AWS::ElasticLoadBalancingV2::LoadBalancer
Properties:
Scheme: internet-facing
LoadBalancerAttributes:
- Key: idle_timeout.timeout_seconds
Value: '30'
Subnets:
- !Ref PublicSubnetOne
- !Ref PublicSubnetTwo
SecurityGroups: [!Ref 'PublicLoadBalancerSG']
# Target Group
DummyTargetGroupPublic:
Type: AWS::ElasticLoadBalancingV2::TargetGroup
Properties:
HealthCheckIntervalSeconds: 6
HealthCheckPath: /
HealthCheckProtocol: HTTP
HealthCheckTimeoutSeconds: 5
HealthyThresholdCount: 2
Name: !Join ['-', [!Ref 'AWS::StackName', 'drop-1']]
Port: 80
Protocol: HTTP
UnhealthyThresholdCount: 2
VpcId: !Ref 'FargateVpc'
# Listener
PublicLoadBalancerListener:
Type: AWS::ElasticLoadBalancingV2::Listener
DependsOn:
- PublicLoadBalancer
Properties:
DefaultActions:
- TargetGroupArn: !Ref 'DummyTargetGroupPublic'
Type: 'forward'
LoadBalancerArn: !Ref 'PublicLoadBalancer'
Port: 80
Protocol: HTTP
Outputs:
VPCId:
Description: The ID of the vpc that this stack is deployed on
Value: !Ref FargateVpc
Export:
Name: !Join [':', [!Ref 'AWS::StackName', 'VPCId']]
PublicSubnetOne:
Description: Public subnet one
Value: !Ref 'PublicSubnetOne'
Export:
Name: !Join [ ':', [ !Ref 'AWS::StackName', 'PublicSubnetOne' ] ]
PublicSubnetTwo:
Description: Public subnet two
Value: !Ref 'PublicSubnetTwo'
Export:
Name: !Join [ ':', [ !Ref 'AWS::StackName', 'PublicSubnetTwo' ] ]
FargateContainerSecurityGroup:
Description: A security group used to allow Fargate containers to receive traffic
Value: !Ref 'FargateContainerSecurityGroup'
Export:
Name: !Join [ ':', [ !Ref 'AWS::StackName', 'FargateContainerSecurityGroup' ] ]
# ECS Outputs
ClusterName:
Description: The name of the ECS cluster
Value: !Ref 'ECSCluster'
Export:
Name: !Join [ ':', [ !Ref 'AWS::StackName', 'ClusterName' ] ]
ECSRole:
Description: The ARN of the ECS role
Value: !GetAtt 'ECSRole.Arn'
Export:
Name: !Join [ ':', [ !Ref 'AWS::StackName', 'ECSRole' ] ]
ECSTaskExecutionRole:
Description: The ARN of the ECS role
Value: !GetAtt 'ECSTaskExecutionRole.Arn'
Export:
Name: !Join [ ':', [ !Ref 'AWS::StackName', 'ECSTaskExecutionRole' ] ]
PublicListener:
Description: The ARN of the public load balancer's Listener
Value: !Ref PublicLoadBalancerListener
Export:
Name: !Join [ ':', [ !Ref 'AWS::StackName', 'PublicListener' ] ]
ExternalUrl:
Description: The url of the external load balancer
Value: !Join ['', ['http://', !GetAtt 'PublicLoadBalancer.DNSName']]
Export:
Name: !Join [ ':', [ !Ref 'AWS::StackName', 'ExternalUrl' ] ]
2) Service Stack
AWSTemplateFormatVersion: '2010-09-09'
Description: Deploy a service on AWS Fargate, hosted in a public subnet of a VPC, and accessible via a public load balancer
# Input Paramters
Parameters:
StackName:
Type: String
Default: test-fargate
Description: The name of the parent fargate networking stack
ServiceName:
Type: String
Default: nginx
Description: Name of the ECS service
ImageUrl:
Type: String
Default: nginx
Description: The url of a docker image that contains the application process that
will handle the traffic for this service
ContainerPort:
Type: Number
Default: 80
Description: What port number the application inside the docker container is binding to
ContainerCpu:
Type: Number
Default: 256
Description: How much CPU to give the container. 1024 is 1 CPU
ContainerMemory:
Type: Number
Default: 512
Description: How much memory in megabytes to give the container
Path:
Type: String
Default: "*"
Description: A path on the public load balancer that this service
should be connected to. Use * to send all load balancer
traffic to this service.
Priority:
Type: Number
Default: 1
Description: The priority for the routing rule added to the load balancer.
This only applies if your have multiple services which have been
assigned to different paths on the load balancer.
DesiredCount:
Type: Number
Default: 2
Description: How many copies of the service task to run
Role:
Type: String
Default: ""
Description: (Optional) An IAM role to give the service's containers if the code within needs to
access other AWS resources like S3 buckets, DynamoDB tables, etc
Conditions:
HasCustomRole: !Not [!Equals [!Ref 'Role', '']]
# Task Definition
Resources:
TaskDefinition:
Type: AWS::ECS::TaskDefinition
Properties:
Family: !Ref 'ServiceName'
Cpu: !Ref 'ContainerCpu'
Memory: !Ref 'ContainerMemory'
NetworkMode: awsvpc
RequiresCompatibilities:
- FARGATE
ExecutionRoleArn:
Fn::ImportValue:
!Join [':', [!Ref 'StackName', 'ECSTaskExecutionRole']]
TaskRoleArn:
Fn::If:
- 'HasCustomRole'
- !Ref 'Role'
- !Ref "AWS::NoValue"
ContainerDefinitions:
- Name: !Ref 'ServiceName'
Cpu: !Ref 'ContainerCpu'
Memory: !Ref 'ContainerMemory'
Image: !Ref 'ImageUrl'
PortMappings:
- ContainerPort: !Ref 'ContainerPort'
# ALB Target Group
TargetGroup:
Type: AWS::ElasticLoadBalancingV2::TargetGroup
Properties:
HealthCheckIntervalSeconds: 6
HealthCheckPath: /
HealthCheckProtocol: HTTP
HealthCheckTimeoutSeconds: 5
HealthyThresholdCount: 2
TargetType: ip
Name: !Ref 'ServiceName'
Port: !Ref 'ContainerPort'
Protocol: HTTP
UnhealthyThresholdCount: 2
VpcId:
Fn::ImportValue:
!Join [':', [!Ref 'StackName', 'VPCId']]
# ALB Rule
LoadBalancerRule:
Type: AWS::ElasticLoadBalancingV2::ListenerRule
Properties:
Actions:
- TargetGroupArn: !Ref 'TargetGroup'
Type: 'forward'
Conditions:
- Field: path-pattern
Values: [!Ref 'Path']
ListenerArn:
Fn::ImportValue:
!Join [':', [!Ref 'StackName', 'PublicListener']]
Priority: !Ref 'Priority'
# ECS or Fargate Service
Service:
Type: AWS::ECS::Service
DependsOn: LoadBalancerRule
Properties:
ServiceName: !Ref 'ServiceName'
Cluster:
Fn::ImportValue:
!Join [':', [!Ref 'StackName', 'ClusterName']]
LaunchType: FARGATE
DeploymentConfiguration:
MaximumPercent: 200
MinimumHealthyPercent: 75
DesiredCount: !Ref 'DesiredCount'
NetworkConfiguration:
AwsvpcConfiguration:
AssignPublicIp: ENABLED
SecurityGroups:
- Fn::ImportValue:
!Join [':', [!Ref 'StackName', 'FargateContainerSecurityGroup']]
Subnets:
- Fn::ImportValue:
!Join [':', [!Ref 'StackName', 'PublicSubnetOne']]
- Fn::ImportValue:
!Join [':', [!Ref 'StackName', 'PublicSubnetTwo']]
TaskDefinition: !Ref TaskDefinition
LoadBalancers:
- ContainerName: !Ref 'ServiceName'
ContainerPort: !Ref 'ContainerPort'
TargetGroupArn: !Ref 'TargetGroup'
Based on the comments, the issue was that the first and second templates were being deployed into one stack. The solution was to deploy the second template as a separate stack.
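For example, something like this (file names are illustrative; the service stack's StackName parameter must match the network stack's name so its Fn::ImportValue lookups resolve):
aws cloudformation deploy \
  --stack-name test-fargate \
  --template-file network-stack.yaml \
  --capabilities CAPABILITY_IAM

aws cloudformation deploy \
  --stack-name test-fargate-service \
  --template-file service-stack.yaml \
  --parameter-overrides StackName=test-fargate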
I am using the Airflow Helm chart to run Airflow on k8s. However, the web pod can't seem to connect to PostgreSQL. The odd thing is that other pods can.
I've cobbled together small scripts to check, and this is what I found:
[root@ip-10-56-173-248 bin]# cat checkpostgres.sh
docker exec -u root $1 /bin/nc -zvw2 airflow-postgresql 5432
[root@ip-10-56-173-248 bin]# docker ps --format '{{.Names}}\t{{.ID}}'|grep k8s_airflow|grep default|awk '{printf("%s ",$1); system("checkpostgres.sh " $2)}'
k8s_airflow-web_airflow-web-57c6dcd77b-dvjmv_default_67d74586-284b-11ea-8021-0249931777ef_74 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) : Connection timed out
k8s_airflow-worker_airflow-worker-0_default_67e1703a-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
k8s_airflow-scheduler_airflow-scheduler-5d9b688ccf-zdjdl_default_67d3fab4-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
k8s_airflow-postgresql_airflow-postgresql-76c954bb7f-gwq68_default_67d1cf3d-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
k8s_airflow-redis_airflow-redis-master-0_default_67d9aa36-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (?) open
k8s_airflow-flower_airflow-flower-79c999764d-d4q58_default_67d267e2-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
And this is my k8s version info:
➜ ~ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:11:03Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-c0eccc", GitCommit:"c0eccca51d7500bb03b2f163dd8d534ffeb2f7a2", GitTreeState:"clean", BuildDate:"2019-12-22T23:14:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
When I do an nslookup on the service name, it seems to work fine:
# nslookup airflow-postgresql
Server: 172.20.0.10
Address: 172.20.0.10#53
Non-authoritative answer:
Name: airflow-postgresql.default.svc.cluster.local
Address: 172.20.166.209
EDIT: As requested, here is the EKS setup:
amazon-eks-nodegroup.yaml:
---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Amazon EKS - Node Group'
Parameters:
KeyName:
Description: The EC2 Key Pair to allow SSH access to the instances
Type: AWS::EC2::KeyPair::KeyName
NodeImageId:
Type: AWS::EC2::Image::Id
Description: AMI id for the node instances.
NodeInstanceType:
Description: EC2 instance type for the node instances
Type: String
Default: t3.medium
AllowedValues:
- t2.small
- t2.medium
- t2.large
- t2.xlarge
- t2.2xlarge
- t3.nano
- t3.micro
- t3.small
- t3.medium
- t3.large
- t3.xlarge
- t3.2xlarge
- m3.medium
- m3.large
- m3.xlarge
- m3.2xlarge
- m4.large
- m4.xlarge
- m4.2xlarge
- m4.4xlarge
- m4.10xlarge
- m5.large
- m5.xlarge
- m5.2xlarge
- m5.4xlarge
- m5.12xlarge
- m5.24xlarge
- c4.large
- c4.xlarge
- c4.2xlarge
- c4.4xlarge
- c4.8xlarge
- c5.large
- c5.xlarge
- c5.2xlarge
- c5.4xlarge
- c5.9xlarge
- c5.18xlarge
- i3.large
- i3.xlarge
- i3.2xlarge
- i3.4xlarge
- i3.8xlarge
- i3.16xlarge
- r3.xlarge
- r3.2xlarge
- r3.4xlarge
- r3.8xlarge
- r4.large
- r4.xlarge
- r4.2xlarge
- r4.4xlarge
- r4.8xlarge
- r4.16xlarge
- x1.16xlarge
- x1.32xlarge
- p2.xlarge
- p2.8xlarge
- p2.16xlarge
- p3.2xlarge
- p3.8xlarge
- p3.16xlarge
- r5.large
- r5.xlarge
- r5.2xlarge
- r5.4xlarge
- r5.12xlarge
- r5.24xlarge
- r5d.large
- r5d.xlarge
- r5d.2xlarge
- r5d.4xlarge
- r5d.12xlarge
- r5d.24xlarge
- z1d.large
- z1d.xlarge
- z1d.2xlarge
- z1d.3xlarge
- z1d.6xlarge
- z1d.12xlarge
ConstraintDescription: Must be a valid EC2 instance type
NodeAutoScalingGroupMinSize:
Type: Number
Description: Minimum size of Node Group ASG.
Default: 1
NodeAutoScalingGroupMaxSize:
Type: Number
Description: Maximum size of Node Group ASG. Set to at least 1 greater than NodeAutoScalingGroupDesiredCapacity.
Default: 4
NodeAutoScalingGroupDesiredCapacity:
Type: Number
Description: Desired capacity of Node Group ASG.
Default: 3
NodeVolumeSize:
Type: Number
Description: Node volume size
Default: 20
ClusterName:
Description: The cluster name provided when the cluster was created. If it is incorrect, nodes will not be able to join the cluster. i.e. "eks"
Type: String
Environment:
Description: the Environment value provided when the cluster was created. i.e. "dev"
Type: String
BootstrapArguments:
Description: Arguments to pass to the bootstrap script. See files/bootstrap.sh in https://github.com/awslabs/amazon-eks-ami
Default: ""
Type: String
VpcId:
Description: The VPC of the worker instances stack reference
Type: String
Subnets:
Description: The subnets where workers can be created.
Type: String
Metadata:
AWS::CloudFormation::Interface:
ParameterGroups:
-
Label:
default: "EKS Cluster"
Parameters:
- ClusterName
-
Label:
default: "dev"
Parameters:
- Environment
-
Label:
default: "Worker Node Configuration"
Parameters:
- NodeAutoScalingGroupMinSize
- NodeAutoScalingGroupDesiredCapacity
- NodeAutoScalingGroupMaxSize
- NodeInstanceType
- NodeImageId
- NodeVolumeSize
- KeyName
- BootstrapArguments
-
Label:
default: "Worker Network Configuration"
Parameters:
- VpcId
- Subnets
Resources:
NodeInstanceProfile:
Type: AWS::IAM::InstanceProfile
Properties:
InstanceProfileName: !Sub "${ClusterName}-${Environment}-cluster-node-instance-profile"
Path: "/"
Roles:
- !Ref NodeInstanceRole
NodeInstanceRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "${ClusterName}-${Environment}-cluster-node-instance-role"
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service:
- ec2.amazonaws.com
Action:
- sts:AssumeRole
Path: "/"
ManagedPolicyArns:
- arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
- arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
- arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
- arn:aws:iam::aws:policy/AmazonS3FullAccess
- arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
Policies:
-
PolicyName: "change-r53-recordsets"
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: Allow
Action: route53:ChangeResourceRecordSets
Resource: !Sub
- "arn:aws:route53:::hostedzone/${ZoneId}"
- {ZoneId: !ImportValue DNS-AccountZoneID}
-
PolicyName: "list-r53-resources"
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: Allow
Action:
- route53:ListHostedZones
- route53:ListResourceRecordSets
Resource: "*"
NodeSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Security group for all nodes in the cluster
GroupName: !Sub "${ClusterName}-${Environment}-cluster-security-group"
VpcId:
Fn::ImportValue:
!Sub ${VpcId}-vpcid
Tags:
- Key: !Sub "kubernetes.io/cluster/${ClusterName}-${Environment}-cluster"
Value: 'owned'
NodeSecurityGroupIngress:
Type: AWS::EC2::SecurityGroupIngress
DependsOn: NodeSecurityGroup
Properties:
Description: Allow node to communicate with each other
GroupId: !Ref NodeSecurityGroup
SourceSecurityGroupId: !Ref NodeSecurityGroup
IpProtocol: '-1'
FromPort: 0
ToPort: 65535
NodeSecurityGroupFromControlPlaneIngress:
Type: AWS::EC2::SecurityGroupIngress
DependsOn: NodeSecurityGroup
Properties:
Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
GroupId: !Ref NodeSecurityGroup
SourceSecurityGroupId:
Fn::ImportValue:
!Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
IpProtocol: tcp
FromPort: 1025
ToPort: 65535
ControlPlaneEgressToNodeSecurityGroup:
Type: AWS::EC2::SecurityGroupEgress
DependsOn: NodeSecurityGroup
Properties:
Description: Allow the cluster control plane to communicate with worker Kubelet and pods
GroupId:
Fn::ImportValue:
!Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
DestinationSecurityGroupId: !Ref NodeSecurityGroup
IpProtocol: tcp
FromPort: 1025
ToPort: 65535
NodeSecurityGroupFromControlPlaneOn443Ingress:
Type: AWS::EC2::SecurityGroupIngress
DependsOn: NodeSecurityGroup
Properties:
Description: Allow pods running extension API servers on port 443 to receive communication from cluster control plane
GroupId: !Ref NodeSecurityGroup
SourceSecurityGroupId:
Fn::ImportValue:
!Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
IpProtocol: tcp
FromPort: 443
ToPort: 443
ControlPlaneEgressToNodeSecurityGroupOn443:
Type: AWS::EC2::SecurityGroupEgress
DependsOn: NodeSecurityGroup
Properties:
Description: Allow the cluster control plane to communicate with pods running extension API servers on port 443
GroupId:
Fn::ImportValue:
!Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
DestinationSecurityGroupId: !Ref NodeSecurityGroup
IpProtocol: tcp
FromPort: 443
ToPort: 443
ClusterControlPlaneSecurityGroupIngress:
Type: AWS::EC2::SecurityGroupIngress
DependsOn: NodeSecurityGroup
Properties:
Description: Allow pods to communicate with the cluster API Server
GroupId:
Fn::ImportValue:
!Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
SourceSecurityGroupId: !Ref NodeSecurityGroup
IpProtocol: tcp
ToPort: 443
FromPort: 443
NodeGroup:
Type: AWS::AutoScaling::AutoScalingGroup
Properties:
AutoScalingGroupName: !Sub "${ClusterName}-${Environment}-cluster-nodegroup"
DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
LaunchConfigurationName: !Ref NodeLaunchConfig
MinSize: !Ref NodeAutoScalingGroupMinSize
MaxSize: !Ref NodeAutoScalingGroupMaxSize
VPCZoneIdentifier:
- Fn::Select:
- 0
- Fn::Split:
- ","
- Fn::ImportValue:
!Sub ${Subnets}
- Fn::Select:
- 1
- Fn::Split:
- ","
- Fn::ImportValue:
!Sub ${Subnets}
- Fn::Select:
- 2
- Fn::Split:
- ","
- Fn::ImportValue:
!Sub ${Subnets}
Tags:
- Key: Name
Value: !Sub "${ClusterName}-${Environment}-cluster-nodegroup"
PropagateAtLaunch: 'true'
- Key: !Sub 'kubernetes.io/cluster/${ClusterName}-${Environment}-cluster'
Value: 'owned'
PropagateAtLaunch: 'true'
UpdatePolicy:
AutoScalingRollingUpdate:
MaxBatchSize: '1'
MinInstancesInService: !Ref NodeAutoScalingGroupDesiredCapacity
PauseTime: 'PT5M'
NodeLaunchConfig:
Type: AWS::AutoScaling::LaunchConfiguration
Properties:
LaunchConfigurationName: !Sub "${ClusterName}-${Environment}-cluster-node-launch-config"
AssociatePublicIpAddress: 'true'
IamInstanceProfile: !Ref NodeInstanceProfile
ImageId: !Ref NodeImageId
InstanceType: !Ref NodeInstanceType
KeyName: !Ref KeyName
SecurityGroups:
- !Ref NodeSecurityGroup
BlockDeviceMappings:
- DeviceName: /dev/xvda
Ebs:
VolumeSize: !Ref NodeVolumeSize
VolumeType: gp2
DeleteOnTermination: true
UserData:
Fn::Base64:
!Sub |
#!/bin/bash
set -o xtrace
/etc/eks/bootstrap.sh ${BootstrapArguments} ${ClusterName}-${Environment}-cluster
sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
sudo start amazon-ssm-agent
sudo sysctl -w vm.max_map_count=262144
/opt/aws/bin/cfn-signal --exit-code $? \
--stack ${AWS::StackName} \
--resource NodeGroup \
--region ${AWS::Region}
Outputs:
NodeInstanceRole:
Description: The node instance role
Value: !GetAtt NodeInstanceRole.Arn
Export:
Name: !Sub "${ClusterName}-${Environment}-cluster-nodegroup-rolearn"
NodeSecurityGroup:
Description: The security group for the node group
Value: !Ref NodeSecurityGroup
amazon-eks-cluster.yaml:
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Amazon EKS - Cluster'
Parameters:
VPCStack:
Type: String
Description: VPC Stack Name
ClusterName:
Type: String
Description: EKS Cluster Name (i.e. "eks")
Environment:
Type: String
Description: Environment for this Cluster (i.e. "dev") which will be appended to the ClusterName (i.e. "eks-dev")
Resources:
ClusterRole:
Description: Allows EKS to manage clusters on your behalf.
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "${ClusterName}-${Environment}-cluster-role"
AssumeRolePolicyDocument:
Version: 2012-10-17
Statement:
Effect: Allow
Principal:
Service:
- eks.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
- arn:aws:iam::aws:policy/AmazonEKSServicePolicy
Policies:
-
PolicyName: "change-r53-recordsets"
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: Allow
Action: route53:ChangeResourceRecordSets
Resource: !Sub
- "arn:aws:route53:::hostedzone/${ZoneId}"
- {ZoneId: !ImportValue DNS-AccountZoneID}
-
PolicyName: "list-r53-resources"
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: Allow
Action:
- route53:ListHostedZones
- route53:ListResourceRecordSets
Resource: "*"
ClusterControlPlaneSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupName: !Sub "${ClusterName}-${Environment}-cluster-control-plane-sg"
GroupDescription: Cluster communication with worker nodes
VpcId:
Fn::ImportValue:
!Sub "${VPCStack}-vpcid"
Cluster:
Type: "AWS::EKS::Cluster"
Properties:
Version: "1.14"
Name: !Sub "${ClusterName}-${Environment}-cluster"
RoleArn: !GetAtt ClusterRole.Arn
ResourcesVpcConfig:
SecurityGroupIds:
- !Ref ClusterControlPlaneSecurityGroup
SubnetIds:
- Fn::Select:
- 0
- Fn::Split:
- ","
- Fn::ImportValue:
!Sub "${VPCStack}-privatesubnets"
- Fn::Select:
- 1
- Fn::Split:
- ","
- Fn::ImportValue:
!Sub "${VPCStack}-privatesubnets"
- Fn::Select:
- 2
- Fn::Split:
- ","
- Fn::ImportValue:
!Sub "${VPCStack}-privatesubnets"
Route53Cname:
Type: "AWS::Route53::RecordSet"
Properties:
HostedZoneId: !ImportValue DNS-AccountZoneID
Comment: CNAME for Control Plane Endpoint
Name: !Sub
- "k8s.${Environment}.${Zone}"
- { Zone: !ImportValue Main-zone-name}
Type: CNAME
TTL: '900'
ResourceRecords:
- !GetAtt Cluster.Endpoint
Outputs:
ClusterName:
Value: !Ref Cluster
Description: Cluster Name
Export:
Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterName"
ClusterArn:
Value: !GetAtt Cluster.Arn
Description: Cluster Arn
Export:
Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterArn"
ClusterEndpoint:
Value: !GetAtt Cluster.Endpoint
Description: Cluster Endpoint
Export:
Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterEndpoint"
ClusterControlPlaneSecurityGroup:
Value: !Ref ClusterControlPlaneSecurityGroup
Description: ClusterControlPlaneSecurityGroup
Export:
Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
cluster-parameters.json:
[
{
"ParameterKey": "VPCStack",
"ParameterValue": "Main"
},
{
"ParameterKey": "ClusterName",
"ParameterValue": "amundsen-eks"
},
{
"ParameterKey": "Environment",
"ParameterValue": "dev"
}
]
nodegroup-parameters.json:
[
{
"ParameterKey": "KeyName",
"ParameterValue": "data-warehouse-dev"
},
{
"ParameterKey": "NodeImageId",
"ParameterValue": "ami-08739803f18dcc019"
},
{
"ParameterKey": "NodeInstanceType",
"ParameterValue": "r5.2xlarge"
},
{
"ParameterKey": "NodeAutoScalingGroupMinSize",
"ParameterValue": "1"
},
{
"ParameterKey": "NodeAutoScalingGroupMaxSize",
"ParameterValue": "3"
},
{
"ParameterKey": "NodeAutoScalingGroupDesiredCapacity",
"ParameterValue": "2"
},
{
"ParameterKey": "NodeVolumeSize",
"ParameterValue": "20"
},
{
"ParameterKey": "ClusterName",
"ParameterValue": "amundsen-eks"
},
{
"ParameterKey": "Environment",
"ParameterValue": "dev"
},
{
"ParameterKey": "BootstrapArguments",
"ParameterValue": ""
},
{
"ParameterKey": "VpcId",
"ParameterValue": "Main"
},
{
"ParameterKey": "Subnets",
"ParameterValue": "Main-privatesubnets"
}
]
And the creation scripts:
cluster:
aws cloudformation create-stack \
--stack-name amundsen-eks-cluster \
--parameters file://./cluster-parameters.json \
--template-body file://../../../../templates/cloud-formation/eks/amazon-eks-cluster.yaml \
--capabilities CAPABILITY_NAMED_IAM --profile myprofile
nodegroup:
aws cloudformation create-stack \
--stack-name amundsen-eks-cluster-nodegroup \
--parameters file://./nodegroup-parameters.json \
--template-body file://../../../../templates/cloud-formation/eks/amazon-eks-nodegroup.yaml \
--capabilities CAPABILITY_NAMED_IAM --profile myprofile
What would cause this behavior, and what else could I check to narrow it down?
I'm currently learning ECS from the AWS documents, starting with Module Two - Deploy the Monolith | AWS.
While reading the YAML file for the CloudFormation stack, I noticed that it creates two EC2 instances in the cluster and also specifies two public subnets in the VPC. I'm new to VPCs, so are the two public subnets needed because two EC2 instances are created?
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
DesiredCapacity:
Type: Number
Default: '2'
Description: Number of instances to launch in your ECS cluster.
MaxSize:
Type: Number
Default: '2'
Description: Maximum number of instances that can be launched in your ECS cluster.
InstanceType:
Description: EC2 instance type
Type: String
Default: t2.micro
AllowedValues: [t2.micro, t2.small, t2.medium, t2.large, m3.medium, m3.large,
m3.xlarge, m3.2xlarge, m4.large, m4.xlarge, m4.2xlarge, m4.4xlarge, m4.10xlarge,
c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge, c3.large, c3.xlarge,
c3.2xlarge, c3.4xlarge, c3.8xlarge, r3.large, r3.xlarge, r3.2xlarge, r3.4xlarge,
r3.8xlarge, i2.xlarge, i2.2xlarge, i2.4xlarge, i2.8xlarge]
ConstraintDescription: Please choose a valid instance type.
Mappings:
AWSRegionToAMI:
us-east-1:
AMIID: ami-eca289fb
us-east-2:
AMIID: ami-446f3521
us-west-1:
AMIID: ami-9fadf8ff
us-west-2:
AMIID: ami-7abc111a
eu-west-1:
AMIID: ami-a1491ad2
eu-central-1:
AMIID: ami-54f5303b
ap-northeast-1:
AMIID: ami-9cd57ffd
ap-southeast-1:
AMIID: ami-a900a3ca
ap-southeast-2:
AMIID: ami-5781be34
SubnetConfig:
VPC:
CIDR: '10.0.0.0/16'
PublicOne:
CIDR: '10.0.0.0/24'
PublicTwo:
CIDR: '10.0.1.0/24'
Resources:
# VPC into which stack instances will be placed
VPC:
Type: AWS::EC2::VPC
Properties:
EnableDnsSupport: true
EnableDnsHostnames: true
CidrBlock: !FindInMap ['SubnetConfig', 'VPC', 'CIDR']
PublicSubnetOne:
Type: AWS::EC2::Subnet
Properties:
AvailabilityZone:
Fn::Select:
- 0
- Fn::GetAZs: {Ref: 'AWS::Region'}
VpcId: !Ref 'VPC'
CidrBlock: !FindInMap ['SubnetConfig', 'PublicOne', 'CIDR']
MapPublicIpOnLaunch: true
PublicSubnetTwo:
Type: AWS::EC2::Subnet
Properties:
AvailabilityZone:
Fn::Select:
- 1
- Fn::GetAZs: {Ref: 'AWS::Region'}
VpcId: !Ref 'VPC'
CidrBlock: !FindInMap ['SubnetConfig', 'PublicTwo', 'CIDR']
MapPublicIpOnLaunch: true
InternetGateway:
Type: AWS::EC2::InternetGateway
GatewayAttachement:
Type: AWS::EC2::VPCGatewayAttachment
Properties:
VpcId: !Ref 'VPC'
InternetGatewayId: !Ref 'InternetGateway'
PublicRouteTable:
Type: AWS::EC2::RouteTable
Properties:
VpcId: !Ref 'VPC'
PublicRoute:
Type: AWS::EC2::Route
DependsOn: GatewayAttachement
Properties:
RouteTableId: !Ref 'PublicRouteTable'
DestinationCidrBlock: '0.0.0.0/0'
GatewayId: !Ref 'InternetGateway'
PublicSubnetOneRouteTableAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
SubnetId: !Ref PublicSubnetOne
RouteTableId: !Ref PublicRouteTable
PublicSubnetTwoRouteTableAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
SubnetId: !Ref PublicSubnetTwo
RouteTableId: !Ref PublicRouteTable
# ECS Resources
ECSCluster:
Type: AWS::ECS::Cluster
EcsSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: ECS Security Group
VpcId: !Ref 'VPC'
EcsSecurityGroupHTTPinbound:
Type: AWS::EC2::SecurityGroupIngress
Properties:
GroupId: !Ref 'EcsSecurityGroup'
IpProtocol: tcp
FromPort: '80'
ToPort: '80'
CidrIp: 0.0.0.0/0
EcsSecurityGroupSSHinbound:
Type: AWS::EC2::SecurityGroupIngress
Properties:
GroupId: !Ref 'EcsSecurityGroup'
IpProtocol: tcp
FromPort: '22'
ToPort: '22'
CidrIp: 0.0.0.0/0
EcsSecurityGroupALBports:
Type: AWS::EC2::SecurityGroupIngress
Properties:
GroupId: !Ref 'EcsSecurityGroup'
IpProtocol: tcp
FromPort: '31000'
ToPort: '61000'
SourceSecurityGroupId: !Ref 'EcsSecurityGroup'
CloudwatchLogsGroup:
Type: AWS::Logs::LogGroup
Properties:
LogGroupName: !Join ['-', [ECSLogGroup, !Ref 'AWS::StackName']]
RetentionInDays: 14
ECSALB:
Type: AWS::ElasticLoadBalancingV2::LoadBalancer
Properties:
Name: demo
Scheme: internet-facing
LoadBalancerAttributes:
- Key: idle_timeout.timeout_seconds
Value: '30'
Subnets:
- !Ref PublicSubnetOne
- !Ref PublicSubnetTwo
SecurityGroups: [!Ref 'EcsSecurityGroup']
ECSAutoScalingGroup:
Type: AWS::AutoScaling::AutoScalingGroup
Properties:
VPCZoneIdentifier:
- !Ref PublicSubnetOne
- !Ref PublicSubnetTwo
LaunchConfigurationName: !Ref 'ContainerInstances'
MinSize: '1'
MaxSize: !Ref 'MaxSize'
DesiredCapacity: !Ref 'DesiredCapacity'
CreationPolicy:
ResourceSignal:
Timeout: PT15M
UpdatePolicy:
AutoScalingReplacingUpdate:
WillReplace: 'true'
ContainerInstances:
Type: AWS::AutoScaling::LaunchConfiguration
Properties:
ImageId: !FindInMap [AWSRegionToAMI, !Ref 'AWS::Region', AMIID]
SecurityGroups: [!Ref 'EcsSecurityGroup']
InstanceType: !Ref 'InstanceType'
IamInstanceProfile: !Ref 'EC2InstanceProfile'
UserData:
Fn::Base64: !Sub |
#!/bin/bash -xe
echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
yum install -y aws-cfn-bootstrap
/opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource ECSAutoScalingGroup --region ${AWS::Region}
ECSServiceRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service: [ecs.amazonaws.com]
Action: ['sts:AssumeRole']
Path: /
Policies:
- PolicyName: ecs-service
PolicyDocument:
Statement:
- Effect: Allow
Action:
- 'elasticloadbalancing:DeregisterInstancesFromLoadBalancer'
- 'elasticloadbalancing:DeregisterTargets'
- 'elasticloadbalancing:Describe*'
- 'elasticloadbalancing:RegisterInstancesWithLoadBalancer'
- 'elasticloadbalancing:RegisterTargets'
- 'ec2:Describe*'
- 'ec2:AuthorizeSecurityGroupIngress'
Resource: '*'
EC2Role:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service: [ec2.amazonaws.com]
Action: ['sts:AssumeRole']
Path: /
Policies:
- PolicyName: ecs-service
PolicyDocument:
Statement:
- Effect: Allow
Action:
- 'ecs:CreateCluster'
- 'ecs:DeregisterContainerInstance'
- 'ecs:DiscoverPollEndpoint'
- 'ecs:Poll'
- 'ecs:RegisterContainerInstance'
- 'ecs:StartTelemetrySession'
- 'ecs:Submit*'
- 'logs:CreateLogStream'
- 'logs:PutLogEvents'
- 'ecr:GetAuthorizationToken'
- 'ecr:BatchGetImage'
- 'ecr:GetDownloadUrlForLayer'
Resource: '*'
AutoscalingRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service: [application-autoscaling.amazonaws.com]
Action: ['sts:AssumeRole']
Path: /
Policies:
- PolicyName: service-autoscaling
PolicyDocument:
Statement:
- Effect: Allow
Action:
- 'application-autoscaling:*'
- 'cloudwatch:DescribeAlarms'
- 'cloudwatch:PutMetricAlarm'
- 'ecs:DescribeServices'
- 'ecs:UpdateService'
Resource: '*'
EC2InstanceProfile:
Type: AWS::IAM::InstanceProfile
Properties:
Path: /
Roles: [!Ref 'EC2Role']
Outputs:
ClusterName:
Description: The name of the ECS cluster, used by the deploy script
Value: !Ref 'ECSCluster'
Export:
Name: !Join [':', [!Ref "AWS::StackName", "ClusterName" ]]
Url:
Description: The url at which the application is available
Value: !Join ['', [!GetAtt 'ECSALB.DNSName']]
ALBArn:
Description: The ARN of the ALB, exported for later use in creating services
Value: !Ref 'ECSALB'
Export:
Name: !Join [':', [!Ref "AWS::StackName", "ALBArn" ]]
ECSRole:
Description: The ARN of the ECS role, exports for later use in creating services
Value: !GetAtt 'ECSServiceRole.Arn'
Export:
Name: !Join [':', [!Ref "AWS::StackName", "ECSRole" ]]
VPCId:
Description: The ID of the VPC that this stack is deployed in
Value: !Ref 'VPC'
Export:
Name: !Join [':', [!Ref "AWS::StackName", "VPCId" ]]
In your example, two AZs are being used, which requires two subnets (one for each AZ). This is not related to the number of EC2 instances.
A typical best practice with AWS and other cloud vendors is to use multiple availability zones (AZs) for fault tolerance. With AWS, each AZ needs its own subnet. Auto scaling and load balancing will attempt to keep the number of instances the same in each AZ.
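Condensed from the template above, the per-AZ subnet pattern looks like this (CIDRs as in the tutorial):
PublicSubnetOne:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref VPC
    AvailabilityZone: !Select [0, !GetAZs '']  # first AZ in the region
    CidrBlock: 10.0.0.0/24
PublicSubnetTwo:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref VPC
    AvailabilityZone: !Select [1, !GetAZs '']  # second AZ
    CidrBlock: 10.0.1.0/24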
PS: If I were learning AWS, I would not start with this example. It is very complex, though very realistic for a real-world deployment. There are lots of CloudFormation examples that are much easier to start with.