I'm trying to create an ElasticLoadBalancer for a Kubernetes cluster running on EKS. I would like to avoid creating a new security group and instead use one that I specify. From the Kubernetes source code (below) it appears that I can accomplish this by setting c.cfg.Global.ElbSecurityGroup.
func (c *Cloud) buildELBSecurityGroupList(...) {
    var securityGroupID string
    if c.cfg.Global.ElbSecurityGroup != "" {
        securityGroupID = c.cfg.Global.ElbSecurityGroup
    } else {
        ...
    }
    ...
}
How can I set the Kubernetes global config value ElbSecurityGroup?
Related and outdated question: Kubernetes and AWS: Set LoadBalancer to use predefined Security Group
As mentioned in the kops documentation, you can do this by setting cloudConfig.elbSecurityGroup in the kops cluster configuration.
WARNING: this only works for Kubernetes versions above 1.7.0.
To avoid creating a security group per ELB, you can specify a security group ID that will be assigned to your LoadBalancer. It must be a security group ID, not a name. api.loadBalancer.additionalSecurityGroups must be empty, because Kubernetes will add rules for the ports specified in the service file. This can be useful to avoid AWS limits: 500 security groups per region and 50 rules per security group.
spec:
  cloudConfig:
    elbSecurityGroup: sg-123445678
I am wondering if it is possible to configure the “public access source allowlist” from CDK. I can see and manage this in the console under the networking tab, but can’t find anything in the CDK docs about setting the allowlist during deploy. I tried creating and assigning a security group (code sample below), but this didn't work. Also the security group was created as an "additional" security group, rather than the "cluster" security group.
declare const vpc: ec2.Vpc;
declare const adminRole: iam.Role;
const securityGroup = new ec2.SecurityGroup(this, 'my-security-group', {
  vpc,
  allowAllOutbound: true,
  description: 'Created in CDK',
  securityGroupName: 'cluster-security-group'
});
securityGroup.addIngressRule(
  ec2.Peer.ipv4('<vpn CIDR block>'),
  ec2.Port.tcp(8888),
  'allow frontend access from the VPN'
);
const cluster = new eks.Cluster(this, 'my-cluster', {
  vpc,
  clusterName: 'cluster-cdk',
  version: eks.KubernetesVersion.V1_21,
  mastersRole: adminRole,
  defaultCapacity: 0,
  securityGroup
});
Update: I attempted the following, and it updated the cluster security group, but I'm still able to access the frontend when I'm not on the VPN:
cluster.connections.allowFrom(
  ec2.Peer.ipv4('<vpn CIDR block>'),
  ec2.Port.tcp(8888)
);
Update 2: I tried this as well, and I can still access my application's frontend even when I'm not on the VPN. However, I can now only use kubectl when I'm on the VPN, which is good! At least I've improved the cluster's security in a useful way.
const cluster = new eks.Cluster(this, 'my-cluster', {
  vpc,
  clusterName: 'cluster-cdk',
  version: eks.KubernetesVersion.V1_21,
  mastersRole: adminRole,
  defaultCapacity: 0,
  endpointAccess: eks.EndpointAccess.PUBLIC_AND_PRIVATE.onlyFrom('<vpn CIDR block>')
});
In general, EKS has two relevant security groups:
The one used by nodes, which AWS calls the "cluster security group". It's set up automatically by EKS. You shouldn't need to mess with it unless you want to (a) apply more restrictive rules than the defaults or (b) open your nodes for maintenance tasks (e.g. SSH access). This is what you are accessing via cluster.connections.
The Ingress Load Balancer security group. This belongs to the Application Load Balancer created and managed by the AWS Load Balancer Controller, which EKS can install for you. In CDK, the controller can be enabled like so:
const cluster = new eks.Cluster(this, 'HelloEKS', {
  version: eks.KubernetesVersion.V1_22,
  albController: {
    version: eks.AlbControllerVersion.V2_4_1,
  },
});
This will serve as a gateway for all internal services that need an Ingress. You can access it via the cluster.albController property and add rules to it like a regular Application Load Balancer. I have no idea how EKS deals with task communication when an Ingress ALB is not present.
Relevant docs:
Amazon EKS security group considerations
Alb Controller on CDK docs
The albController property for EKS Cluster objects
Background:
We're using AWS Cloud Development Kit (CDK) 2.5.0.
Set up manually in the AWS Console with hard-coded IP addresses, the chain of Route 53 to an ALB (Application Load Balancer) to a private Interface VPC Endpoint to a private REST API Gateway (and so on) works. See image below.
Code:
We're trying to code this manual solution via CDK, but are stuck on how to get and use the IP addresses, or on some other way to hook the load balancer up to the Interface VPC Endpoint. (The endpoint has 3 IP addresses, one per availability zone in the region.)
The ALB needs a Target Group which targets the IP addresses of the Interface VPC Endpoint. (Using an "instance" approach instead of IP addresses, we tried using InstanceIdTarget with the endpoint's vpcEndpointId, but that failed. We got the error Instance ID 'vpce-WITHWHATEVERWASHERE' is not valid )
Using CDK, we created the following (among other things) using the aws_elasticloadbalancingv2 module:
ApplicationLoadBalancer (ALB)
ApplicationTargetGroup (ATG) aka Target Group
We were hopeful about aws_elasticloadbalancingv2_targets similar to aws_route53_targets, but no luck. We know the targets property of the ApplicationTargetGroup takes an array of IApplicationLoadBalancerTarget objects, but that's it.
:
import { aws_ec2 as ec2 } from 'aws-cdk-lib';
:
import { aws_elasticloadbalancingv2 as loadbalancing } from 'aws-cdk-lib';
// endpointSG, loadBalancerSG, vpc, ... are defined up higher
const endpoint = new ec2.InterfaceVpcEndpoint(this, `ABCEndpoint`, {
  service: {
    name: `com.amazonaws.us-east-1.execute-api`,
    port: 443
  },
  vpc,
  securityGroups: [endpointSG],
  privateDnsEnabled: false,
  subnets: { subnetGroupName: "Private" }
});
const loadBalancer = new loadbalancing.ApplicationLoadBalancer(this, `abc-${config.LEVEL}-load-balancer`, {
  vpc: vpc,
  vpcSubnets: { subnetGroupName: "Private" },
  internetFacing: false,
  securityGroup: loadBalancerSG
});
const listenerCertificate = loadbalancing.ListenerCertificate.fromArn(config.ARNS.CERTIFICATE);
const listener = loadBalancer.addListener('listener', {
  port: 443,
  certificates: [ listenerCertificate ]
});
let applicationTargetGroup = new loadbalancing.ApplicationTargetGroup(this, `abc-${config.LEVEL}-target-group`, {
  port: 443,
  vpc: vpc,
  // targets: [ HELP ], - how to get the IApplicationLoadBalancerTarget objects?
})
listener.addTargetGroups('abc-listener-forward-to-target-groups', {
  targetGroups: [applicationTargetGroup]
});
As you can see above, we added a listener to the ALB and added the Target Group to the listener.
Some of the resources we used:
https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_elasticloadbalancingv2.ApplicationLoadBalancer.html
https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_elasticloadbalancingv2.ApplicationTargetGroup.html
https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.InterfaceVpcEndpoint.html
https://docs.aws.amazon.com/cdk/api/v2//docs/aws-cdk-lib.aws_elasticloadbalancingv2_targets.InstanceIdTarget.html and related.
We also found "How to get PrivateIPAddress of VPC Endpoint in CDK?", but that did not help.
In case visualizing the setup via an image helps, here's a close approximation of what we're going for.
Any help populating that targets property of the ApplicationTargetGroup with IApplicationLoadBalancerTarget objects is appreciated. Thanks!
https://aws.amazon.com/blogs/networking-and-content-delivery/accessing-an-aws-api-gateway-via-static-ip-addresses-provided-by-aws-global-accelerator/
This blog shows how to configure the architecture given in the question using the AWS console (just disable the Global Accelerator option). The key takeaway is that the Application Load Balancer uses target type IP and resolves the VPC endpoint domain name manually in step 2. The other two options, instance (target is an EC2 instance) and lambda (target is an AWS Lambda function), cannot be used.
The ec2.InterfaceVpcEndpoint construct has no output which directly gives an IP address. The underlying CloudFormation resource also does not support it. Instead, you will have to use the vpcEndpointDnsEntries property of ec2.InterfaceVpcEndpoint and resolve the domain names to IP addresses in your code (the console configuration also required the same domain name resolution). You can use an IpTarget object in your ApplicationTargetGroup.
At this point, you will run into one final roadblock due to how CDK works under the hood.
If you have all your resources defined in one CDK application, the value for each parameter (or a reference to the value using underlying CloudFormation functions like Ref, GetAtt, etc.) needs to be available before the synthesize step, since that's when all templates are generated. AWS CDK uses tokens for this purpose, which during synthesis resolve to values such as {'Fn::GetAtt': ['EndpointResourceLogicalName', 'DnsEntries']}. However, since we need the actual value of the DNS entry to be able to resolve it, the token value won't be useful.
One way to fix this issue is to have two completely independent CDK applications structured this way:
Application A with the VPC and the interface endpoint. Define the vpcEndpointDnsEntries and the VPC ID as outputs using CfnOutput.
Application B with the rest of the resources. You will have to write code to read the outputs of the CloudFormation stack created by Application A. You can use Fn.importValue for the VPC ID, but you cannot use it for the DnsEntries output, since that would again just resolve to an Fn::ImportValue based token. You need to read the actual value of the stack output, using the AWS SDK or some other option. Once you have the domain name, you can resolve it in your TypeScript code (I am not very familiar with TypeScript; this might require a third-party library).
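For illustration, here is a minimal sketch of what Application B's target group could look like once the endpoint IPs have been resolved out of band; the IP values and construct ID are hypothetical, and vpc and listener are assumed to be defined as in the question.
import { aws_elasticloadbalancingv2 as loadbalancing, aws_elasticloadbalancingv2_targets as targets } from 'aws-cdk-lib';
// Assumed: endpointIps was obtained ahead of synth, e.g. by reading Application A's
// stack outputs with the AWS SDK and resolving the DNS name with dns.promises.resolve4.
const endpointIps = ['10.0.1.25', '10.0.2.31', '10.0.3.47']; // hypothetical values
const applicationTargetGroup = new loadbalancing.ApplicationTargetGroup(this, 'abc-target-group', {
  vpc,
  port: 443,
  // one IP target per availability zone of the interface endpoint
  targets: endpointIps.map(ip => new targets.IpTarget(ip, 443)),
});
// then wire it to the listener via listener.addTargetGroups(...) as in the question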
Image credits:
https://docs.aws.amazon.com/cdk/v2/guide/apps.html
Try this: use a custom resource to get the ENI IP addresses of the endpoint:
https://repost.aws/questions/QUjISNyk6aTA6jZgZQwKWf4Q/how-to-connect-a-load-balancer-and-an-interface-vpc-endpoint-together-using-cdk
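For reference, a minimal sketch of that custom-resource approach; the construct IDs are made up, it assumes the endpoint and applicationTargetGroup from the question, and it looks up only the first ENI (in practice you would repeat this per availability zone).
import { Fn, custom_resources as cr, aws_elasticloadbalancingv2_targets as targets } from 'aws-cdk-lib';
// Custom resource that calls EC2 DescribeNetworkInterfaces for the endpoint's first ENI.
const eniIpLookup = new cr.AwsCustomResource(this, 'EndpointEniIpLookup', {
  onUpdate: {
    service: 'EC2',
    action: 'describeNetworkInterfaces',
    parameters: {
      NetworkInterfaceIds: [Fn.select(0, endpoint.vpcEndpointNetworkInterfaceIds)],
    },
    physicalResourceId: cr.PhysicalResourceId.of('endpoint-eni-ip-lookup'),
  },
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
});
// Register the ENI's private IP with the target group.
const firstEniIp = eniIpLookup.getResponseField('NetworkInterfaces.0.PrivateIpAddress');
applicationTargetGroup.addTarget(new targets.IpTarget(firstEniIp, 443));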
I am trying to revoke an ingress rule on a security group that is inside my VPC (not the default one). I can find the security group using DescribeSecurityGroupsRequest and create the ingress rule using AuthorizeSecurityGroupIngressRequest; all of that works fine, and I'm able to see the new rule in the AWS console. But when I try to revoke the same ingress rule, I get an error that the security group can't be found on the default VPC, and I don't see a way to specify which VPC. I'm using:
RevokeSecurityGroupIngressRequest revokeSecurityGroupIngressRequest = new RevokeSecurityGroupIngressRequest();
revokeSecurityGroupIngressRequest.GroupId = "sg-id";
revokeSecurityGroupIngressRequest.GroupName = "sg-name";
revokeSecurityGroupIngressRequest.IpPermissions = ipPermissions;
I have seen how to do this using the CLI, or from a Lambda using boto3, but I don't see how to do it with the .NET SDK.
John Rotenstein had the right answer. The group name is only used with the default VPC: if you specify the group name, the request is resolved against the default VPC. Omitting the name and using only the security group ID works perfectly.
I am able to create an EKS cluster, but when I try to add node groups, I receive a "Create failed" error with details:
"NodeCreationFailure": Instances failed to join the kubernetes cluster
I tried a variety of instance types and larger volume sizes (60 GB) without luck.
Looking at the EC2 instances, I only see the problem below. However, it is difficult to do anything about it since I'm not directly launching the EC2 instances (the EKS NodeGroup UI wizard is doing that).
How would one move forward given the failure happens even before I can jump into the ec2 machines and "fix" them?
Amazon Linux 2
Kernel 4.14.198-152.320.amzn2.x86_64 on an x86_64
ip-187-187-187-175 login: [ 54.474668] cloud-init[3182]: One of the configured repositories failed (Unknown),
[ 54.475887] cloud-init[3182]: and yum doesn't have enough cached data to continue. At this point the only
[ 54.478096] cloud-init[3182]: safe thing yum can do is fail. There are a few ways to work "fix" this:
[ 54.480183] cloud-init[3182]: 1. Contact the upstream for the repository and get them to fix the problem.
[ 54.483514] cloud-init[3182]: 2. Reconfigure the baseurl/etc. for the repository, to point to a working
[ 54.485198] cloud-init[3182]: upstream. This is most often useful if you are using a newer
[ 54.486906] cloud-init[3182]: distribution release than is supported by the repository (and the
[ 54.488316] cloud-init[3182]: packages for the previous distribution release still work).
[ 54.489660] cloud-init[3182]: 3. Run the command with the repository temporarily disabled
[ 54.491045] cloud-init[3182]: yum --disablerepo= ...
[ 54.491285] cloud-init[3182]: 4. Disable the repository permanently, so yum won't use it by default. Yum
[ 54.493407] cloud-init[3182]: will then just ignore the repository until you permanently enable it
[ 54.495740] cloud-init[3182]: again or use --enablerepo for temporary usage:
[ 54.495996] cloud-init[3182]: yum-config-manager --disable
Adding another reason to the list:
In my case the nodes were running in private subnets and I hadn't configured a private endpoint under "API server endpoint access".
After updating that setting, the node groups weren't updated automatically, so I had to recreate them.
In my case, the problem was that I was deploying my node group in a private subnet, but this private subnet had no NAT gateway associated, hence no internet access. What I did was:
Create a NAT gateway
Create a new route table with the following routes (the second one is the internet access route, through the NAT gateway):
Destination: VPC-CIDR-block Target: local
Destination: 0.0.0.0/0 Target: NAT-gateway-id
Associate the private subnet with the route table created in the second step.
After that, the node groups joined the cluster without problems.
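If the VPC is defined with CDK (TypeScript), the same setup can be sketched as below; the names are illustrative, and depending on your aws-cdk-lib version the subnet type may be called PRIVATE_WITH_NAT instead of PRIVATE_WITH_EGRESS.
import { aws_ec2 as ec2 } from 'aws-cdk-lib';
// A VPC whose private subnets get a default route through a single NAT gateway,
// so node groups placed there can reach the EKS API endpoint and pull images.
const vpc = new ec2.Vpc(this, 'EksVpc', {
  natGateways: 1,
  subnetConfiguration: [
    { name: 'Public', subnetType: ec2.SubnetType.PUBLIC },
    { name: 'Private', subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  ],
});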
I noticed there was no answer here, but about 2k visits to this question over the last six months. There seem to be a number of reasons why you could be seeing these failures. To regurgitate the AWS documentation found here:
https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html
The aws-auth-cm.yaml file does not have the correct IAM role ARN for your nodes. Ensure that the node IAM role ARN (not the instance profile ARN) is specified in your aws-auth-cm.yaml file. For more information, see Launching self-managed Amazon Linux nodes.
The ClusterName in your node AWS CloudFormation template does not exactly match the name of the cluster you want your nodes to join. Passing an incorrect value to this field results in an incorrect configuration of the node's /var/lib/kubelet/kubeconfig file, and the nodes will not join the cluster.
The node is not tagged as being owned by the cluster. Your nodes must have the following tag applied to them, where <cluster-name> is replaced with the name of your cluster.
Key: kubernetes.io/cluster/<cluster-name>, Value: owned
The nodes may not be able to access the cluster using a public IP address. Ensure that nodes deployed in public subnets are assigned a public IP address. If not, you can associate an Elastic IP address with a node after it's launched. For more information, see Associating an Elastic IP address with a running instance or network interface. If the public subnet is not set to automatically assign public IP addresses to instances deployed to it, then we recommend enabling that setting. For more information, see Modifying the public IPv4 addressing attribute for your subnet. If the node is deployed to a private subnet, then the subnet must have a route to a NAT gateway that has a public IP address assigned to it.
The STS endpoint for the Region that you're deploying the nodes to is not enabled for your account. To enable the Region, see Activating and deactivating AWS STS in an AWS Region.
The worker node does not have a private DNS entry, resulting in the kubelet log containing a node "" not found error. Ensure that the VPC where the worker node is created has values set for domain-name and domain-name-servers as Options in a DHCP options set. The default values are domain-name:<region>.compute.internal and domain-name-servers:AmazonProvidedDNS. For more information, see DHCP options sets in the Amazon VPC User Guide.
I myself had an issue with the tagging, where I needed an uppercase letter. In reality, if you can use another avenue to deploy your EKS cluster, I would recommend it (eksctl, the AWS CLI, even Terraform).
I will try to keep the answer short by highlighting a few things that commonly go wrong up front.
1. Add the IAM role that is attached to the EKS worker nodes to the aws-auth ConfigMap in the kube-system namespace (see the CDK sketch after the reference links below). Ref
2. Log in to the worker node that was created but failed to join the cluster. Try connecting to the API server from inside using nc. E.g.: nc -vz 9FCF4EA77D81408ED82517B9B7E60D52.yl4.eu-north-1.eks.amazonaws.com 443
3. If you are not using the EKS-optimized node AMI from the drop-down in the AWS Console (which means you are using a launch template or launch configuration in EC2), don't forget to add the user data section to the launch template. Ref
set -o xtrace
/etc/eks/bootstrap.sh ${ClusterName} ${BootstrapArguments}
4. Check the EKS worker node IAM role policies and confirm the appropriate permissions are attached. AmazonEKS_CNI_Policy is a must.
5. Your nodes must have the following tag applied to them, where cluster-name is replaced with the name of your cluster.
kubernetes.io/cluster/cluster-name: owned
I hope your problem lies within this list.
Ref: https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html
https://aws.amazon.com/premiumsupport/knowledge-center/resolve-eks-node-failures/
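For point 1, if the cluster is defined with CDK (TypeScript), the aws-auth mapping can be added as in this sketch; cluster and nodeRole are assumed to exist elsewhere in the stack.
import { aws_eks as eks, aws_iam as iam } from 'aws-cdk-lib';
declare const cluster: eks.Cluster;   // the existing EKS cluster construct
declare const nodeRole: iam.IRole;    // the worker nodes' IAM role (not the instance profile)
// Map the node role into the aws-auth ConfigMap so kubelets are allowed to register.
cluster.awsAuth.addRoleMapping(nodeRole, {
  username: 'system:node:{{EC2PrivateDNSName}}',
  groups: ['system:bootstrappers', 'system:nodes'],
});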
Initially, I had the NAT gateway in my private subnet. Then I moved the NAT gateway to a public subnet, and everything worked fine.
Terraform code is as follows:
resource "aws_internet_gateway" "gw" {
vpc_id = aws_vpc.dev-vpc.id
tags = {
Name = "dev-IG"
}
}
resource "aws_eip" "lb" {
depends_on = [aws_internet_gateway.gw]
vpc = true
}
resource "aws_nat_gateway" "natgw" {
allocation_id = aws_eip.lb.id
subnet_id = aws_subnet.dev-public-subnet.id
depends_on = [aws_internet_gateway.gw]
tags = {
Name = "gw NAT"
}
}
Try adding a tag to your private subnets where the worker nodes are deployed.
kubernetes.io/cluster/<cluster_name> = shared
We need to check what type of NAT gateway is configured. It should be a public one, but in my case I had configured it as private.
Once I changed it from private to public, the issue was resolved.
Auto Scaling group logs showed that we hit a quota limit:
Launching a new EC2 instance. Status Reason: You've reached your quota for maximum Fleet Requests for this account. Launching EC2 instance failed.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/fleet-quotas.html
I had a similar issue and none of the provided solutions worked. After some investigation and running:
journalctl -f -u kubelet
the log contained:
Error: failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false
So naturally, the solution was to disable swap with
swapoff -a
And then it worked fine: the node registered, and the output was clean when checked with journalctl and systemctl status kubelet.
The main problem here is the networking: there are public and private subnets. Check that your private subnets route through a NAT gateway. If they don't, add a NAT gateway route for the private subnets, and also check that the public subnets are attached to the internet gateway.
I have a service that exposes multiple ports. It worked fine with Kubernetes, but now we are moving it to AWS ECS. It seems I can only expose ports via a load balancer, and I am limited to one port per service/task: even when Docker defines multiple ports, I have to choose one.
The "Add to load balancer" button allows adding one port. Once added, there is no button to add a second port.
Is there any nicer workaround than making a second proxy service to expose the second port?
UPDATE: I use a Fargate-based service.
You don't need any workaround: AWS ECS now supports multiple target groups within the same ECS service. This is helpful for use cases where you want to expose multiple ports of a container.
Currently, if you want to create a service specifying multiple target groups, you must create the service using the Amazon ECS API, SDK, AWS CLI, or an AWS CloudFormation template. After the service is created, you can view the service and the target groups registered to it with the AWS Management Console.
For example, a Jenkins container might expose port 8080 for the Jenkins web interface and port 50000 for the API.
Ref:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/register-multiple-targetgroups.html
https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ecs-services-now-support-multiple-load-balancer-target-groups/
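If you use CDK (TypeScript), the same idea can be sketched by registering the service with two listeners and target groups; the service, listeners, container name, and ports below are placeholders.
import { aws_ecs as ecs, aws_elasticloadbalancingv2 as elbv2 } from 'aws-cdk-lib';
declare const service: ecs.FargateService;            // assumed defined elsewhere
declare const listenerA: elbv2.ApplicationListener;   // listener for the first port
declare const listenerB: elbv2.ApplicationListener;   // listener for the second port
// Attach the same container to two target groups, one per exposed port.
listenerA.addTargets('AppPort1', {
  port: 8080,
  targets: [service.loadBalancerTarget({ containerName: 'app', containerPort: 8080 })],
});
listenerB.addTargets('AppPort2', {
  port: 8081,
  targets: [service.loadBalancerTarget({ containerName: 'app', containerPort: 8081 })],
});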
Update: I was able to configure multiple target groups using Terraform, but so far I have not found this option in the AWS console.
resource "aws_ecs_service" "multiple_target_example" {
name = "multiple_target_example1"
cluster = "${aws_ecs_cluster.main.id}"
task_definition = "${aws_ecs_task_definition.with_lb_changes.arn}"
desired_count = 1
iam_role = "${aws_iam_role.ecs_service.name}"
load_balancer {
target_group_arn = "${aws_lb_target_group.target2.id}"
container_name = "ghost"
container_port = "3000"
}
load_balancer {
target_group_arn = "${aws_lb_target_group.target2.id}"
container_name = "ghost"
container_port = "3001"
}
depends_on = [
"aws_iam_role_policy.ecs_service",
]
}
Version note:
Multiple load_balancer configuration block support was added in Terraform AWS Provider version 2.22.0.
ecs_service_terraform
I can't say this is a nice workaround, but I was working on a project where I needed to run Ejabberd on AWS ECS, and the same issue came up when binding the service's ports to the load balancer.
I was working with Terraform, and due to this limitation of AWS ECS we agreed to run one container per instance to resolve the port issue, as we needed to expose two ports.
If you do not want to assign a dynamic port to your container and you are willing to run one container per instance, then this solution will work:
Create a target group and specify the second port of the container.
Go to the Auto Scaling group of your ECS cluster.
Edit it and add the newly created target group to the Auto Scaling group of the ECS cluster (see the CDK sketch below).
So if you scale to two containers, there will be two instances, and the newly launched instance will register with the second target group; the Auto Scaling group takes care of it.
This approach worked fine in my case, but a few things need to be considered.
Do not bind the primary port in the extra target group; it is better to bind the primary port through the ALB in the ECS service itself. The main advantage of this is that if your container fails to respond to the ECS/ALB health check, the container will be restarted automatically, whereas the extra target group's health check will not recreate your container.
This approach will not work when the Docker container exposes a dynamic port.
AWS should update its ECS agent to handle such scenarios.
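For reference, the attachment from steps 2 and 3 can also be expressed in CDK (TypeScript); asg and secondPortTargetGroup are assumed to already exist.
import { aws_autoscaling as autoscaling, aws_elasticloadbalancingv2 as elbv2 } from 'aws-cdk-lib';
declare const asg: autoscaling.AutoScalingGroup;                    // the ECS cluster's capacity ASG
declare const secondPortTargetGroup: elbv2.ApplicationTargetGroup;  // target group for the container's second port
// Every instance launched by the ASG is registered with the second target group.
asg.attachToApplicationTargetGroup(secondPortTargetGroup);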
I faced this issue while running more than one container per instance; the second container was not coming up because it was using the same host port defined in the task definition.
What we did was create an Application Load Balancer on top of these containers and remove the hard-coded host ports. When the Application Load Balancer doesn't get predefined ports under it, it uses dynamic port mapping: containers come up on random host ports, sit in one target group, and the load balancer automatically sends requests to those ports (a CDK sketch follows below).
More details can be found here
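A rough CDK (TypeScript) sketch of dynamic port mapping under bridge networking; the image name and IDs are placeholders.
import { aws_ecs as ecs } from 'aws-cdk-lib';
declare const taskDefinition: ecs.Ec2TaskDefinition;  // EC2 launch type with bridge networking assumed
const container = taskDefinition.addContainer('app', {
  image: ecs.ContainerImage.fromRegistry('nginx'),    // placeholder image
  memoryLimitMiB: 512,
});
// Omitting hostPort under bridge networking assigns a random host port at runtime,
// which the ALB target group picks up automatically once the service is attached to it.
container.addPortMappings({ containerPort: 3000 });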
Thanks to mohit's answer, I used AWS CLI to register multiple target groups (multiple ports) into one ECS service:
ecs-sample-service.json
{
  "serviceName": "sample-service",
  "taskDefinition": "sample-task",
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:0000000000:targetgroup/sample-target-group/00000000000000",
      "containerName": "faktory",
      "containerPort": 7419
    },
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:0000000000:targetgroup/sample-target-group-web/11111111111111",
      "containerName": "faktory",
      "containerPort": 7420
    }
  ],
  "desiredCount": 1
}
aws ecs create-service --cluster sample-cluster --service-name sample-service --cli-input-json file://ecs-sample-service.json --network-configuration "awsvpcConfiguration={subnets=[subnet-0000000000000],securityGroups=[sg-00000000000000],assignPublicIp=ENABLED}" --launch-type FARGATE
If the task needs internet access to pull its image, make sure subnet-0000000000000 has internet access.
The security group sg-00000000000000 needs to allow inbound access on the relevant ports (in this case, 7419 and 7420).
If the traffic only comes from the ALB, the task does not need a public IP, and assignPublicIp can be set to DISABLED.
Usually I use the AWS CLI method itself: creating the task definition and target groups and attaching them to the Application Load Balancer. But when there are multiple services to set up this becomes time-consuming, so I would use Terraform to create such services.
terraform module link: this is a multi-port ECS service with a Fargate deployment. Currently it supports only 2 ports. When using multiple ports with raw sockets, the socket port won't send any HTTP response, so the health check might fail.
To fix that, I override the health check port of the target group to point at a different port (see the sketch below).
Hope this helps.
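A sketch of overriding the health check port in CDK (TypeScript); the ports, construct ID, and path are placeholders.
import { aws_ec2 as ec2, aws_elasticloadbalancingv2 as elbv2 } from 'aws-cdk-lib';
declare const vpc: ec2.IVpc;
// Target group for the raw-socket port; health checks are sent to an HTTP port instead.
const socketTargetGroup = new elbv2.ApplicationTargetGroup(this, 'SocketPortTargetGroup', {
  vpc,
  port: 7420,                                   // the socket port that can't answer HTTP health checks
  protocol: elbv2.ApplicationProtocol.HTTP,
  healthCheck: {
    port: '7419',                               // check the container's HTTP port instead
    path: '/health',
  },
});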