Unable to create AWS EKS cluster with eksctl - amazon-web-services

I'm unable to create an AWS EKS cluster with eksctl from a Windows 10 PC. Here is the command I'm running:
eksctl create cluster --name revit --version 1.17 --region ap-southeast-2 --fargate
Version of eksctl: 0.25.0
AWS CLI Version: aws-cli/2.0.38 Python/3.7.7 Windows/10 exe/AMD64
Error when running the create cluster command:
2020-08-08T19:05:35+10:00 [ℹ] eksctl version 0.25.0
2020-08-08T19:05:35+10:00 [ℹ] using region ap-southeast-2
2020-08-08T19:05:35+10:00 [!] retryable error (RequestError: send request failed
caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connectex: A socket operation was attempted to an unreachable network.) from ec2metadata/GetToken - will retry after delay of 54.121635ms
2020-08-08T19:05:35+10:00 [!] retryable error (RequestError: send request failed
caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connectex: A socket operation was attempted to an unreachable network.) from ec2metadata/GetToken - will retry after delay of 86.006168ms

I had the same error. I got rid of it by providing my AWS credentials for programmatic access (AWS Access Key ID, AWS Secret Access Key):
$ aws configure
The next time I ran eksctl, it no longer tried to authenticate through the instance metadata endpoint, and the command succeeded.
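The retries against 169.254.169.254 in the log are consistent with no static credentials being found, so the SDK falls back to the EC2 instance metadata service, which is unreachable from a PC. The following is a minimal sketch for checking this before running eksctl. It assumes the standard environment variable names and the default shared credentials file written by `aws configure`; the helper name `has_static_credentials` is hypothetical, and this is not eksctl's own resolution code.

```python
import configparser
import os
from pathlib import Path


def has_static_credentials(env=None, credentials_file=None, profile="default"):
    """Return True if an access key pair is available without the metadata service."""
    env = os.environ if env is None else env
    # 1. Environment variables work without any file on disk.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return True
    # 2. Otherwise look for the profile written by `aws configure`.
    if credentials_file is None:
        credentials_file = Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(credentials_file)  # a missing file is silently ignored
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return bool(section.get("aws_access_key_id") and section.get("aws_secret_access_key"))


if __name__ == "__main__":
    print("static credentials found:", has_static_credentials())
```

If this prints False, running `aws configure` (or exporting the two variables) is probably the first thing to try.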

I suspect this is related to this: https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/
Specifically:
Protecting against open layer 3 firewalls and NATs
Last, there is a final layer of defense in IMDSv2 that is designed to protect EC2 instances that have been misconfigured as open routers, layer 3 firewalls, VPNs, tunnels, or NAT devices. With IMDSv2, the PUT response containing the secret token will, by default, not be able to travel outside the instance. This is accomplished by having the default Time To Live (TTL) on the low-level IP packets containing the secret token set to “1,” much lower than a typical value, such as “64.” Hardware and software that handle packets, including EC2 instances, subtract 1 from each packet’s TTL field whenever they pass it on. If the TTL gets to 0, the packet is discarded, and an error message is sent back to the sender. A packet with a TTL of “64” can therefore make sixty-four “hops” in a network before giving up, while a packet with a TTL of “1” can exist in just one. This feature allows legitimate traffic to get to an intended destination, but is designed to stop packets from endlessly running around in circles if there’s a loop in a network.
Are you by any chance running the command above from within a container launched in bridge mode? I had a similar problem. If that is the case, you could run the container with --network host, or pass the credentials in as environment variables.
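The TTL argument can be illustrated with a small simulation. This is only a sketch: it assumes each forwarding device (a router, or the host forwarding packets for a bridge-mode container) subtracts 1 from the TTL and discards the packet at 0, as the quoted passage describes. The function name is hypothetical.

```python
def reaches_destination(initial_ttl: int, forwarding_devices: int) -> bool:
    """Simulate a packet passing through devices that each decrement its TTL."""
    ttl = initial_ttl
    for _ in range(forwarding_devices):
        ttl -= 1
        if ttl <= 0:
            return False  # discarded by this device
    return True


if __name__ == "__main__":
    # Token response delivered directly on the instance: nothing forwards it.
    print(reaches_destination(1, 0))
    # Same response forwarded once more into a bridge-mode container.
    print(reaches_destination(1, 1))
```

Under these assumptions the second call fails, which matches the suggestion to use --network host so that no extra forwarding step sits between the metadata service and the process.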

Related

AWS CLI ecs run-task CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref

I'm trying to move from the Console to the CLI.
I have an ECS Cluster and a Task Definition. From the console, I can run a task WITHOUT any issue. The task comes green and I can use the public IP to access my service.
Now, I'd like to do the same but instead of creating the task using the Console, I'd like to use AWS cli.
I thought this was enough:
aws ecs run-task --cluster my-cluster \
--task-definition ecs-task-def:9 \
--launch-type FARGATE \
--network-configuration '{ "awsvpcConfiguration": { "subnets": ["subnet-XX1","subnet-XX2"], "securityGroups": ["sg-XXX"],"assignPublicIp": "ENABLED" }}'
However, the task gets stuck in PENDING state and after a while is STOPPED with the following error message:
CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "docker.io/username/container:latest": failed to do request: Head https://registry-1.docker.io/v2/username/container/manifests/latest: dial tcp x.x.x.x:443: i/o timeout
What concerns me is that I can run tasks from the Console using the same arguments (VPC, Subnets, Sec Group, etc) but I cannot make it work using the CLI.
If the issue was missing/wrong rules both Console and CLI should not work.
Anyone knows why?
It looks like ECS cannot pull the image from the registry. The error
CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "docker.io/username/container:latest": failed to do request: Head https://registry-1.docker.io/v2/username/container/manifests/latest: dial tcp x.x.x.x:443: i/o timeout
suggests that outbound traffic on port 443 is blocked, so the image cannot be pulled. Have you tried allowing all inbound and outbound traffic on the attached security group, and checking network connectivity from within the attached subnets?
You can create a simple Lambda function attached to the same subnets and security groups, then make a telnet/curl-style request to the registry endpoint to check connectivity.
example:
import urllib3

def test_book():
    # Request the endpoint from inside the VPC and print what comes back.
    http = urllib3.PoolManager()
    url = 'https://your-endpoint-here'
    headers = {
        "Accept": "application/json"
    }
    r = http.request(method='GET', url=url, headers=headers)
    print(f'response_status: {r.status}\nresponse_headers: {r.headers}\nresponse_data: {r.data}')
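Since the error is an i/o timeout at the TCP level ("dial tcp x.x.x.x:443"), a plain socket check can be enough and needs only the standard library. A minimal sketch, with a hypothetical helper name `can_connect`; the host in the usage line is the one named in the error message:

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False


if __name__ == "__main__":
    print(can_connect("registry-1.docker.io"))
```

Run from a Lambda function or instance in the same subnets and security group as the task, a False result would point at routing (for example no internet gateway or NAT route) or security group rules rather than at the task definition.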

AWS Database Migration Service: start replication task issue

I have created an AWS DMS replication instance, a replication task, and source and target endpoints using Terraform.
Now, when I run start-replication-task from the AWS CLI on Windows, it throws this SSL error:
Error running command 'aws dms start-replication-task --start-replication-task-type start-replication --replication-task-arn arn:aws:dms:us-west-2:accountnumber:task:xxxxxxx': exit status 254. Output: C:\Program Files\Amazon\AWSCLIV2\urllib3\connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'dms.us-east-1.amazonaws.com'. Adding certificate verification is strongly advised.
My CLI version is aws-cli/2.1.6 Python/3.7.9 Windows/10 exe/AMD64 prompt/off.
There is no proxy configured.
Any suggestions on this issue?
Thanks
Make sure your credentials are set correctly for the account where your replication task lives, by pointing [default] at the right ones in
~/.aws/credentials (on Windows, %USERPROFILE%\.aws\credentials)
To start the replication task you have to fill in two mandatory fields:
{
"ReplicationTaskArn": "string",
"StartReplicationTaskType": "string"
}
StartReplicationTaskType --> Valid Values: start-replication | resume-processing | reload-target
start-replication applies only to the first run after you create the task, so for a later run you may need to change it to reload-target.
Ref:
https://docs.aws.amazon.com/dms/latest/APIReference/API_StartReplicationTask.html
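The two mandatory fields and the allowed values can be validated before calling the CLI. The sketch below also extracts the region from the task ARN, because in the question's output the ARN is in us-west-2 while the warning names dms.us-east-1.amazonaws.com; whether that mismatch is the actual cause is not confirmed in this thread. Both function names are hypothetical.

```python
VALID_START_TYPES = ("start-replication", "resume-processing", "reload-target")


def build_start_request(task_arn: str, start_type: str) -> dict:
    """Build the two mandatory fields of a StartReplicationTask request."""
    if start_type not in VALID_START_TYPES:
        raise ValueError(f"StartReplicationTaskType must be one of {VALID_START_TYPES}")
    return {"ReplicationTaskArn": task_arn, "StartReplicationTaskType": start_type}


def region_from_arn(arn: str) -> str:
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


if __name__ == "__main__":
    arn = "arn:aws:dms:us-west-2:accountnumber:task:xxxxxxx"  # placeholder ARN from the question
    print(build_start_request(arn, "reload-target"))
    print("task region:", region_from_arn(arn))
```

If the task region differs from the region the CLI is calling, passing --region explicitly or fixing the configured default region is a reasonable thing to check.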

Pull images from Kubernetes running on AWS with ECR pulls images from the wrong region in other account

I have k8s clusters on AWS working with ECR and pulling images from all regions. This works fine.
But when I try to pull images from a different account, the pulls fail with "no such host". I followed these instructions to set IAM permissions (and the docs). I'm not getting permission denied; I'm getting this:
Failed to pull image "<acc id>.dkr.ecr.ap-outheast-2.amazonaws.com/image:tag":
rpc error: code = Unknown desc = Error response from daemon:
Get https://<acc id>.dkr.ecr.ap-outheast-2.amazonaws.com/v1/_ping:
dial tcp: lookup <acc id>.dkr.ecr.ap-outheast-2.amazonaws.com
on 10.71.0.2:53: no such host
My cluster is running in ap-southeast-1 and the IP 10.71.0.2:53 is the default DNS AWS set for the VPC
I'm trying to work around this by populating this region's ECR as well, but that seems pretty wrong.
Any idea how to allow ECR to pull from another region?
I think you made a simple typo in .dkr.ecr.ap-outheast-2.amazonaws.com/image:tag, which is why the DNS server returns "no such host". Try replacing ap-outheast-2 with ap-southeast-2.
Generally, if the ECR IAM permissions are set correctly this should work, because ECR is reachable as a public service on the Internet, with access restricted by IAM.
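A typo like this can be caught before deployment by checking the region part of the registry host name. This is a minimal sketch: the region list is an illustrative subset rather than a complete list, and the helper name `check_ecr_region` and the 12-digit account id in the usage line are hypothetical.

```python
import difflib
import re

# Illustrative subset only; extend with the regions you actually use.
KNOWN_REGIONS = ["ap-southeast-1", "ap-southeast-2", "us-east-1", "us-west-2"]

ECR_HOST = re.compile(r"^(?P<account>[^.]+)\.dkr\.ecr\.(?P<region>[^.]+)\.amazonaws\.com$")


def check_ecr_region(image: str):
    """Return (region, suggestion); suggestion is None when the region is known."""
    host = image.split("/", 1)[0]
    match = ECR_HOST.match(host)
    if match is None:
        raise ValueError(f"not an ECR registry host: {host!r}")
    region = match.group("region")
    if region in KNOWN_REGIONS:
        return region, None
    # Suggest the closest known region name, if any is similar enough.
    close = difflib.get_close_matches(region, KNOWN_REGIONS, n=1)
    return region, (close[0] if close else None)


if __name__ == "__main__":
    print(check_ecr_region("123456789012.dkr.ecr.ap-outheast-2.amazonaws.com/image:tag"))
```

For the image name from the question, the region is not in the list and the closest match offered is ap-southeast-2.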

Kubespray: send request failed caused by: Post https://ec2.us-east-1.amazonaws.com/

I'm trying to install Kubernetes with Kubespray using AWS as a cloud provider. The installation fails with
FAILED - RETRYING: Master | wait for the apiserver to be running
When I check the logs of the kubelet docker container on the master I see
Flag --enable-cri has been deprecated, The non-CRI implementation will be deprecated and removed in a future version.
I0824 16:30:03.413509 13279 feature_gate.go:144] feature gates: map[Accelerators:true]
I0824 16:30:03.413727 13279 aws.go:762] Building AWS cloudprovider
I0824 16:30:03.413878 13279 aws.go:725] Zone not specified in configuration file; querying AWS metadata service
Error: failed to run Kubelet: could not init cloud provider "aws": error finding instance i-0cb81504d85c14b90: error listing AWS instances: RequestError: send request failed
caused by: Post https://ec2.us-east-1.amazonaws.com/: dial tcp 54.239.28.168:443: i/o timeout
Error: failed to run Kubelet: could not init cloud provider "aws": error finding instance i-0cb81504d85c14b90: error listing AWS instances: RequestError: send request failed
caused by: Post https://ec2.us-east-1.amazonaws.com/: dial tcp 54.239.28.168:443: i/o timeout
Flag --enable-cri has been deprecated, The non-CRI implementation will be deprecated and removed in a future version.
I0824 16:32:04.169558 13517 feature_gate.go:144] feature gates: map[Accelerators:true]
I0824 16:32:04.169808 13517 aws.go:762] Building AWS cloudprovider
I0824 16:32:04.169852 13517 aws.go:725] Zone not specified in configuration file; querying AWS metadata service
I'm positive this is a firewall issue. I have an IAM role with the proper permissions. When I set the https_proxy variable I am able to
curl https://ec2.us-east-1.amazonaws.com/
When the proxy variable is not set the curl fails. I tried setting the https_proxy variable inside the hyperkube container. However this causes a cert error when the apiserver tries to handshake with the etcd nodes.
Is there a way to get kubelet to only use the proxy when calling out to https://ec2.us-east-1.amazonaws.com/?
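This thread has no confirmed answer. One common convention, offered here only as an assumption, is to keep https_proxy set and list the hosts that must be reached directly (for example the etcd nodes) in no_proxy; whether the kubelet build in question honours it would need to be verified. The sketch below only illustrates the usual matching rule. The helper name and the host names and addresses are hypothetical.

```python
def uses_proxy(host: str, no_proxy: str) -> bool:
    """Return False when `host` matches an entry of a comma-separated no_proxy list."""
    host = host.lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        # Exact match, or `host` is a subdomain of the entry.
        if host == entry or host.endswith("." + entry):
            return False  # connect directly
    return True  # go through the proxy


if __name__ == "__main__":
    no_proxy = "10.0.0.11,10.0.0.12,etcd.internal"  # hypothetical etcd addresses
    for host in ("ec2.us-east-1.amazonaws.com", "10.0.0.11", "node1.etcd.internal"):
        print(host, "-> proxy" if uses_proxy(host, no_proxy) else "-> direct")
```

With such a list, only the EC2 API call would be proxied while the apiserver-to-etcd handshake bypasses the proxy.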

Spark Cluster on EC2 - "ssh-ready" state pops up for password

I am trying to create a Spark cluster on EC2 with the following command
(I am following the Apache documentation):
./spark-ec2 --key-pair=spark-cluster --identity-file=/Users/abc/spark-cluster.pem --slaves=3 --region=us-west-1 --zone=us-west-1c --vpc-id=vpc-2e44594 --subnet-id=subnet-18447841 --spark-version=1.6.1 launch spark-cluster
Once I run the command above, the master and slaves are created, but when the process reaches the 'ssh-ready' state it keeps waiting for a password.
The trace is below. I have followed the official Apache documentation and many other documents and videos, and in none of them did cluster creation ask for a password. I'm not sure whether I'm missing something; any pointer on this issue is much appreciated.
Creating security group spark-cluster-master
Creating security group spark-cluster-slaves
Searching for existing cluster spark-cluster in region us-west-1...
Spark AMI: ami-1a250d3e
Launching instances...
Launched 3 slaves in us-west-1c, regid = r-32249df4
Launched master in us-west-1c, regid = r-5r426bar
Waiting for AWS to propagate instance metadata...
Waiting for cluster to enter 'ssh-ready' state..........Password:
I modified the spark-ec2.py script to include the proxy and enabled the AWS NAT to allow outbound calls.