Connection Timeout Error on Amazon Glue Job - amazon-web-services

I am trying to run an AWS Glue Command from AWS CLI to get my job started from an EC2 instance. This is the command
aws glue start-job-run --job-name Connection_Test
But I get the following error:
ConnectTimeoutError: Connect timeout on endpoint URL: "https://glue.us-east-1.amazonaws.com/"
I have added a connection to the Glue Job but I still got the same answer. Do you know what could be? Thanks for your answer!

These errors:
ConnectTimeoutError: Connect timeout on endpoint URL"
occur when your environment (in this case, an EC2 instance) is not able to communicate with the AWS service in question (glue.us-east-1).
You have two options to fix this:
Give your EC2 instance internet connectivity, with either an internet gateway or a NAT gateway. If you do this, traffic to the service will go over the public internet
Create an interface VPC endpoint for the service and deploy it in the same subnet as the EC2 instance. If you do this, traffic to the service will go over AWS private network
The second option is generally the best one, because connectivity to the endpoint will be faster, and because it allows you to keep your instance on a private network.
(The two options are not mutually exclusive).

Related

ECS Fargate Task in EventBridge fails with ResourceInitializationError

I have created an ECS Fargate Task, which I can manually run. It updates a Dynomodb and I get logs.
Now I want this to run on a schedule. I have setup a scheduled ECS task through EventBridge. However, this does not run.
My looking at the EventBridge logs I can see that the container has been stopped for the following stopped reason:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource
retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3
time(s): RequestError: send request failed caused by: Post https://api.ecr....
I thought this might be a problem with permissions. However, I tested giving the Task Execution Role full power user permissions and I still get the same error. Could the problem be something else?
This is due to a connectivity issue.
The docs say the following:
For tasks on Fargate, in order for the task to pull the container image it must either use a public subnet and be assigned a public IP address or a private subnet that has a route to the internet or a NAT gateway that can route requests to the internet.
So you need to make sure your task has a route to an internet gateway (i.e. it's in a Public subnet) or a NAT gateway.
Alternatively, if your service is in an isolated subnet, you need to create VPC endpoints for ECR and other services you need to call, as described in the docs:
To allow your tasks to pull private images from Amazon ECR, you must create the interface VPC endpoints for Amazon ECR.
When you create a scheduled task, you also specify the networking options. The docs mention this step:
(Optional) Expand Configure network configuration to specify a network configuration. This is required for tasks hosted on Fargate and for tasks using the awsvpc network mode.
For Subnets, specify one or more subnet IDs.
For Security groups, specify one or more security group IDs.
For Auto-assign public IP, specify whether to assign a public IP address from your subnet to the task.
So the networking configuration changed between the manually run task and the scheduled task. Refer to the above to figure out the needed settings for your case.
I fixed this by enabling auto-assign public IP.
However, to do this, I had to first change from "Capacity provider strategy" -
"Use cluster default", to "Launch type" - "FARGATE". Then the option to enable auto-assign public IP became available in the dropdown in the EventBridge UI.
This seems odd to me, because my default capacity provider strategy for my cluster is Fargate. But it is working now.
Need to use a gateway to follow the traffic from ECS to ECR. It can either Internet Gateway or NAT Gateway eventually which would be effecting cost factor.
But where we can resolve this scenario, by creating VPC Endpoints. Which maintains the traffic within the AWS Resources.
Endpoints Required for this would be :
S3 Gateway
ECR
ECS

How can I access aws resources in VPC from AWS glue?

I have a glue job which is hitting an API hosted over an EC2 instance.
The problem is EC2 instance resides within a VPC blocking all public access.
I tried creating an endpoint interface in my VPC but still can't access the REST API.
The host is always unreachable but when I try to access the API from VPC it is working fine.
The security group associated with the EC2 instance is used while creating the VPC Endpoint.
Any help is appreciated
If you go to AWS Glue console, under connections, create a connection. What is meant by a dummy connection, is just be a non-existent database or resource for example: jdbc:mysql://some-fake-endpoint-here:3306/mydb. After this you choose the correct VPC, subnet and security group. Which means a test connection will not work in this context but what it brings is a way to introduce your VPC, Subnet and Security group information to the job. Testing such a connection can be done using a python-shell job or launch an ec2 instance in the same vpc or same subnet and run something like nc -vz endport port.
This connection metadata information will facilitate the launching of elastic network interfaces in your account that allow glue DPUs to communicate with your resource at runtime. More on how connections in glue is discussed here.

AWS Lambda Function Timeout on Connecting To RDS Database through RDS Proxy

I'm trying to test AWS RDS proxy so I created a lambda function and done all steps that are present in this official link
https://aws.amazon.com/blogs/compute/using-amazon-rds-proxy-with-aws-lambda
store RDS credentials in Secret Manager
create new role and also add Trust Policy
in lambda function, from the AWS console, add proxy and its status is available.
When I execute the lambda function, it times out with no errors it seems like the error might be on connecting to db with rds proxy because when I run the lambda function again without proxy, it works just fine.
I initially thought that it might be a security group issue, so I edit the security group of RDS Proxy and update inbound and allow 0.0.0.0 (outbound was already 0.0.0.0).
I used defaut VPC in RDS Database and RDS Proxy. The endpoint of RDS database is public.
Since RDS proxy is not available outside the VPC. Configure your lambda function to run inside the VPC. The following link will help:
https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html
Late answer.. thought these might help others.
You have to keep your lambdas inside the same VPC and subnets to access RDS proxy.
In any case if you want to access third party web api from your lambda, you have make the lambda subnets private (no Internet Gateway in route table) and assign a NAT gateway which is tied with a public subnet.
If you are accessing other AWS services which are out of VPC like S3, Secret Manager etc. then you have to create VPC endpoints for those services in your VPC.

How can AWS Glue access IP whitelisted resource

If I have a service that needs to have IP whitelisting, how can I connect AWS Glue to it? I read that I seem to be able to put AWS Glue in a private VPC and configure a NAT gateway. I can then allow my NAT IP to connect to the service. However, I cannot find anyway to configure my Glue Job to run inside a subnet/VPC. How do I do this?
The job will run automatically in a VPC if you attach a Database connection to a resource which is inside the VPC. For example, I have a job that reads data from S3 and writes into an Aurora database in a private VPC using a Glue connection (configured as JDBC).
That job automatically has access to all the resources inside the VPC, as explained here. If the VPC has enabled NAT for external access, then your job can also take advantage of that.
Note if you use a connection that requires VPC and you use S3, you will need to enable an endpoint for S3 in that VPC as well.
The answer for your question is answered here -- https://stackoverflow.com/a/64414639 Note that Glue is a 'managed' service so it does not release any list IP addresses such that can be whitelisted. As a workaround you can use a EC2 instance to run your custom python OR pyspark script and whitelist the IP address of that particular EC2 instance

Connect to ElastiCache cluster from AWS Lambda function

Is it possible to connect from an AWS Lambda function to a Redis ElastiCache cluster?
I can't figure out if it's a configuration problem or it's simply not possible.
PS: I made a test from an EC2 instance and I can connect to the Redis node. Also the Lambda function and the Redis node are in the same region.
UPDATE (09 Oct 2015):
Amazon announced VPC for AWS Lambda functions. Details here
This means we can now access any resource in AWS behind VPC security group, including ElastiCache and RDS machines.
UPDATE (11 Feb 2016):
Amazon launched VPC for AWS Lambda.
https://aws.amazon.com/about-aws/whats-new/2016/02/access-resources-within-a-vpc-using-aws-lambda/
As of Feb 2016, AWS allows using lambda functions to connect to Elasticache. Refer to Access Resources within a VPC using AWS Lambda. Here is a link how it works - Tutorial: Configuring a Lambda Function to Access Amazon ElastiCache in an Amazon VPC
Setting up an HTTP Proxy or iptables wouldn't work for the following reasons:
Redis calls are not HTTP and will not be handled by HTTP proxies. iptables (or any port forwarding for that matter) will either won't accept a domain name as destination or is highly inefficient due to DNS resolution required every time.
The best and convenient method is to install twemproxy in an EC2 machine and route your requests through it. As a bonus, you suddenly have deployed a fantastic sharding strategy as well.
I have tried connecting lambda to memcached elasticache and it works fine. Redis should also be doable.
Couple of things to keep in mind:
Lambda and Elasticache has to be in the same VPC.
When lambda is run in VPC, it won't have access to internet (so access to public APIs won't work). NATGateway is required for this.
I was experiencing the same issue. I did not find a direct solution but instead used a Lambda function to connect to an EC2 server using socket.io which was pretty easy and emit an event to that EC2 server.
When the EC2 server received the event it performed the necessary Redis task ( database cleanup after image thumbnail generation ).
Hope this helps! If anyone finds out how to connect to ElastiCache from Lambda directly I'd still love to know!