AWS Lambda connection timeout to ElastiCache - amazon-web-services

I am trying to make Serverless work with ElastiCache. I wrote a custom CloudFormation file based on the serverless-examples/serverless-infrastructure repo. I managed to put ElastiCache and Lambda in one subnet (verified with the CLI). I retrieve the host and the port from the Outputs, but whenever I try to connect with node-redis, the connection times out. Here are the relevant parts:
Resources
Serverless Config

I ran into this issue as well, but with Python. For me, there were a few problems that had to be ironed out:
The Lambda needs VPC permissions.
The ElastiCache security group needs an inbound rule from the Lambda security group that allows communication on the Redis port (see the sketch below). I thought they could just be in the same security group, but that is not enough.
And the real kicker: I had turned on encryption in transit. This meant that I needed to pass redis.Redis(..., ssl=True). The redis-py page mentions that ssl_cert_reqs needs to be set to None for use with ElastiCache, but that didn't seem to be true in my case; I did, however, need to pass ssl=True.
It makes sense that ssl=True needed to be set, but since the connection was just timing out, I went round and round trying to figure out what the problem with the permissions/VPC/SG setup was.
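For the security-group rule in point 2, here is a minimal sketch using the AWS SDK for JavaScript v3; both group IDs are placeholders, not values from the question:

import { EC2Client, AuthorizeSecurityGroupIngressCommand } from "@aws-sdk/client-ec2";

// Allow the Lambda security group to reach the ElastiCache security group
// on the default Redis port (6379). Both group IDs are assumptions.
const ec2 = new EC2Client({});
await ec2.send(new AuthorizeSecurityGroupIngressCommand({
  GroupId: "sg-0cache00000000000", // ElastiCache SG (placeholder)
  IpPermissions: [{
    IpProtocol: "tcp",
    FromPort: 6379,
    ToPort: 6379,
    UserIdGroupPairs: [{ GroupId: "sg-0lambda0000000000" }], // Lambda SG (placeholder)
  }],
}));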

As Tolbahady pointed out, the only solution was to create a NAT gateway within the VPC.

In my case I had TransitEncryptionEnabled: "true" with AuthToken: xxxxx for my Redis cluster.
I ensured that both my Lambda and Redis cluster belonged to the same "private subnet".
I also ensured that my "securityGroup" allowed traffic to flow on the desired ports.
The major issue I faced was that my Lambda was unable to fetch data from my Redis cluster; whenever it attempted to get the data, it would throw a timeout error.
I used Node.js with the "Node-Redis" client.
Setting the option tls: true worked for me. This is a mandatory setting if you have encryption in transit enabled.
Here is my config:
import { createClient } from 'redis';
import config from "../config";
// With encryption in transit enabled, socket.tls must be true (node-redis v4).
const options: any = {
  url: `redis://${config.REDIS_HOST}:${config.REDIS_PORT}`,
  password: config.REDIS_PASSWORD, // the cluster's AuthToken
  socket: { tls: true }
};
const redisClient = createClient(options);
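And a quick usage check on top of that client, as a hypothetical sketch (node-redis v4 requires an explicit connect() call before issuing commands):

await redisClient.connect(); // establishes the TLS connection
await redisClient.set("healthcheck", "ok"); // simple round trip through the cluster
console.log(await redisClient.get("healthcheck")); // prints "ok" if everything is wired up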
Hope this answer is helpful to those who are using Node.js with the "Node-Redis" dependency in their Lambda.

Related

Unable to connect to AWS/RDS from Lambda

I have a Node.js Express app that uses Sequelize to connect to the database. I want to deploy my app on Lambda (with API Gateway) and use an RDS Postgres database (serverless).
I created an RDS instance and a serverless setup. From an EC2 instance, I am able to connect to both the RDS instance and the serverless DB without any issues.
However, when I deploy the same code on Lambda, I am unable to connect to either DB instance. In fact, I do not see any error messages anywhere.
sequelize = new Sequelize(process.env.POSTGRES_DBNAME, process.env.POSTGRES_USERNAME, process.env.POSTGRES_PASSWORD, {
  host: process.env.POSTGRES_HOST,
  dialect: 'postgres',
  logging: false,
  operatorsAliases: false
});

// Test connection
(async function() {
  try {
    console.log('Connecting to: ', process.env.POSTGRES_DBNAME, process.env.POSTGRES_USERNAME, process.env.POSTGRES_PASSWORD, process.env.POSTGRES_HOST);
    await sequelize.authenticate();
    console.log('Connection has been established successfully.');
  } catch (error) {
    console.error('Unable to connect to the database:', error);
  }
})();
I even tried using a MySQL instance with RDS Proxy, but it's the same - the test connection part doesn't execute, and neither success nor error messages appear in the logs. I wanted to understand if I am missing something. The DB has been configured to be accessible from outside.
My guess is that you have not configured the Lambda IAM permissions correctly. In order for Lambda to be able to access RDS, you can use the AWSLambdaVPCAccessExecutionRole managed policy; for CloudWatch Logs to work, you can add the AWSLambdaBasicExecutionRole to your Lambda function.
The AWS Lambda developer guide has a tutorial that gives an example of how this can be done.
For more details, please read the Configuring a Lambda function to access resources in a VPC chapter in the developer guide.
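As an illustration, here is a sketch of attaching both managed policies to the function's execution role with the AWS SDK for JavaScript v3; the role name is a placeholder:

import { IAMClient, AttachRolePolicyCommand } from "@aws-sdk/client-iam";

const iam = new IAMClient({});
// The two AWS managed policies mentioned above.
for (const policyArn of [
  "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole",
  "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]) {
  await iam.send(new AttachRolePolicyCommand({
    RoleName: "my-lambda-execution-role", // placeholder: your function's execution role
    PolicyArn: policyArn,
  }));
}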
To connect to an Amazon RDS instance from a Lambda function, refer to this Amazon document: How do I configure a Lambda function to connect to an RDS instance?.
The problem turned out to be with my express package. My AWS configuration was correct: replacing the Lambda entry code with a vanilla DB connection and printing a list of values worked, but plugging in the Express code did not. I am not sure what the issue was - I found that upgrading the express version fixed my problem.
Thank you everyone for taking the time out to answer my question.
Always watch out for the VPC security group where your DB runs; if your RDS DB is not public, you have to put your Lambda function inside the same security group and the same subnets.
Here you can find more details: Connect to a MySQL - AWS RDS from an AWS Lambda hosted Express JS App

How to properly set AWS inbound rules to accept response from external REST API call

My use case
I have an AWS Lambda-hosted function that calls an external API. In my case it is Trello's terrific and well-defined API.
My problem in a nutshell - TL;DR Option: Feel Free to Jump to Statement Below
I had my external API call to Trello working properly. Now it is not working. I suspect I changed networking permissions within AWS that now block the returned response from the service provider. Details to follow.
My testing
I have tested my call to the API using Postman, so I know I have a well-formed request and a useful returned response from the service provider. The business logic is OK. For reference, here is the API call I am using. I have obfuscated my key and token for obvious reasons:
https://api.trello.com/1/cards?key=<myKey>&token=<myToken>&idList=<a_real_list_here>&name=New+cards+are+cool
This should put a new card on my Trello board, and in Postman (running on my local machine) it does so successfully. In fact, I had this working in an AWS Lambda function I recently deployed. Here is the call (note that I'm using the urllib3 library recommended by AWS):
http.request("POST", "https://api.trello.com/1/cards?key=<myKey>&token=<myToken>&idList=<a_real_list_here>&name="+card_name+"&desc="+card_description)
Furthermore, I have tested the same capability with a cURL version of that same request (key, token, and list ID obfuscated as above). It is formed like this:
curl --location --request POST 'https://api.trello.com/1/cards?key=<myKey>&token=<myToken>&idList=<a_real_list_here>&name=New+cards+are+cool'
I can summarize the behavior like this
+------------+---------------+----------------------+---------------+
| | Local Machine | Previously on Lambda | Now on Lambda |
+------------+---------------+----------------------+---------------+
| cURL | GOOD | GOOD | N/A |
+------------+---------------+----------------------+---------------+
| HTTP POST | GOOD | GOOD | 443 Error |
+------------+---------------+----------------------+---------------+
Code and Errors
I am not getting a debuggable response. I get a 443, which I presume is the error code, but even that is not clear. Here is the code snippet:
# send to trello board
try:
    http.request("<post string from above>")
except:
    logger.debug("<post string from above>")
The code never seems to get to the logger.debug() call. I get this in the AWS log:
[DEBUG] 2021-01-19T21:56:24.757Z 729be341-d2f7-4dc3-9491-42bc3c5d6ebf
Starting new HTTPS connection (1): api.trello.com:443
I presume the "Starting new HTTPS connection..." log entry is coming from the urllib3 library.
PROBLEM SUMMARY
I know from testing that my actual API call to the external service is properly formed. At one point it was working well, but now it is not. Previously, in order to get it to work well, I had to fiddle with AWS permissions to allow the response to come back from the service provider. I did it, but I didn't fully understand what I did and I think I was just lucky. Now it's broken and I want to do it in a thoughtful way.
What I'm looking for is an understanding of how to set up the AWS permission structure to enable that return message from the service provider. AWS provides a comprehensive guide to how to use the API Gateway to give others access to services hosted on AWS, but it's much more sketchy about how to open permissions for responses from other service providers.
Thanks to the folks at Hava, I have this terrific diagram of the permissions in place for my AWS infrastructure:
[Diagram: Security Structure] The two nets marked in red are unrelated to this project. The first green check points to one of my EC2 machines and the second points to a related security group.
I'm hoping the community can help me to understand what the key permission elements (IAM roles, security groups, etc) are in play and what I need to look for in the related AWS permissions/networking/security structure.
As the Lambda is in your VPC, you need additional configuration to allow it to communicate beyond the VPC, because the Lambda runner does not have a public IP. You'll need an internet or NAT gateway as described here: https://aws.amazon.com/premiumsupport/knowledge-center/internet-access-lambda-function/
You'll need either the additional managed service (a NAT gateway) or your own infrastructure running a NAT instance.
So, in the end the problem was not a networking problem at all. In fact, the problem was that the Lambda function did not have the right Execution Role assigned.
SPECIFICALLY
Lambda needs AWSLambdaVPCAccessExecutionRole in order to call all of the basic VPC stuff to get to all the fancy networking infrastructure gymnastics shown above.
This is an AWS managed policy, and the default AWS description of it is Allows Lambda functions to call AWS services on your behalf.
If you are having this problem, here is how to check this out.
1. Go to your Lambda function [Services][Lambda][Functions] and then click on your function.
2. Go to the configuration tab. At the right side of the window, select Edit.
3. If you were like me, you already had a Role, but it may have been the wrong one. If you change the role, the console will take a while to reset the settings even before you hit Save; this is normal.
4. At the very bottom of the page, right below the role selection, you'll see a link to the role in the IAM control panel. Click on that to check your IAM policies.
5. Make sure that AWSLambdaVPCAccessExecutionRole is among the policies enabled.
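If you prefer to verify this programmatically, here is a hypothetical sketch with the AWS SDK for JavaScript v3 (the role name is a placeholder):

import { IAMClient, ListAttachedRolePoliciesCommand } from "@aws-sdk/client-iam";

// List the managed policies attached to the function's execution role
// and check for AWSLambdaVPCAccessExecutionRole.
const iam = new IAMClient({});
const { AttachedPolicies } = await iam.send(
  new ListAttachedRolePoliciesCommand({ RoleName: "my-function-role" }), // placeholder
);
const hasVpcAccess = (AttachedPolicies ?? []).some(
  (p) => p.PolicyName === "AWSLambdaVPCAccessExecutionRole",
);
console.log({ hasVpcAccess });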
Red Herrings
Here are two things that initially led me astray:
I kept seeing 443 come back in what I thought was an error code from the urllib3 service call. It was not. I tried a few other things, and my best guess is that it was simply the HTTPS port number (as in api.trello.com:443), not an error.
I was sure the lack of access was a networking configuration error, until I tried an experiment that proved to me it was not. Here is the proposed experiment:
If you follow all of the guidance, you will have the following network setup (sketched in code after this list):
One public subnet connected to the internet gateway (IGW)
One private subnet connected to all of your internal organs
One NAT gateway, living in the public subnet, that gives your private subnet a path to the IGW
A routing table that connects your private subnet to the NAT gateway
A routing table that connects your public subnet to the IGW
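For the record, here is a minimal sketch of that layout in AWS CDK v2 (TypeScript); CDK creates the IGW, the NAT gateway, and both route tables for you, so treat this as an assumption-laden illustration rather than my exact setup:

import { App, Stack } from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";

const app = new App();
const stack = new Stack(app, "LambdaEgressStack"); // stack name is a placeholder

// One public and one private subnet in a single AZ; the private subnet
// routes outbound traffic through the NAT gateway in the public subnet.
const vpc = new ec2.Vpc(stack, "Vpc", {
  maxAzs: 1,
  natGateways: 1,
  subnetConfiguration: [
    { name: "public", subnetType: ec2.SubnetType.PUBLIC },
    { name: "private", subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  ],
});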
THEN, with all of that set up, create a throw-away EC2 instance in your private subnet. When you set it up, it should not have a public IP. You can double check that by trying to use the CONNECT function on the EC2 pages. It should not work.
If you don't already have it, set up an EC2 in your public subnet. You should have a public IP for that one.
Now SSH into your public EC2. Once in there, SSH from your public EC2 to your private EC2. If all of your infrastructure is set up correctly, you should be able to do this. If you're logged into your private EC2, you should be able to ping a public web site from inside the EC2 running in that private subnet.
The fact that you could not directly connect to your private EC2 tells you that the subnet is secure-ish. The fact that you could reach the internet from that private EC2 tells you that the NAT gateway and routing tables are set up correctly.
But of course none of this matters if your Execution Role is not right.
One Last Thought
I'm absolutely no expert on this, but invite the experts to correct me here. I'll edit it as I learn. For those new to these concepts I deeply recommend taking the time to map out your network with a piece of paper. When I have enough credibility, I'll post the map I did to help me to think through all of this.
Paul

Fargate calls to AWS Secrets Manager timing out

I am trying to get my VPC-connected Fargate instances to call the AWS Secrets Manager API, but in doing so the call is timing out:
Connect to secretsmanager.us-east-2.amazonaws.com:443 [secretsmanager.us-east-2.amazonaws.com/172.31.65.102, secretsmanager.us-east-2.amazonaws.com/172.31.66.72, secretsmanager.us-east-2.amazonaws.com/172.31.64.251] failed: connect timed out
I am aware that as of earlier this year, in Fargate platform version 1.3.0, you can get the secrets injected as environment variables, as documented here. In fact, I have that type of integration working great!
My issue is that I am unable to fetch the exact same secret programmatically using the Secrets Manager SDK. When I do, I get the above timeout. In addition to the appropriate policy on the ecsTaskExecutionRole IAM role (which is what enabled me to get the secret via an environment variable), I also added a VPC endpoint (because my Fargate instances are in a VPC) as documented here. My Fargate instances regularly talk to the outside internet as well.
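For reference, the programmatic fetch looks roughly like this; a sketch with the AWS SDK for JavaScript v3, where the secret id is a placeholder:

import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";

// Same region as the endpoint in the timeout above; the secret id is an assumption.
const sm = new SecretsManagerClient({ region: "us-east-2" });
const res = await sm.send(new GetSecretValueCommand({ SecretId: "my/app/secret" }));
console.log(res.SecretString);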
Any ideas on what else could cause the timeout?
Update: the problem was ultimately some unwise route entries. Thanks to the comments for reminding me that the error was a timeout and thus upstream from any IAM configuration issues. That helped me focus exclusively on network-related solutions.

Load data from S3 into Aurora Serverless using AWS Glue

According to Moving data from S3 -> RDS using AWS Glue
I found that an instance is required to add a connection to a data target. However, my RDS database is serverless, so there is no instance available. Does Glue support this case?
I tried to connect Aurora MySQL Serverless with AWS Glue recently, and I failed with a timeout error:
Check that your connection definition references your JDBC database with
correct URL syntax, username, and password. Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago.
The driver has not received any packets from the server.
I think the reason is that Aurora Serverless doesn't have any continuously running instances, so you cannot point the connection URL at an instance, and that's why Glue cannot connect.
So, you need to make sure that a DB instance is running; only then will your JDBC connection work.
If your DB runs in a private VPC, you can follow this link:
Nat Creation
EDIT:
Instead of a NAT gateway, you can also use a VPC endpoint for S3.
Here is a really good blog that explains it step by step.
Or the AWS documentation.
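A sketch of that alternative in AWS CDK v2 (TypeScript), assuming an existing vpc construct such as the one your Glue connection runs in:

import * as ec2 from "aws-cdk-lib/aws-ec2";

// An S3 gateway endpoint lets resources in private subnets reach S3
// without a NAT gateway; `vpc` is assumed to be an existing ec2.Vpc.
vpc.addGatewayEndpoint("S3Endpoint", {
  service: ec2.GatewayVpcEndpointAwsService.S3,
});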
AWS Glue does support this scenario, i.e., it works well for loading data from S3 into Aurora Serverless using an AWS Glue job. The engine version I'm currently using is 8.0.mysql_aurora.3.02.0.
Note: if you get an error saying Data source rejected establishment of connection, message from server: "Too many connections", you can increase the ACUs (mine is currently set to min 4 - max 8 ACUs, for reference), as the maximum number of connections depends on the ACU capacity.
I was able to build the connection using JDBC.
One very important thing: you should have at least one subnet whose security group opens all TCP ports, though you can restrict that rule to the subnet itself.
With that setting, the connection test passes and the crawler can also create tables.

AWS Elasticache Redis as SignalR Backplane

Has anybody tried to connect AWS ElastiCache Redis (cluster mode disabled) for use with SignalR? I see there are some serious configuration issues and limitations with AWS Redis.
1) We are trying to use Redis as a backplane for SignalR:
//GlobalHost.DependencyResolver.UseRedis("xxxxxx.0001.use1.cache.amazonaws.com:6379", 6379, "", "Performance");
It has to be as simple as this as per the docs, but I get a socket failure on Ping when I try to connect. (I have seen posts about this with Windows Azure, but could not find any help articles for AWS.)
2) Does cluster mode have to be enabled? With cluster mode disabled, we need to use the replica endpoints for reading, and SignalR does not know this.
Thanks in advance.
We finally resolved it by removing the clusters and creating a standalone AWS Redis instance.
The other issue we had was that it was assigned to the wrong security group, so we changed it to the same one as our EC2 instances.
You will still need to include ":6379" when accessing the DB.
However, if you are using the dependency resolver for SignalR, you should not include ":6379" in the access point; but if you use Redis for read and write operations via StackExchange.Redis, then you do need to include ":6379" in the request.
This note (https://learn.microsoft.com/en-us/aspnet/signalr/overview/performance/scaleout-with-redis) says "SignalR scaleout with Redis does not support Redis clusters.".
Also, perhaps remove ":6379" from the server address and pass only 6379 as the port?