I am trying to get my VPC-connected Fargate instances to call the AWS Secrets Manager API, but in doing so the call is timing out:
Connect to secretsmanager.us-east-2.amazonaws.com:443
[secretsmanager.us-east-2.amazonaws.com/172.31.65.102,
secretsmanager.us-east-2.amazonaws.com/172.31.66.72,
secretsmanager.us-east-2.amazonaws.com/172.31.64.251] failed: connect
timed out
I am aware that as of earlier this year in Fargate 1.3.0 you can get the secrets injected in as environment variables as documented here. In fact, I have that type of integration working great!
My issue is that I am unable to fetch the exact same secret programmatically using the Secrets Manager SDK. When I do, I get the above timeout. In addition to the appropriate policy on the IAM ecsTaskExecutionRole role (which is what enabled me to get the secret via environment variable), I also added a VPC endpoint (because my Fargate instances are in a VPC) as documented here. My Fargate instances are regularly talking to the outside internet as well.
Any ideas on what else could cause the timeout?
Update: the problem was ultimately some unwise route entries. Thanks to the comments for reminding me that the error was a timeout and thus upstream from any IAM configuration issues. That helped me focus exclusively on network-related solutions.
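The point in the update (a timeout happens before any IAM check) can be tested directly from inside the task. Below is a minimal sketch using only the Python standard library; the function name `probe_tcp` and the default timeout are illustrative choices, not part of any AWS SDK.

```python
import socket


def probe_tcp(host: str, port: int = 443, timeout: float = 3.0) -> str:
    """Classify a single TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"        # routes, security groups and NACLs allow the path
    except socket.timeout:
        return "timeout"       # packets are being dropped or misrouted somewhere
    except ConnectionRefusedError:
        return "refused"       # host is reachable, but nothing listens on the port
    except socket.gaierror:
        return "dns"           # the host name did not resolve
    except OSError:
        return "error"         # anything else, e.g. network unreachable
```

Calling `probe_tcp("secretsmanager.us-east-2.amazonaws.com")` from the container and getting "timeout" would point at routing, security groups, or the VPC endpoint configuration rather than at IAM, since credentials are only evaluated after the TLS connection is established.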
The other day, I received the following alert in GuardDuty.
Behavior:EC2/NetworkPortUnusual
port:80
Target:3.5.154.156
The EC2 that was the target of the alert was not being used for anything in particular. (However, it had been started up.)
There was no communication using port 80 until now.
Also, the IPAddress of the Target seems to be AWS S3.
The only recent change is that I recently deleted the EC2 InstanceProfile.
Therefore, there is currently no InstanceProfile attached to anything.
Do you know why this EC2 instance suddenly tried to use port 80 to communicate with S3?
I looked at CloudTrail, etc., and found nothing suspicious.
(If there are any other items I should check, please let me know.)
Thank you.
We have experienced similar alerts, and after tedious debugging we found that the SSM Agent is responsible for this kind of GuardDuty finding.
SSM Agent communications with AWS managed S3 buckets
"In the course of performing various Systems Manager operations, AWS Systems Manager Agent (SSM Agent) accesses a number of Amazon Simple Storage Service (Amazon S3) buckets. These S3 buckets are publicly accessible, and by default, SSM Agent connects to them using HTTP calls."
I suggest reviewing the CloudTrail logs and looking for the "UpdateInstanceInformation" event (this is how we eventually found it).
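To confirm that a finding's target address really belongs to S3, one option is to match it against the prefix list AWS publishes at https://ip-ranges.amazonaws.com/ip-ranges.json. The sketch below assumes the caller has already downloaded and parsed that file and passes in its "prefixes" array; `services_for_ip` is a hypothetical helper name, and the sample entries are made-up documentation ranges, not real AWS prefixes.

```python
import ipaddress


def services_for_ip(ip: str, prefixes: list) -> list:
    """Return sorted (service, region) pairs whose ip_prefix contains ip."""
    addr = ipaddress.ip_address(ip)
    matches = set()
    for entry in prefixes:
        if addr in ipaddress.ip_network(entry["ip_prefix"]):
            matches.add((entry["service"], entry["region"]))
    return sorted(matches)


# Made-up entries in the same shape as the "prefixes" array of ip-ranges.json.
sample_prefixes = [
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "S3"},
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "AMAZON"},
    {"ip_prefix": "203.0.113.0/24", "region": "eu-west-1", "service": "EC2"},
]

print(services_for_ip("198.51.100.7", sample_prefixes))
```

With the real file loaded, the same call with the target address from the finding would show whether an S3 prefix covers it. IPv6 ranges live in a separate "ipv6_prefixes" array that this sketch does not read.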
first time asker.
So I've been trying to implement AWS CloudWatch to monitor disk usage on an EC2 instance running EC2 Linux. I'm interested in doing this using just the CloudWatch agent, and I've installed it according to the how-to found here. The install runs fine, and I've made sure I've created an IAM role for the instance as described here. Unfortunately, whenever I run the amazon-cloudwatch-agent.service it only sends log files and not the custom used_percent measurement specified. I receive this error when I tail the logs.
2021-06-18T15:41:37Z E! WriteToCloudWatch failure, err: RequestError: send request failed
caused by: Post "https://monitoring.us-west-2.amazonaws.com/": dial tcp 172.17.1.25:443: i/o timeout
I've done my best googlefu but gotten nowhere thus far. If you've got any advice it would be appreciated.
Thank you
Belated answer to my own question. I had to create a security group that would accept traffic from that same security group!
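A self-referencing rule like the one described can also be created through the EC2 API. This is a minimal sketch, assuming a boto3 EC2 client is passed in by the caller and that HTTPS (port 443, the port in the error above) is the traffic that needs to be allowed; the helper names and the group ID in the usage note are hypothetical.

```python
def self_referencing_rule(group_id: str, port: int = 443) -> dict:
    """Build an ingress permission that admits TCP traffic from the group itself."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # The source is the same security group, not a CIDR range.
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }


def allow_traffic_within_group(ec2_client, group_id: str, port: int = 443) -> None:
    """Add the self-referencing rule to the group via the EC2 API."""
    ec2_client.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[self_referencing_rule(group_id, port)],
    )
```

Usage would look like `allow_traffic_within_group(boto3.client("ec2"), "sg-0123456789abcdef0")`. Passing the client in keeps the rule-building logic free of any SDK dependency.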
Having the same issue, it definitely wasn't a network restriction as I was still able to telnet to the monitoring endpoint.
From AWS docs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/create-iam-roles-for-cloudwatch-agent.html
One role or user enables CloudWatch agent to be installed on a server
and send metrics to CloudWatch. The other role or user is needed to
store your CloudWatch agent configuration in Systems Manager Parameter
Store. Parameter Store enables multiple servers to use one CloudWatch
agent configuration.
If you're using the default CloudWatch agent configuration wizard, you may need the extra CloudWatchAgentAdminPolicy policy attached to your role for the agent to connect to the monitoring service.
My use case
I have an AWS lambda hosted function that calls an external API. In my case it is Trello's terrific and well-defined API.
My problem in a nutshell - TL;DR Option: Feel Free to Jump to Statement Below
I had my external API call to Trello working properly. Now it is not working. I suspect I changed networking permissions within AWS that now block the returned response from the service provider. Details to follow.
My testing
I have tested my call to the API using Postman, so I know I have a well-formed request and a useful returned response from the service provider. The business logic is OK. For reference, here is the API call I am using. I have obfuscated my key and token for obvious reasons:
https://api.trello.com/1/cards?key=<myKey>&token=<myToken>&idList=<a_real_list_here>&name=New+cards+are+cool
This should put a new card on my Trello board, and in Postman (running on my local machine) it does so successfully. In fact, I had this working in an AWS Lambda function I recently deployed. Here is the call. (Note that I'm using the urllib3 library recommended by AWS.)
http.request("POST", "https://api.trello.com/1/cards?key=<myKey>&token=<myToken>&idList=<a_real_list_here>&name="+card_name+"&desc="+card_description)
Furthermore, I have tested the same capability with a cURL version of that same request. It is formed like this:
curl --location --request POST 'https://api.trello.com/1/cards?key=338d5b193d43e95712005fd2bcb4cd12&token=d0e3c4cd6281f43e4ec257ae5f05cd902230cbbca7e26b99664cd620f6479f7a&idList=600213811e171376755c7ed5&name=New+cards+are+cool'
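The urllib3 call above concatenates card_name and card_description straight into the query string, which can produce a malformed URL once those values contain spaces, "&" or "=". A minimal sketch of building the same URL with proper encoding, using only the standard library; `build_card_url` is a hypothetical helper, and the parameter names are the ones from the URLs above.

```python
from urllib.parse import urlencode

TRELLO_CARDS_URL = "https://api.trello.com/1/cards"


def build_card_url(key: str, token: str, id_list: str, name: str, desc: str = "") -> str:
    """Return the cards URL with every query parameter percent-encoded."""
    params = {"key": key, "token": token, "idList": id_list, "name": name}
    if desc:
        params["desc"] = desc
    # urlencode escapes reserved characters and turns spaces into "+".
    return TRELLO_CARDS_URL + "?" + urlencode(params)
```

The resulting string can be passed to `http.request("POST", ...)` in place of the hand-built one.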
I can summarize the behavior like this
+------------+---------------+----------------------+---------------+
| | Local Machine | Previously on Lambda | Now on Lambda |
+------------+---------------+----------------------+---------------+
| cURL | GOOD | GOOD | N/A |
+------------+---------------+----------------------+---------------+
| HTTP POST | GOOD | GOOD | 443 Error |
+------------+---------------+----------------------+---------------+
Code and Errors
I am not getting a debuggable response. I get a 443, which I presume is the error code, but even that is not clear. Here is the code snippet:
# send to trello board
try:
    http.request("<post string from above>")
except:
    logger.debug("<post string from above>")
The code never seems to get to the logger.debug() call. I get this in the AWS log:
[DEBUG] 2021-01-19T21:56:24.757Z 729be341-d2f7-4dc3-9491-42bc3c5d6ebf
Starting new HTTPS connection (1): api.trello.com:443
I presume the "Starting new HTTPS connection..." log entry is coming from the urllib3 library.
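One plausible reason the except branch never runs is that the request has no timeout, so a blocked connection simply hangs until Lambda ends the invocation. The sketch below shows the idea with an explicit timeout. It uses the standard library's urllib.request rather than urllib3 (an intentional substitution to keep the example dependency-free), and `post_with_timeout` is a hypothetical name.

```python
import logging
import urllib.request

logger = logging.getLogger(__name__)


def post_with_timeout(url: str, timeout: float = 5.0):
    """POST to url; return the HTTP status, or None if the request failed."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        # The URL is deliberately not logged because it carries the key and token.
        logger.error("request failed: %r", exc)
        return None
```

With a timeout shorter than the function's own Lambda timeout, a blocked network path shows up as a logged error instead of a silent hang.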
PROBLEM SUMMARY
I know from testing that my actual API call to the external service is properly formed. At one point it was working well, but now it is not. Previously, in order to get it to work well, I had to fiddle with AWS permissions to allow the response to come back from the service provider. I did it, but I didn't fully understand what I did and I think I was just lucky. Now it's broken and I want to do it in a thoughtful way.
What I'm looking for is an understanding of how to set up the AWS permission structure to enable that return message from the service provider. AWS provides a comprehensive guide to how to use the API Gateway to give others access to services hosted on AWS, but it's much more sketchy about how to open permissions for responses from other service providers.
Thanks to the folks at Hava, I have this terrific diagram of the permissions in place for my AWS infrastructure:
Security Structure The two nets marked in red are unrelated to this project. The first green check points to one of my EC2 machines and the second points to a related security group.
I'm hoping the community can help me to understand what the key permission elements (IAM roles, security groups, etc) are in play and what I need to look for in the related AWS permissions/networking/security structure.
As the Lambda is in your VPC, you need extra configuration to allow it to communicate beyond the VPC, because the Lambda's network interface does not have a public IP. Thus you'll need a NAT gateway in a public subnet (which in turn routes through an internet gateway), as described here: https://aws.amazon.com/premiumsupport/knowledge-center/internet-access-lambda-function/
You'll need either additional managed services or infrastructure running a NAT gateway.
So, the problem in the end was none of the networking problems. In fact, the problem was the lambda function did not have the right Execution Role assigned.
SPECIFICALLY
The Lambda function's execution role needs AWSLambdaVPCAccessExecutionRole attached in order to call all of the basic VPC stuff to get to all the fancy networking infrastructure gymnastics shown above.
Despite the "Role" in its name, this is an AWS managed policy. The default AWS description of the role it gets attached to is "Allows Lambda functions to call AWS services on your behalf."
If you are having this problem, here is how to check this out.
Go to your Lambda function ([Services][Lambda][Functions]) and then click on your function.
Go to the Configuration tab. At the right side of the window, select Edit.
If you were like me, you already had a role, but it may have been the wrong one. If you change the role, the console will take a while to reset the settings, even before you hit Save. This is normal.
At the very bottom of the page, right below the role selection, you'll see a link to the role in the IAM control panel. Click on that to check your IAM policies.
Make sure that AWSLambdaVPCAccessExecutionRole is among the policies enabled.
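The same check as the console steps above can be scripted. This is a minimal sketch, assuming a boto3 IAM client is passed in; `role_has_policy` is a hypothetical helper built on the IAM `list_attached_role_policies` call, and the role name in the usage note is made up.

```python
def role_has_policy(iam_client, role_name: str, policy_name: str) -> bool:
    """Return True if a managed policy with this name is attached to the role."""
    kwargs = {"RoleName": role_name}
    while True:
        page = iam_client.list_attached_role_policies(**kwargs)
        for policy in page.get("AttachedPolicies", []):
            if policy["PolicyName"] == policy_name:
                return True
        if not page.get("IsTruncated"):
            return False
        # More results are available; continue from the returned marker.
        kwargs["Marker"] = page["Marker"]
```

Usage would look like `role_has_policy(boto3.client("iam"), "my-lambda-role", "AWSLambdaVPCAccessExecutionRole")`. Inline policies are not covered by this call, so a False result only means no managed policy of that name is attached.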
Red Herrings
Here are two things that initially led me astray:
I kept seeing 443 come back as what I thought was an error code from the urllib3 service call. It was not. I tried a few other things and concluded that it was a port number (443 is the HTTPS port), not an error.
I was certain the lack of access was a networking configuration error, until I tried an experiment that proved to me that it was not. Here is the experiment:
If you follow all of the guidance you will have the following network setup:
One public subnet connected to the internet gateway
One private subnet connected to all of your internal organs
One NAT gateway that points your private subnet to the IGW
A routing table that connects your private subnet to the NAT gateway
A routing table that connects your public subnet to the IGW
THEN, with all of that set up, create a throw-away EC2 instance in your private subnet. When you set it up, it should not have a public IP. You can double check that by trying to use the CONNECT function on the EC2 pages. It should not work.
If you don't already have it, set up an EC2 in your public subnet. You should have a public IP for that one.
Now SSH into your public EC2. Once in there, SSH from your public EC2 to your private EC2. If all of your infrastructure is set up correctly, you should be able to do this. Once you're logged into your private EC2, you should be able to ping a public web site from inside that private subnet.
The fact that you could not directly connect to your private EC2 tells you that the subnet is secure-ish. The fact that you could reach the internet from that private EC2 tells you that the NAT gateway and routing tables are set up correctly.
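The routing half of this experiment can also be checked without logging into any instance, by looking at where each subnet's default route points. A minimal sketch, assuming the input is one route table dictionary in the shape returned by the EC2 `describe_route_tables` call; `default_route_target` is a hypothetical helper and the IDs in the sample tables are made up.

```python
def default_route_target(route_table: dict) -> str:
    """Classify where the 0.0.0.0/0 route of a route table points."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            return "nat"    # expected for the private subnet
        if route.get("GatewayId", "").startswith("igw-"):
            return "igw"    # expected for the public subnet
        return "other"
    return "none"           # no default route, so no path to the internet


# Made-up route tables for illustration only.
private_table = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"},
]}
public_table = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0"},
]}

print(default_route_target(private_table), default_route_target(public_table))
```

In the setup described above, the private subnet's table should classify as "nat" and the public subnet's table as "igw".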
But of course none of this matters if your Execution Role is not right.
One Last Thought
I'm absolutely no expert on this, and I invite the experts to correct me here. I'll edit it as I learn. For those new to these concepts, I strongly recommend taking the time to map out your network on a piece of paper. When I have enough credibility, I'll post the map I did to help me think through all of this.
Paul
I've spun up an Aurora Serverless Postgres-compatible database and I'm trying to connect to it from a Lambda function, but I am getting AccessDenied errors:
AccessDeniedException:
Status code: 403, request id: 2b19fa38-af7d-4f4a-aaa5-7d068e92c901
Details:
I can connect to and query the database manually via the query editor if I use the same secret-arn and database name that the lambda is trying to use. I've triple-checked that the arns are correct
My lambdas are not in the vpc but are using the data api. The RDS cluster is in the default vpc
I've temporarily given my lambdas administrator access so that I know it's not a policy-based issue on the lambda side of things
Cloudwatch does not contain any additional details on the error
I am able to query the database from the command line of my personal computer (not on the vpc)
Any suggestions? Perhaps there is a way to get better details out of the error?
Aha! After trying to connect via the command line and being able to do so successfully, I realized this had to be something non-network related. Digging into my code a bit I eventually realized there wasn't anything wrong with the connection portions of the code, but rather with the user permissions being used to create the session/service that attempted to access the data. In hindsight I suppose the explicit AccessDenied (instead of a timeout) should have been a clue that I was able to reach the database just not able to do anything with it.
After digging in I discovered these two things are very different:
AmazonRDSFullAccess
AmazonRDSDataFullAccess
If you want to use the data api, you have to have the AmazonRDSDataFullAccess (or similar) policy. AmazonRDSFullAccess is not a superset of the AmazonRDSDataFullAccess permissions as one might assume. (If you look at the json for the AmazonRDSFullAccess policy you'll notice the permissions cover rds:* while the other policy covers rds-data:*, so apparently these are just different permissions spaces entirely)
TLDR: Use the AmazonRDSDataFullAccess policy (or similar) to access the data api. AmazonRDSFullAccess will not work.
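The reason `rds:*` does not cover the Data API comes down to how action wildcards match: the service prefix before the colon is different. The sketch below is a deliberately simplified model of that matching (Allow patterns only, ignoring Deny statements, resources and conditions). `allows_action` is a hypothetical helper, not an AWS API, and `rds-data:ExecuteStatement` is used as an example Data API action.

```python
from fnmatch import fnmatchcase


def allows_action(allowed_patterns: list, action: str) -> bool:
    """Return True if any pattern matches the action (case-insensitive)."""
    action = action.lower()
    return any(fnmatchcase(action, pattern.lower()) for pattern in allowed_patterns)


print(allows_action(["rds:*"], "rds-data:ExecuteStatement"))       # False
print(allows_action(["rds-data:*"], "rds-data:ExecuteStatement"))  # True
```

The wildcard only expands the part after `rds:`, so it can never reach into the separate `rds-data` namespace.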
I think you need to put your Lambda in the same VPC as your serverless DB. I did a quick test and was able to connect to it from an EC2 instance in the same VPC.
ubuntu@ip-172-31-5-146:~$ telnet database-11.cluster-ckuv4ugsg77i.ap-northeast-1.rds.amazonaws.com 5432
Trying 172.31.14.180...
Connected to vpce-0403cfe830963dfe9-u0hmgbbx.vpce-svc-0445a873575e0c4b1.ap-northeast-1.vpce.amazonaws.com.
Escape character is '^]'.
^CConnection closed by foreign host.
This is my security group.
I would like to use AWS SAM to set up my serverless application. I have used it with DynamoDB before. This was very easy, since all I had to do was set up a DynamoDB table as a resource and then link it to the Lambda functions. AWS SAM seems to know where the table is located. I was even able to run the functions on my local machine using the sam-cli.
With RDS it's a lot harder. The RDS Aurora instance I am using sits behind a specific endpoint, in a specific subnet with security groups in my VPC, protected by specific roles.
Now, from what I understand, it's AWS SAM's job to use my template.yml to generate the roles and organize access rules for me.
But I don't think RDS is supported by aws sam by default, which means I would either be unable to test locally or need a vpn access to the aws vpc, which I am not a massive fan of, since it might be a real security risk.
I know RDS proxies exist, which can be created in aws sam, but they would also need vpc access, and so they just kick the problem down the road.
So how can I connect my aws sam project to RDS and if possible, execute the lambda functions on my machine?