How can an S3 event trigger a Lambda Function in a VPC?

I have a question I couldn't find a specific answer to by googling.
S3 is a global service. We can access it via the internet or through a VPC endpoint from our private network. That much I understand.
But if a Lambda function is placed inside a VPC, how does an S3 event trigger it?

You have to differentiate between the Lambda Service, a Lambda Function, and an Execution Context.
The Lambda service operates the Lambda functions, and an Execution Context is a running instance of a Lambda Function. Only the Execution Context is located in the VPC; the rest of the components reside outside of it. The Lambda service can always communicate with the Execution Contexts of any particular Lambda Function to pass events to them and monitor their execution. It does that through a private channel, not through the VPC.
S3 is also not really a global service. The buckets and APIs reside in specific regions. It has a global namespace, meaning that bucket names have to be globally unique. This means some APIs will do "global checks", but when S3 acts, it acts inside of a region.
Let's talk through what happens in the S3-Lambda integration. When an event happens in a bucket (e.g. an object is created), the S3 service checks which endpoints are interested in this event. If you want to send an event to a Lambda function, the function has to be in the same region as the bucket. S3 will then contact the Lambda service and tell it to invoke the Lambda function with this specific event. S3 doesn't care about the result here.
This is where Lambda takes over. The service checks whether S3 is permitted to invoke the function in question. If it is, Lambda looks for an existing Execution Context of that function that isn't busy. Once it finds one, it sends the event to that Execution Context, which runs inside the VPC and can access resources in the VPC.
If everything goes well, that's the end of it; otherwise Lambda retries the event in another Execution Context.
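To make the permission check above concrete, here is a minimal sketch (the bucket name, function ARN, and region are made up) of the two pieces of configuration behind the integration: the resource-based policy that lets the S3 service principal invoke the function, and the bucket notification that tells S3 which events to deliver. Note that neither step involves the function's VPC settings.

```python
import boto3

# Hypothetical names -- substitute your own bucket, region, and function ARN.
BUCKET = "my-upload-bucket"
FUNCTION_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:process-upload"

lambda_client = boto3.client("lambda", region_name="eu-west-1")
s3 = boto3.client("s3", region_name="eu-west-1")

# 1. Resource-based policy: allow the S3 service principal to invoke the function.
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="AllowS3Invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn=f"arn:aws:s3:::{BUCKET}",
)

# 2. Bucket notification: tell S3 which events to deliver to the function.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": FUNCTION_ARN,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```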
References
AWS Docs: Using AWS Lambda with Amazon S3
AWS Docs: Asynchronous Lambda Invocation

Related

Invoking Lambda from VPC with CodePipeline Fails with Timeout

I have a Lambda that I created following the example given in the AWS docs (https://docs.aws.amazon.com/codepipeline/latest/userguide/actions-invoke-lambda-function.html), but I am invoking my Lambda from within a VPC, and CodePipeline never successfully talks to it: the action times out and never seems to enter the Lambda, since CloudWatch has none of my console.logs. This is despite the fact that I have created a CodePipeline endpoint in the VPC and associated it with the private subnet the Lambda runs in.
I can give the Lambda an API Gateway endpoint and fire it manually just fine from Postman; it takes about 1 second to run. My CloudWatch logs just say "Task timed out after 20.02 seconds." I'm not sure what else I can try; what else might prevent CodePipeline from talking to the Lambda?
After additional logging, I discovered that I actually had the VPC set up correctly and that the Lambda was being invoked; the Lambda was failing to reach S3 and was hanging on getting objects. I created another endpoint, for S3, in the VPC and was able to move past the initial issue.
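For reference, a gateway endpoint for S3 like the one that resolved this can be created with a single API call. This is only a sketch with made-up IDs, not the poster's exact setup; use the VPC and route table of the subnet the Lambda runs in.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hypothetical IDs -- replace with the Lambda's VPC and its subnet's route table.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
```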

AWS application consistent snapshots of EC2 instances

I'm currently setting up a small Lambda to take snapshots of all the important volumes of our EC2 instances. To guarantee application consistency I need to trigger actions inside the instances: one to quiesce the application before the snapshot and one to wake it up again after the snapshot is done. So far I have no clue how to do this.
I've thought about using SNS or SQS to notify the instances about start and stop of the snapshot, but that has several problems:
I'll need to install (and develop) a custom listener inside the instances.
I'll not get feedback if the quiescing/wake-up is done.
So here's my question: how can I trigger an action inside an instance from a Lambda function?
But maybe I'm approaching this from the wrong direction. Is there really no simple backup solution? I know Azure has a snapshot-based backup service that can do application-consistent backups. Did I just miss an equivalent AWS service?
Edit 1:
Ok, it looks like the 'Run Command' feature of AWS Systems Manager is what I really need. It allows me to run scripts, Ansible playbooks and more inside an EC2 instance. When I've got a working solution I'll post the necessary steps.
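As a rough illustration of the Run Command idea (the instance ID, volume ID, and script paths below are placeholders), a Lambda could quiesce the application, start the snapshot, and resume the application, getting a status back for each step:

```python
import time
import boto3

ssm = boto3.client("ssm")
ec2 = boto3.client("ec2")

# Hypothetical IDs and scripts -- adapt to your instances and application.
INSTANCE_ID = "i-0123456789abcdef0"
VOLUME_ID = "vol-0123456789abcdef0"


def run_and_wait(commands):
    """Run a shell script on the instance via SSM Run Command and wait for it."""
    cmd_id = ssm.send_command(
        InstanceIds=[INSTANCE_ID],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": commands},
    )["Command"]["CommandId"]
    while True:
        time.sleep(2)
        status = ssm.get_command_invocation(
            CommandId=cmd_id, InstanceId=INSTANCE_ID
        )["Status"]
        if status not in ("Pending", "InProgress", "Delayed"):
            return status


# 1. Quiesce the application (freeze writes, flush buffers, etc.).
run_and_wait(["/opt/myapp/quiesce.sh"])

# 2. Start the snapshot. The point-in-time is fixed as soon as the call returns,
#    so the application can be resumed without waiting for the copy to finish.
ec2.create_snapshot(VolumeId=VOLUME_ID, Description="app-consistent backup")

# 3. Wake the application up again.
run_and_wait(["/opt/myapp/resume.sh"])
```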
You can trigger a Lambda function on demand:
Using AWS Lambda with Amazon API Gateway (On-Demand Over HTTPS)
You can invoke AWS Lambda functions over HTTPS. You can do this by defining a custom REST API and endpoint using Amazon API Gateway, and then mapping individual methods, such as GET and PUT, to specific Lambda functions. Alternatively, you could add a special method named ANY to map all supported methods (GET, POST, PATCH, DELETE) to your Lambda function. When you send an HTTPS request to the API endpoint, the Amazon API Gateway service invokes the corresponding Lambda function. For more information about the ANY method, see Step 3: Create a Simple Microservice using Lambda and API Gateway.
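In other words, once such an API is in place, any HTTP client can fire the function on demand. A minimal sketch against a made-up endpoint URL:

```python
import json
import urllib.request

# Hypothetical API Gateway endpoint fronting the Lambda function.
URL = "https://abc123.execute-api.eu-west-1.amazonaws.com/prod/snapshots"

payload = json.dumps({"action": "start"}).encode("utf-8")
req = urllib.request.Request(
    URL, data=payload, headers={"Content-Type": "application/json"}, method="POST"
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```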

Trigger RDS lambda on CloudFront access

I'm serving static JS files from my S3 bucket over CloudFront, and I want to log who accesses them myself, rather than relying on CloudWatch and the like.
For every request to CloudFront I'd like to trigger a Lambda function that inserts data about the request into my MySQL RDS instance.
However, CloudFront restricts viewer request/viewer response triggers quite heavily: for example a 1-second timeout (which is too little to connect to MySQL) and no VPC configuration for the Lambda (so I can't even reach the RDS subnet).
What is the best way to achieve this? Set up an API Gateway? And how would I send a request to it?
The typical method to process static content (or any content) accessed from CloudFront is to enable logging and then process the log files.
To handle CloudFront edge events, which can include processing and changing a request or response, look into Lambda@Edge.
I would enable logging first and monitor the traffic for a while. When bad actors hit your web site (CloudFront distribution) they will generate massive traffic, which could result in some sizable bills when using Lambda@Edge. I would also recommend looking into AWS WAF to help mitigate denial-of-service attacks, which may reduce the amount of Lambda processing.
This seems like a suboptimal strategy, since CloudFront suspends request/response processing while the trigger code is running -- the Lambda code in a Lambda@Edge trigger has to finish executing before processing of the request or response continues, hence the short timeouts.
CloudFront provides logs that are dropped multiple times per hour (depending on the traffic load) into a bucket you select, which you can capture from an S3 event notification, parse, and insert into your database.
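A sketch of such a log-processing function, triggered by S3 object-created events on the log bucket (the field positions follow the standard tab-separated CloudFront access-log format; the actual database insert is left as a stub):

```python
import gzip
import boto3

s3 = boto3.client("s3")


def handler(event, context):
    """Triggered by S3 ObjectCreated events on the CloudFront log bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        for line in gzip.decompress(body).decode("utf-8").splitlines():
            if not line or line.startswith("#"):   # skip version/field header lines
                continue
            fields = line.split("\t")              # standard logs are tab-separated
            date, time_, edge, _, client_ip = fields[:5]
            # Insert (date, time_, edge, client_ip, ...) into MySQL here,
            # e.g. with pymysql against the RDS instance inside the VPC.
```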
However...
If you really need real-time capture, your best bet might be to create a second Lambda function, inside your VPC, that accepts the data structures provided to the Lambda@Edge trigger.
Then, inside the code for the viewer request or viewer response trigger, all you need to do is use the built-in AWS SDK to invoke your second Lambda function asynchronously, passing the event to it.
That way, the logging task is handed off, you don't wait for a response, and the CloudFront processing can continue.
I would suggest that if you really want to take this route, this will be the best alternative. One Lambda function can easily invoke a second one, even if the second function is not in the same account, region, or VPC, because the invocation is done by communicating with the Lambda service's endpoint API.
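A minimal sketch of that hand-off, written for the Python runtime with a hypothetical target ARN (the edge function's execution role needs lambda:InvokeFunction on the target):

```python
import json
import boto3

# The second (logging) function lives in one fixed region/account; hypothetical ARN.
LOGGER_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:log-request"
lambda_client = boto3.client("lambda", region_name="eu-west-1")


def handler(event, context):
    """Viewer-request trigger: hand the event off and return the request untouched."""
    request = event["Records"][0]["cf"]["request"]
    lambda_client.invoke(
        FunctionName=LOGGER_ARN,
        InvocationType="Event",          # asynchronous -- don't wait for a result
        Payload=json.dumps(request).encode("utf-8"),
    )
    return request
```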
But, there's still room for some optimization, because you have to take another aspect of Lambda@Edge into account, and it's indirectly related to this:
no VPC configuration to the lambda
There's an important reason for this. Your Lambda@Edge trigger code is run in the region closest to the edge location that is handling traffic for each specific viewer. Your Lambda@Edge function is provisioned in us-east-1, but it's then replicated to all the regions, ready to run if CloudFront needs it.
So, when you are calling that 2nd Lambda function mentioned above, you'll actually be reaching out to the Lambda API in the 2nd function's region -- from whichever region is handling the Lambda@Edge trigger for this particular request.
This means the delay will be greater the further apart the two regions are.
Thus your truly optimal solution (for performance purposes) is slightly more complex: instead of the Lambda@Edge function invoking the 2nd Lambda function asynchronously by making a request to the Lambda API, you can create one SNS topic in each region and subscribe the 2nd Lambda function to each of them (SNS can invoke Lambda functions across regional boundaries). Your Lambda@Edge trigger code then simply publishes a message to the SNS topic in its own region, which immediately returns a response and asynchronously invokes the remote Lambda function (the 2nd function, which is in your VPC in one specific region). Within your Lambda@Edge code, the environment variable process.env.AWS_REGION gives you the region where you are currently running, so you can use it to pick the SNS topic in the correct region, with minimal latency. (When testing, this is always us-east-1.)
Yes, it's a bit convoluted, but it seems like the way to accomplish what you are trying to do without imposing substantial latency on request processing -- Lambda@Edge hands off the information as quickly as possible to another service that will assume responsibility for actually generating the log message in the database.
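The same idea sketched in Python, with a placeholder account ID and topic name (os.environ["AWS_REGION"] is the Python-runtime equivalent of process.env.AWS_REGION):

```python
import json
import os
import boto3

ACCOUNT_ID = "123456789012"           # hypothetical
TOPIC_NAME = "cloudfront-access-log"  # one topic with this name exists per region


def handler(event, context):
    """Viewer-request trigger: publish to the SNS topic in the current region."""
    region = os.environ["AWS_REGION"]
    topic_arn = f"arn:aws:sns:{region}:{ACCOUNT_ID}:{TOPIC_NAME}"
    request = event["Records"][0]["cf"]["request"]
    boto3.client("sns", region_name=region).publish(
        TopicArn=topic_arn,
        Message=json.dumps(request),
    )
    return request
```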
Lambda and relational databases pose a serious challenge around concurrency, connections and connection pooling. See this Lambda databases guide for more information.
I recommend using Lambda@Edge to talk to a service built for higher concurrency as the first step of recording access. For example you could have your Lambda@Edge function write access records to SQS, and then have a background worker read from SQS to RDS.
Here's an example of Lambda@Edge interacting with STS to read some config. It could easily be refactored to write to SQS.
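A minimal sketch of that write-to-SQS step (the queue URL is made up; a separate worker, not shown here, would drain the queue into RDS):

```python
import json
import boto3

# Hypothetical queue in one region; the edge function can call it cross-region.
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/cloudfront-access-log"
sqs = boto3.client("sqs", region_name="eu-west-1")


def handler(event, context):
    """Viewer-request trigger: enqueue the access record and continue."""
    request = event["Records"][0]["cf"]["request"]
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(request))
    return request
```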

Automated cross-account DB backup using AWS Lambda,RDS,SNS

I have a Lambda function that shares an RDS manual snapshot with another AWS account.
I am trying to create a 'chain reaction': the Lambda executes in the 1st account, the snapshot then becomes visible to the 2nd account, and another Lambda is triggered that copies the now-visible snapshot to another region (inside the 2nd account).
I tried using RDS event subscriptions and SNS topics, but I noticed that there is no RDS event subscription for sharing and/or modifying an RDS snapshot.
Then I tried to set up cross-account permissions so the Lambda from the first account would publish to an SNS topic that triggers the Lambda in the second account, but it seems that the topic and the target Lambda must be in the same region (whereas the code that copies the DB snapshot must run in the target region). I followed this guide and ended up with this error:
A client error (InvalidParameter) occurred when calling the Subscribe operation: Invalid parameter: TopicArn
Has anyone tried something like this?
Is cross-region communication eventually feasible?
Could I trigger something in one region from something in another (any AWS service is welcome)?
My next attempts will be:
cross-region Lambda invocation (see the sketch below)
Make use of API Gateway
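For the first of those, a sketch of what a direct cross-region (and cross-account) invocation could look like, with made-up ARNs and payload; the target function's resource-based policy has to allow the calling account:

```python
import json
import boto3

# Hypothetical ARN of the copy function in the 2nd account's target region.
TARGET_ARN = "arn:aws:lambda:eu-central-1:222222222222:function:copy-shared-snapshot"

# Cross-region is just a matter of pointing the client at the right region;
# cross-account additionally needs a resource-based policy on the target function.
lambda_client = boto3.client("lambda", region_name="eu-central-1")
lambda_client.invoke(
    FunctionName=TARGET_ARN,
    InvocationType="Event",   # fire-and-forget
    Payload=json.dumps({"snapshot_id": "my-manual-snapshot"}).encode("utf-8"),
)
```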

How AWS Lambda function throw SubnetIPAddressLimitReachedException?

In one of my projects, an AWS Lambda function (called roughly every minute) invokes another AWS Lambda function from within its handler (using AWSLambdaClient lambdaClient). Sometimes the invocation (not frequently, say 4 to 5 times every hour) throws a SubnetIPAddressLimitReachedException:
2016-11-24 14 <---------------------> INFO xyzHandler:395 - Lambda was not able to set up VPC access for the Lambda function because one or more configured subnets has no available IP addresses. (Service: AWSLambda; Status Code: 502; Error Code: SubnetIPAddressLimitReachedException; Request ID: XXXX)
I searched here and here, but I didn't find any clear explanation of this exception.
When your Lambda function is configured to execute inside your VPC, you specify one or more subnet IDs in which the Lambda function will execute.
The subnets that you specify need to have enough free IP addresses to accommodate all of the simultaneous executions of your Lambda function.
For example, if you choose one subnet and it is defined as a /24, then you have at most around 251 usable IP addresses (AWS reserves five addresses in every subnet).
If your Lambda function(s) are called 300 times simultaneously, they're going to need 300 individual IP addresses, which your subnet cannot accommodate. In this case, you will get the SubnetIPAddressLimitReachedException error.
When Lambda executions complete, their resources are reused, so the IP addresses they used are freed up or re-used by subsequent Lambda executions.
There is currently no way to limit the number of simultaneous executions within Lambda itself. I've seen people use other services, such as Kinesis, to limit it.
There are 3 avenues of resolution:
If your Lambda function does not need to execute within your VPC, and/or access resources from within your VPC, move it out of the VPC.
Specify more or different subnet IDs with more available IP addresses.
Modify your Lambda function to not call other Lambda functions. The root Lambda function and the subsequently called Lambda functions will each require an IP address.
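To see how close you are to the limit, you can compare the free addresses in the function's configured subnets against your peak concurrency. A sketch with a placeholder function name:

```python
import boto3

lambda_client = boto3.client("lambda")
ec2 = boto3.client("ec2")

# Hypothetical name -- the function throwing SubnetIPAddressLimitReachedException.
cfg = lambda_client.get_function_configuration(FunctionName="my-vpc-function")
subnet_ids = cfg["VpcConfig"]["SubnetIds"]

# How many free addresses each configured subnet still has right now.
for subnet in ec2.describe_subnets(SubnetIds=subnet_ids)["Subnets"]:
    print(subnet["SubnetId"], subnet["CidrBlock"], subnet["AvailableIpAddressCount"])
```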
Accessing Resources in a VPC
You can set this up when you create a new function, and you can also update an existing function so that it has VPC access. You can configure this feature from the Lambda console or from the CLI.
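The CLI/SDK equivalent is a single configuration update; a sketch with placeholder IDs (the execution role also needs permission to manage ENIs, e.g. the AWSLambdaVPCAccessExecutionRole managed policy):

```python
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical function, subnet, and security group IDs.
lambda_client.update_function_configuration(
    FunctionName="my-function",
    VpcConfig={
        "SubnetIds": ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)
```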
That’s all you need to do! Be sure to read Configuring a Lambda Function to Access Resources in an Amazon VPC in the Lambda documentation if you have any questions.
Resource link:
Access Resources in a VPC from Your Lambda Functions
Accessing the Internet and other AWS Resources in your VPC from AWS Lambda