I need help understanding some concepts. I have a web application that uses Lambda@Edge on CloudFront. The Lambda function accesses DynamoDB, making around 10 independent queries. This generates occasional errors, though it works perfectly when I test the Lambda function standalone. I am not able to make much sense of the CloudFront logs, and Lambda@Edge does not show up in CloudWatch.
I have a feeling that the DynamoDB queries are the culprit, because that is all I am doing in the Lambda function. To make sure, I replicated the data across all regions, but that has not solved the problem. I also increased the timeout and the memory allocated to the Lambda function, but that has not helped either. Reducing the number of DB queries, however, seems to help.
Can you please help me understand this? Is it wrong to make DB queries in Lambda@Edge? Is there a way to get detailed logs from Lambda@Edge?
Over a year late, but someone may still benefit from this. Lambda@Edge does not run in one specific region, so when you connect to a DynamoDB table you need to specify the region in which the table lives.
In Node.js this looks like the following:
// Load the AWS SDK for Node.js
var AWS = require('aws-sdk');
// Set the region
AWS.config.update({region: 'REGION'});
// Create DynamoDB document client
var docClient = new AWS.DynamoDB.DocumentClient({apiVersion: '2012-08-10'});
As F_SO_K mentioned, you can find your CloudWatch logs in the region closest to you. To find out which region that is (assuming you are the only one using that specific Lambda@Edge function), have a look at this documentation.
Lambda@Edge logs show up in CloudWatch under the region in which the Lambda was called. I suspect you simply need to go into CloudWatch and switch to the correct region to see the logs. If you are making the requests yourself, this will be the region closest to you, not the region in which you created the Lambda.
Once you have the log you should have much more information to go on.
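To make the log hunt a bit more concrete, here is a minimal sketch of where to look. As far as I know, Lambda@Edge writes to a log group named /aws/lambda/us-east-1.<function-name> in whichever region executed the function. The helper names, the function name and the region list below are made up for illustration.

```javascript
// Hypothetical helper: builds the CloudWatch Logs group name that
// Lambda@Edge is understood to use for a function created in us-east-1.
function edgeLogGroupName(functionName) {
  return '/aws/lambda/us-east-1.' + functionName;
}

// Hypothetical helper: one lookup target per region you want to inspect.
function logGroupsToCheck(functionName, regions) {
  return regions.map(function (region) {
    return { region: region, logGroupName: edgeLogGroupName(functionName) };
  });
}

// Example: a function called "my-edge-fn" (assumed name), checked in two regions.
console.log(logGroupsToCheck('my-edge-fn', ['eu-west-1', 'ap-southeast-2']));
```

Switch the CloudWatch console (or your SDK client) to each listed region and search for that log group name.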
I am looking for a way to monitor any changes that occur in my production environment, such as security group changes, EC2 create/stop/delete actions, database changes, S3 bucket changes, route table changes, subnet changes, etc. I was looking at using CloudTrail for this and monitoring all API calls. However, when testing, my subscribed SNS topic was not receiving any notifications when I made some test changes. Does anyone have a workaround for this, or am I missing something? Maybe Lambda? I'm just looking for the easiest way to receive email notifications when any changes are made within my prod environment. Thank you.
If you want to audit the entire history of AWS API calls, use CloudTrail. Remember to create a trail and to enable the relevant options if you also want to audit S3 or Lambda API calls.
By itself CloudTrail will provide auditing, but it can be combined with CloudWatch/EventBridge to automate actions based on specific API calls such as triggering a Lambda or triggering an SNS topic.
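As a rough illustration of the CloudTrail plus EventBridge combination, the sketch below builds an event pattern that matches security group changes recorded by CloudTrail. The function name is hypothetical and the list of event names is only a sample, so extend it to the API calls you care about. You would attach the pattern to a rule whose target is your SNS topic or Lambda.

```javascript
// Hypothetical helper: returns an EventBridge event pattern that matches
// EC2 security group API calls delivered via CloudTrail.
function buildSecurityGroupChangePattern() {
  return {
    source: ['aws.ec2'],
    'detail-type': ['AWS API Call via CloudTrail'],
    detail: {
      eventSource: ['ec2.amazonaws.com'],
      // Sample of event names only; not an exhaustive list.
      eventName: [
        'AuthorizeSecurityGroupIngress',
        'AuthorizeSecurityGroupEgress',
        'RevokeSecurityGroupIngress',
        'RevokeSecurityGroupEgress',
        'CreateSecurityGroup',
        'DeleteSecurityGroup'
      ]
    }
  };
}

// The rule definition expects the pattern as a JSON string.
console.log(JSON.stringify(buildSecurityGroupChangePattern(), null, 2));
```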
Regarding your own implementation so far using SNS, always make sure the subscription has been confirmed by the subscriber(s) first.
In addition, you can use AWS Config, which supports many AWS resources and gives you two benefits: you can maintain a history of changes to your resources, and you can configure compliance and remediation rules for them.
I have an API Gateway endpoint that I would like to limit access to. For anonymous users, I would like to set both daily and monthly limits (based on IP address).
AWS WAF has the ability to set rate limits, but the interval for them is a fixed 5 minutes, which is not useful in this situation.
API Gateway has the ability to add usage plans with longer term rate quotas that would suit my needs, but unfortunately they seem to be based on API keys, and I don't see a way to do it by IP.
Is there a way to accomplish what I'm trying to do using AWS Services?
Is it maybe possible to use a usage plan and automatically generate an api key for each user who wants to access the api? Or is there some other solution?
Without more context on your specific use-case, or the architecture of your system, it is difficult to give a “best practice” answer.
Like most things tech, there are a few ways you could accomplish this. One way would be to use a combination of CloudWatch API logging, Lambda, DynamoDB (with Streams) and WAF.
At a high level (and regardless of this specific need) I’d protect my API using WAF and the AWS security automations quickstart, found here, and associate it with my API Gateway as guided in the docs here. Once my WAF is set up and associated with my API Gateway, I’d enable CloudWatch API logging for API Gateway, as discussed here. Now that I have things set up, I’d create two Lambdas.
The first will parse the CloudWatch API logs and write the data I’m interested in (IP address and request time) to a DynamoDB table. To avoid unnecessary storage costs, I’d set the TTL on the record I’m writing to my DynamoDB table to be twice whatever my analysis’s temporal metric is, i.e. if I’m looking to limit it to 1000 requests per 1 month, I’d set the TTL on my DynamoDB record to be 2 months. From there, my CloudWatch API log group will have a subscription filter that sends log data to this Lambda, as described here.
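A minimal sketch of the record the first Lambda might write, assuming a 30-day window. The attribute names (ip, requestTime, expiresAt) and the helper name are my own choices for illustration. The one hard requirement is that the DynamoDB TTL attribute is a number holding epoch seconds.

```javascript
var SECONDS_PER_DAY = 24 * 60 * 60;

// Hypothetical helper: builds the item to store for one API request.
// expiresAt is set to twice the analysis window, as described above.
function buildRequestRecord(ip, requestTimeMs, windowDays) {
  var requestTimeSec = Math.floor(requestTimeMs / 1000);
  return {
    ip: ip,                       // assumed partition key
    requestTime: requestTimeSec,  // assumed sort key, epoch seconds
    expiresAt: requestTimeSec + 2 * windowDays * SECONDS_PER_DAY // TTL attribute
  };
}

console.log(buildRequestRecord('203.0.113.7', Date.now(), 30));
```

The item can then be passed to a DocumentClient put call, with TTL enabled on the expiresAt attribute of the table.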
My second Lambda is going to be doing the actual analysis and handling what happens when my metric is exceeded. This Lambda is going to be triggered by the write event to my DynamoDB table, as described here. I can have this Lambda run whatever analysis I want, but I’m going to assume that I want to limit access to 1000 requests per month for a given IP. When the new DynamoDB item triggers my Lambda, the Lambda is going to query the DynamoDB table for all records that were created in the preceding month from that moment, and that contain the IP address. If the number of records returned is less than or equal to 1000, it is going to do nothing. If it exceeds 1000 then the Lambda is going to update the WAF WebACL, and specifically UpdateIPSet to reject traffic for that IP, and that’s it. Pretty simple.
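The decision logic in the second Lambda can be sketched as below, assuming the request timestamps for one IP have already been read from DynamoDB. The function names and the 30-day window are assumptions. The DynamoDB query and the WAF UpdateIPSet call are deliberately left out.

```javascript
// Assumed analysis window: 30 days, in milliseconds.
var WINDOW_MS = 30 * 24 * 60 * 60 * 1000;

// Counts the requests that fall inside the window ending at nowMs.
function requestsInWindow(requestTimesMs, nowMs) {
  return requestTimesMs.filter(function (t) {
    return t > nowMs - WINDOW_MS && t <= nowMs;
  }).length;
}

// True only when the count exceeds the limit; at or below it, do nothing.
function shouldBlock(requestTimesMs, nowMs, limit) {
  return requestsInWindow(requestTimesMs, nowMs) > limit;
}

var now = Date.now();
console.log(shouldBlock([now - 1000, now - 2000], now, 1000));
```

When shouldBlock returns true, the Lambda would go on to add the IP to the WAF IP set.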
With the above process I have near real-time monitoring of requests to my API Gateway, in a very efficient, cost-effective, scalable manner that can be deployed entirely serverless.
This is just one way to handle this. There are definitely other ways you could accomplish it, say with Kinesis and Elasticsearch, or by analyzing CloudTrail events instead of logs, or by using a third-party solution that integrates with AWS, or something else.
I need to start a Lambda Function when an object has been created on an S3 Bucket. I found 2 solutions to do this.
Using AWS::S3::Bucket NotificationConfiguration.
Using a CloudWatch AWS::Events::Rule.
They both seem to do exactly the same thing, which is to track specific changes and launch a Lambda Function when it happens. I could not find any information on which one should be used. I'm using Cloud Formation Template to provision the Lambda, the S3 Bucket and the trigger.
Which one should I use to call a Lambda on Object level changes and why?
Use the first one (the S3 NotificationConfiguration), for the following reasons.
A push model is much better than a pull model. Push means data is sent as soon as it is available, instead of polling for it at some set interval. Push notifications are everywhere these days: you don't go to Facebook every 5 minutes to check whether someone has liked your picture or replied to your comment.
In terms of cost and effort, S3 event notifications also win.
CloudWatch would be the best option if S3 notifications did not exist, but since they do, use them. Besides, if the service itself offers the feature, why go for an alternative such as CloudWatch rules?
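For reference, this is roughly what the Lambda receives from an S3 event notification and how it might pull out the bucket and key. It is a sketch: the function names are made up. Object keys arrive URL-encoded with spaces as plus signs, so they need decoding before use.

```javascript
// Extracts bucket and decoded key for every "object created" record.
function extractCreatedObjects(event) {
  return (event.Records || [])
    .filter(function (record) {
      return record.eventName.indexOf('ObjectCreated:') === 0;
    })
    .map(function (record) {
      return {
        bucket: record.s3.bucket.name,
        key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
      };
    });
}

// Export this as the Lambda handler in your deployment package.
async function handler(event) {
  var objects = extractCreatedObjects(event);
  objects.forEach(function (o) {
    console.log('New object: s3://' + o.bucket + '/' + o.key);
  });
  return objects.length;
}
```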
I'm serving static JS files over from my S3 Bucket over CloudFront and I want to monitor whoever accesses them, and I don't want it to be done over CloudWatch and such, I want to log it on my own.
For every request to the CloudFront I'd like to trigger a lambda function that inserts data about the request to my MySQL RDS instance.
However, CloudFront restricts Viewer Request and Viewer Response triggers too much: a 1-second timeout (too short to connect to MySQL), no VPC configuration for the Lambda (so I can't even reach the RDS subnet), and so on.
What is the best way to achieve that? Should I set up an API Gateway, and if so, how would I send a request to it?
The typical method to process static content (or any content) accessed from CloudFront is to enable logging and then process the log files.
To enable CloudFront edge events, which can include processing and changing an event, look into Lambda@Edge.
Lambda@Edge
I would enable logging first and monitor the traffic for a while. When bad actors hit your web site (CloudFront distribution), they will generate massive traffic, which could result in sizable Lambda@Edge bills. I would also recommend looking into AWS WAF to help mitigate denial-of-service attacks, which may reduce the amount of Lambda processing.
This seems like a suboptimal strategy, since CloudFront suspends request/response processing while the trigger code is running -- the Lambda code in a Lambda@Edge trigger has to finish executing before processing of the request or response continues, hence the short timeouts.
CloudFront provides logs that are dropped multiple times per hour (depending on the traffic load) into a bucket you select, which you can capture from an S3 event notification, parse, and insert into your database.
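A minimal sketch of the parsing step, assuming the log file has already been downloaded and gunzipped into a string. CloudFront standard logs are tab-separated and carry a #Fields header naming the columns, so the parser reads the column names from that header rather than hard-coding them. The function name is made up and the sample uses an abbreviated field list.

```javascript
// Parses the text of one CloudFront standard log file into row objects
// keyed by the column names found in the "#Fields:" header line.
function parseCloudFrontLog(text) {
  var fields = [];
  var rows = [];
  text.split('\n').forEach(function (line) {
    if (line.indexOf('#Fields:') === 0) {
      fields = line.slice('#Fields:'.length).trim().split(/\s+/);
    } else if (line !== '' && line.charAt(0) !== '#') {
      var values = line.split('\t');
      var row = {};
      fields.forEach(function (name, i) { row[name] = values[i]; });
      rows.push(row);
    }
  });
  return rows;
}

// Abbreviated sample; real files contain many more columns.
var sample = [
  '#Version: 1.0',
  '#Fields: date time c-ip cs-method cs-uri-stem sc-status',
  '2019-12-04\t21:02:31\t192.0.2.100\tGET\t/app.js\t200'
].join('\n');

console.log(parseCloudFrontLog(sample));
```

Each row can then be turned into an INSERT against the RDS table from a Lambda that runs inside the VPC.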
However...
If you really need real-time capture, your best bet might be to create a second Lambda function, inside your VPC, that accepts the data structures provided to the Lambda@Edge trigger.
Then, inside the code for the viewer request or viewer response trigger, all you need to do is use the built-in AWS SDK to invoke your second Lambda function asynchronously, passing the event to it.
That way, the logging task is handed off, you don't wait for a response, and the CloudFront processing can continue.
I would suggest that if you really want to take this route, this will be the best alternative. One Lambda function can easily invoke a second one, even if the second function is not in the same account, region, or VPC, because the invocation is done by communicating with the Lambda service's endpoint API.
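Here is a sketch of that hand-off, assuming the aws-sdk v2 Lambda client. The helper names and the target function name are hypothetical. InvocationType 'Event' is what makes the invocation asynchronous: the call returns once the Lambda service has accepted the event, without waiting for the second function to run.

```javascript
// Builds the parameters for an asynchronous ("fire and forget") invoke.
function buildAsyncInvokeParams(functionName, event) {
  return {
    FunctionName: functionName,
    InvocationType: 'Event', // asynchronous invocation
    Payload: JSON.stringify(event)
  };
}

// lambdaClient is expected to look like an aws-sdk v2 AWS.Lambda instance.
function handOff(lambdaClient, functionName, event) {
  return lambdaClient.invoke(buildAsyncInvokeParams(functionName, event)).promise();
}

// Usage inside the trigger (names and region are assumptions):
//   var AWS = require('aws-sdk');
//   var lambda = new AWS.Lambda({ region: 'us-east-1' });
//   await handOff(lambda, 'access-logger', event);
```

The region passed to the client is the region of the second function, not the region where the trigger happens to run.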
But there's still room for some optimization, because you have to take another aspect of Lambda@Edge into account, and it's indirectly related to this:
no VPC configuration to the lambda
There's an important reason for this. Your Lambda@Edge trigger code is run in the region closest to the edge location that is handling traffic for each specific viewer. Your Lambda@Edge function is provisioned in us-east-1, but it's then replicated to all the regions, ready to run if CloudFront needs it.
So, when you are calling that 2nd Lambda function mentioned above, you'll actually be reaching out to the Lambda API in the 2nd function's region -- from whichever region is handling the Lambda@Edge trigger for this particular request.
This means the further apart the two regions are, the longer the delay.
Thus, your truly optimal solution (for performance purposes) is slightly more complex: instead of the Lambda@Edge function invoking the 2nd Lambda function asynchronously by making a request to the Lambda API, you can create one SNS topic in each region and subscribe the 2nd Lambda function to each of them. (SNS can invoke Lambda functions across regional boundaries.) Then, your Lambda@Edge trigger code simply publishes a message to the SNS topic in its own region, which will immediately return a response and asynchronously invoke the remote Lambda function (the 2nd function, which is in your VPC in one specific region). Within your Lambda@Edge code, the environment variable process.env.AWS_REGION gives you the region where you are currently running, so you can use this to identify how to send the message to the correct SNS topic, with minimal latency. (When testing, this is always us-east-1.)
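To illustrate picking the local topic, here is a sketch that derives the topic ARN from the current region. It assumes the same topic name exists in every region. The account ID, topic name and helper names are placeholders.

```javascript
// Builds the ARN of the same-named SNS topic in the given region.
function localTopicArn(region, accountId, topicName) {
  return 'arn:aws:sns:' + region + ':' + accountId + ':' + topicName;
}

// Builds the publish parameters for the topic in the trigger's own region.
function buildPublishParams(region, accountId, topicName, event) {
  return {
    TopicArn: localTopicArn(region, accountId, topicName),
    Message: JSON.stringify(event)
  };
}

// Inside the trigger the region comes from the environment.
var region = process.env.AWS_REGION || 'us-east-1';
console.log(buildPublishParams(region, '123456789012', 'edge-access-log', { uri: '/app.js' }));
```

The SNS client itself should also be created with that same region, for example new AWS.SNS({ region: region }) in aws-sdk v2, so the publish call stays local.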
Yes, it's a bit convoluted, but it seems like the way to accomplish what you are trying to do without imposing substantial latency on request processing -- Lambda@Edge hands off the information as quickly as possible to another service that will assume responsibility for actually generating the log message in the database.
Lambda and relational databases pose a serious challenge around concurrency, connections and connection pooling. See this Lambda databases guide for more information.
I recommend using Lambda@Edge to talk to a service built for higher concurrency as the first step of recording access. For example, you could have your Lambda@Edge function write access records to SQS, and then have a background worker read from SQS and write to RDS.
Here's an example of Lambda@Edge interacting with STS to read some config. It could easily be refactored to write to SQS.
In our "standard" AWS account, I have a system that does something like this:
CloudWatch Rule (Scheduled Event) -> Lambda function (accesses DynamoDB table, makes computations, writes metrics) -> CloudWatch Alarm (consume metrics, etc.)
However, in our separate CN account, we need to do a similar thing, but in CN, there's no Lambda...
Is there any way we can do something similar to what was done above using the systems available to CN? For example, is it possible to create a rule and have it trigger a lambda function in our "standard/nonCN" AWS account that access the other account's DynamoDB table?
I ultimately accomplished this by having the Lambda and the CloudWatch alarm live in the non-CN account, and then having the Lambda access the dynamoDB table across accounts and across regions.
This actually ended up working, though it did involve me using user credentials instead of a role like I would have been able to had it not been CN.
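In case it helps, the cross-account, cross-region client setup might look something like this sketch. The environment variable names and the helper name are made up for illustration, and cn-north-1 is only an assumed default for the CN region. How you store the user credentials is up to you.

```javascript
// Hypothetical helper: builds the client configuration for reaching the
// DynamoDB table in the CN account with explicit user credentials.
function buildCrossAccountDynamoConfig(env) {
  if (!env.CN_ACCESS_KEY_ID || !env.CN_SECRET_ACCESS_KEY) {
    throw new Error('Missing CN account credentials');
  }
  return {
    region: env.CN_REGION || 'cn-north-1', // assumed default region
    accessKeyId: env.CN_ACCESS_KEY_ID,
    secretAccessKey: env.CN_SECRET_ACCESS_KEY
  };
}

// Usage with aws-sdk v2 inside the Lambda in the non-CN account:
//   var AWS = require('aws-sdk');
//   var docClient = new AWS.DynamoDB.DocumentClient(
//     buildCrossAccountDynamoConfig(process.env));
```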
If anyone is interested in more details on this solution, feel free to comment and I can add more.
You can mix and match AWS resources across regions. When you write your code, make sure the region is configured correctly for each of those resources.
As for the trigger, keep it wherever your Lambda lives. That will simplify things.
Hope it helps.