I am trying a scenario where CloudFormation has to wait until an object is created in a specified bucket (the object creation happens outside the scope of CloudFormation, by an external application).
I tried enabling bucket event notifications and hooking up a Lambda function (so whenever an object is created in the bucket, the Lambda function is triggered). But I am not sure how to make CloudFormation wait until this hooked Lambda function is invoked.
Kindly let me know if there are any ideas on how to achieve this scenario.
I think the following should work:
1. Create a WaitConditionHandle.
2. Create a Lambda function and pass the !Ref of the wait condition handle to it as an environment variable. When you !Ref a wait condition handle you get a presigned URL. The Lambda has only one job: to call that URL when invoked (a minimal handler sketch is shown after these steps).
3. Create a WaitCondition and associate it with the wait handle created in step 1.
4. Add a DependsOn attribute to the WaitCondition so that it gets created right after the last resource CFN should create before pausing to wait.
5. Use the S3 notification (as you already wrote in your question) to invoke the Lambda created in step 2 when you get your object. The Lambda gets invoked, calls the URL, the wait condition stops waiting, and CFN continues.
With the above there are no loops or long-running processes, such as calling a Lambda every 2 minutes.
The maximum timeout for the WaitCondition is 12 hours; you should adjust it to something like 40 minutes or 1 hour.
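For completeness, here is a minimal Python sketch of the signalling Lambda from step 2. The environment variable name is an assumption; the body follows the wait condition signal document format (Status, Reason, UniqueId, Data) sent as an HTTP PUT to the presigned URL:

```python
import json
import os
import urllib.request

# Assumption: the !Ref of the WaitConditionHandle is passed in through an
# environment variable named WAIT_HANDLE_URL.
WAIT_HANDLE_URL = os.environ["WAIT_HANDLE_URL"]

def handler(event, context):
    """Signal the CloudFormation wait condition once the S3 event arrives."""
    body = json.dumps({
        "Status": "SUCCESS",
        "Reason": "Object created in bucket",
        "UniqueId": context.aws_request_id,
        "Data": "S3 object detected",
    }).encode("utf-8")

    req = urllib.request.Request(
        WAIT_HANDLE_URL,
        data=body,
        method="PUT",
        headers={"Content-Type": ""},  # presigned URL expects an empty content type
    )
    with urllib.request.urlopen(req) as resp:
        return {"status": resp.status}
```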
Try using a wait condition to solve this: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-waitcondition.html
You could try using custom CloudFormation resources: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources-sns.html. This requires that you can send an HTTP request to an S3 URL provided through an SNS notification.
Based on the SNS notification, you would watch for the file being created (using Lambda?) and then send a response back to CloudFormation.
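If you go this route, a rough Python sketch of the Lambda side might look like the following (the physical resource id is a placeholder); it reads the custom resource request out of the SNS message and PUTs a success response back to the presigned ResponseURL:

```python
import json
import urllib.request

def handler(event, context):
    """Handle the SNS-backed custom resource notification: once the external
    object has shown up, send a response so the stack can continue."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])

    response = json.dumps({
        "Status": "SUCCESS",
        "PhysicalResourceId": "external-object-watcher",  # placeholder id
        "StackId": message["StackId"],
        "RequestId": message["RequestId"],
        "LogicalResourceId": message["LogicalResourceId"],
    }).encode("utf-8")

    req = urllib.request.Request(
        message["ResponseURL"],
        data=response,
        method="PUT",
        headers={"Content-Type": ""},  # presigned URL expects an empty content type
    )
    urllib.request.urlopen(req)
```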
Here's what I know, or think I know.
In AWS Lambda, the first time you call a function is commonly called a "cold start" -- this is akin to starting up your program for the first time.
If you make a second function invocation relatively quickly after your first, this cold start won't happen again. This is colloquially known as a "warm start".
If a function is idle for long enough, the execution environment goes away, and the next request will need to cold start again.
It's also possible to have a single AWS Lambda function with multiple triggers. Here's an example of a single function that's handling both API Gateway requests and SQS messages.
My question: Will AWS Lambda reuse (warm start) an execution environment when different event triggers come in? Or will each event trigger have its own cold start? Or is this behavior not guaranteed by Lambda?
Yes, different triggers will reuse the same containers, since the execution environment is the same regardless of the trigger; the only difference is the event that is passed to your Lambda.
You can verify this by executing your Lambda with two types of triggers (e.g. API Gateway and simply the Test function in the Lambda console) and looking at the CloudWatch logs. Each Lambda container creates its own Log Stream inside your Lambda's Log Group. You should see both events' logs going to the same Log Stream, which means the second event is successfully using the warm container created by the first event.
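A quick way to see this in the logs is a handler like the sketch below (the event-shape checks are assumptions about which triggers are attached); when the environment is reused, both invocations print into the same log stream:

```python
def handler(event, context):
    # Distinguish the trigger type from the shape of the event.
    # Assumption: only SQS and API Gateway (plus console tests) invoke this function.
    records = event.get("Records", [])
    if records and records[0].get("eventSource") == "aws:sqs":
        source = "sqs"
    elif "requestContext" in event:
        source = "api-gateway"
    else:
        source = "other (e.g. console test)"

    # If the warm execution environment is reused across triggers, both
    # lines end up in the same CloudWatch log stream.
    print(f"trigger={source} log_stream={context.log_stream_name}")
    return {"statusCode": 200}
```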
Background:
I'm developing a custom GitHub webhook on AWS via Terraform. I'm using AWS API Gateway to trigger an AWS Lambda function that validates the GitHub webhook's sha256 signature from the request header. If the Lambda function successfully validates the request, I want a child Lambda function to be invoked via the async invocation destination feature provided by Lambda.
Problem:
Even though I've configured the async invocation with the target child Lambda function, the child function is not triggered when the parent Lambda function is successful. This is reflected in the fact that the child Lambda function's associated CloudWatch log group is empty.
Relevant Code:
Here's the Terraform configuration for the Lambda function destination:
resource "aws_lambda_function_event_invoke_config" "lambda" {
function_name = module.github_webhook.function_name
destination_config {
on_success {
destination = module.lambda.function_arn
}
}
}
If more code from the module is needed, feel free to ask in the comments. The entire source code for this module is here: https://github.com/marshall7m/terraform-aws-codebuild/tree/master/modules/dynamic-github-source
Attempts:
- Made sure both parent/child Lambda functions have permission to create logs within their respective CloudWatch log groups (attached the arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole policy to both)
- Made sure the parent Lambda function has the correct permissions to invoke the child function: "lambda:InvokeFunction", "lambda:InvokeAsync"
- Set up the async invocation destination for both successful and failed parent Lambda runs (child function still not triggered)
- Added the API integration request parameter `{'X-Amz-Invocation-Type': 'Event'}` as mentioned in: https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-integration-async.html
For every attempt to fix this, I made sure to redeliver the request from the source (github webhook page) and not via the AWS Lambda console.
From your description it seems to me that you are invoking the parent function synchronously. Lambda destinations are only for asynchronous invocations:
You can also configure Lambda to send an invocation record to another service. Lambda supports the following destinations for asynchronous invocation
So you have to execute your parent function asynchronously for your child function to be invoked.
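For illustration, here is a minimal boto3 sketch of the two invocation types (the function name is hypothetical); on_success / on_failure destinations are only evaluated for the "Event" type:

```python
import boto3

lambda_client = boto3.client("lambda")

# Synchronous call: destinations are NOT evaluated for this invocation type.
lambda_client.invoke(
    FunctionName="parent-function",  # hypothetical name
    InvocationType="RequestResponse",
)

# Asynchronous call: on_success / on_failure destinations apply here.
lambda_client.invoke(
    FunctionName="parent-function",
    InvocationType="Event",
)
```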
Adding the API integration request parameter `{'X-Amz-Invocation-Type': 'Event'}` as mentioned in https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-integration-async.html did the trick. I initially concluded that this solution doesn't work because a new CloudWatch log stream wasn't created when I redelivered the GitHub payload. As it turns out, when I took a closer look at the previous CloudWatch log stream, I found that CloudWatch appends the logs for re-triggered invocations of the Lambda function to the previously associated log stream.
I have a scheduled error-handling Lambda, and I would like to use serverless technology here as opposed to a Spring Boot service or something similar.
The Lambda will read from an S3 bucket and process accordingly. The problem is that at times the S3 bucket may have a high volume of data to be processed, and long-running operations aren't suited to Lambdas.
One solution I can think of is to have the Lambda read and process one item from the bucket and, on success, trigger another instance of the same Lambda unless the bucket is empty/fully processed. The thing I don't like is that this is synchronous and quite slow. I also need to be conscious of running too many Lambdas at the same time, as we are hitting a REST endpoint as part of the error flow and don't want to overload it with too many requests.
I am thinking it would be nice to have maybe 3 instances of the Lambda running at the same time until the bucket is empty, but I'm not really sure. I am wondering if anyone has any nice patterns that could be used here, or suggestions on best practices?
Thanks
Create an S3 bucket for processing your files.
Enable an S3 -> Lambda trigger: on every new file in the bucket the Lambda will be invoked to process that file, and every file is processed separately. https://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-event-notifications.html
Once a file is processed you could either delete it or move it somewhere else.
Regarding concurrency, please have a look at provisioned concurrency: https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html
Update:
As you still plan to use a scheduler Lambda and S3:
The scheduler Lambda only lists the filenames and puts a message into SQS for each file to be processed.
A new Lambda consumes the SQS messages and processes the files (a minimal sketch of both Lambdas follows below).
Note: I would recommend putting the content directly into SQS if the files/messages are not too big; SQS has built-in recovery mechanics (DLQ, delays, visibility timeouts, etc.) which you benefit from more than plain S3 storage. The second option is to just create a message with the file reference and still use SQS.
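Here is a minimal Python sketch of that pattern, assuming hypothetical bucket and queue names and a placeholder for the actual processing call:

```python
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

BUCKET = "error-files-bucket"  # hypothetical bucket name
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/error-files"  # hypothetical

def scheduler_handler(event, context):
    """Scheduled Lambda: list pending objects and enqueue one SQS message per key."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps({"bucket": BUCKET, "key": obj["Key"]}),
            )

def worker_handler(event, context):
    """SQS-triggered Lambda: fetch each referenced object, process it, then delete it."""
    for record in event["Records"]:
        ref = json.loads(record["body"])
        body = s3.get_object(Bucket=ref["bucket"], Key=ref["key"])["Body"].read()
        # ... call the downstream REST endpoint with `body` here ...
        s3.delete_object(Bucket=ref["bucket"], Key=ref["key"])
```

If you want at most a few workers hitting the REST endpoint at a time, setting a reserved concurrency on the worker function (with an SQS batch size of 1) is one way to cap the parallelism.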
I'd separate the Lambda that is called by the scheduler from the Lambda that does the actual processing. When the scheduler calls the first Lambda, it can look at the contents of the bucket and then spawn the worker Lambdas to process the objects. This way you have control over how many objects you want per worker.
Given your requirements, I would recommend:
- Configure an Amazon S3 event so that a message is pushed to an Amazon SQS queue when objects are created in the S3 bucket
- Schedule an AWS Lambda function at regular intervals that will:
  - Check that the external service is working
  - Invoke a Lambda function to process one message from the queue, and keep looping
The hard part would be throttling the second Lambda function so that it doesn't try to send all requests at once (which might impact that external service).
You could probably do this by using a Step Function to trigger Lambda and then, if it was successful, trigger another Lambda function. This could even be done in parallel, such as allowing up to three parallel Lambda executions. The benefit of using Step Functions is that there is no cost for "waiting" for each Lambda to finish executing.
So, the Step Function flow would be something like:
Invoke a "check external service" Lambda function
If it fails, then quit the flow
Invoke the "processing" Lambda function
Get one message
Process the message
If successful, remove the message from the queue
Return success/fail
If it was successful, keep looping until the queue is empty
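A rough Python sketch of what the "processing" Lambda in that flow could look like (the queue URL and the process() call are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/objects-to-process"  # hypothetical

def handler(event, context):
    """Pull one message, process it, delete it on success, and report back
    so the Step Function can decide whether to loop again."""
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
    messages = resp.get("Messages", [])
    if not messages:
        return {"status": "EMPTY"}  # queue drained, the Step Function can stop looping

    msg = messages[0]
    try:
        process(msg["Body"])  # placeholder for the call to the external REST endpoint
    except Exception:
        return {"status": "FAILED"}  # leave the message; it becomes visible again later

    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
    return {"status": "SUCCESS"}

def process(body):
    # Placeholder for the actual processing / REST call.
    pass
```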
I have a stack with:
API Gateway
Lambda
Kinesis
When deleting this CloudFormation stack from the AWS console, the process is very slow.
Everything works fine until it reaches the deletion of 'AWS::Lambda::Function' -> 'CloudFormation is waiting for NetworkInterfaces associated with the Lambda Function to be cleaned up.'
This step takes about 30 minutes.
Has anyone had the same problem?
To prevent this from blocking the stack deletion, you could set the DeletionPolicy attribute to Retain for that specific Lambda and have another scheduled process clean up the orphaned Lambdas each day.
Check that the role assigned to the Lambda function has delete permissions for the network interface, i.e. all of these:
- ec2:CreateNetworkInterface
- ec2:DescribeNetworkInterfaces
- ec2:DeleteNetworkInterface
If it is unable to delete the interface, the CloudFormation stack might hang, as you have experienced.
If that doesn't work, you might have to script something to delete the ENIs attached to the Lambda while the stack is tearing down.
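If you do end up scripting it, a best-effort boto3 sketch could look like this; the description-prefix filter for Lambda-managed ENIs is an assumption, and deletion may keep failing until Lambda actually releases the attachments:

```python
import boto3

ec2 = boto3.client("ec2")

def cleanup_lambda_enis(subnet_id):
    """Best-effort cleanup of ENIs left behind by a VPC-attached Lambda.
    Assumption: Lambda-managed ENIs can be identified by their description prefix."""
    resp = ec2.describe_network_interfaces(
        Filters=[
            {"Name": "description", "Values": ["AWS Lambda VPC ENI*"]},
            {"Name": "subnet-id", "Values": [subnet_id]},
        ]
    )
    for eni in resp["NetworkInterfaces"]:
        attachment = eni.get("Attachment")
        if attachment:
            # Force-detach first; this can fail while Lambda still holds the ENI.
            ec2.detach_network_interface(AttachmentId=attachment["AttachmentId"], Force=True)
        ec2.delete_network_interface(NetworkInterfaceId=eni["NetworkInterfaceId"])
```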
I have a situation where I have to trigger my workflow based on this condition: "it has to process all files in S3 and then start again when there are files in S3". However, I found that Data Pipeline starts on every scheduled interval, while SWF starts and ends the job, which also shuts down my EMR cluster. Neither of them is suitable in this case. So, for a process which has to start or be triggered based on a condition, neither is suitable, is what I found. Is there any alternative? Or could one of SWF and Data Pipeline perform my task?
This is more of a corollary to @Chris's answer. You still make use of Lambda: listen to the S3 Put event trigger, so every time a new object is created the Lambda function is called.
The Lambda function can pick up the S3 object's key and put it in SQS; you can run a separate worker process which picks items off the queue.
To reiterate your statement,
It has to process all files in S3 [can be done by Lambda]
and then start again when there are files in S3 [can be done by SQS & EC2]
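As a sketch of the first half, the S3-triggered Lambda could simply forward each new object's key to SQS (the queue URL is hypothetical):

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/files-to-process"  # hypothetical

def handler(event, context):
    """Invoked by the S3 Put event: forward each new object's key to SQS
    so a separate worker process can pick it up from the queue."""
    for record in event["Records"]:
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({
                "bucket": record["s3"]["bucket"]["name"],
                "key": record["s3"]["object"]["key"],
            }),
        )
```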
Look at Lambda. You can set up a trigger so that your code is invoked each time a new object is uploaded to S3.
Data Pipeline supports the concept of preconditions, which can trigger your execution based on conditions. The S3KeyExists precondition seems like what you're looking for. This will begin the execution of your activity when a particular S3 key exists.
Data Pipeline will also manage the creation and termination of your resources (EC2 or EMR) based on the activity's execution. If you wish to use your own EC2 instance or EMR cluster, you can look into worker groups. Worker group resources are managed by you and will not be terminated by the service.