Trigger AWS lambda for specific Kinesis event push

I am new to the AWS Lambda area. I am creating a function that will consume Kinesis events, but I want to trigger my Lambda function only when a specific event is pushed to Kinesis (not for every event pushed to Kinesis). Is there a way to configure a filter upfront, or does my function need to implement that filter after consuming all events?

One way of doing it is to split the events you are interested in out onto a separate stream, either by:
Using Amazon Kinesis Analytics to copy records to an "event of interest" stream
Triggering another AWS Lambda function to copy records to an "event of interest" stream (sketched below)
Both of these sit in front of the Lambda you currently have; you then connect that Lambda to the new stream.
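A minimal sketch of the second approach, assuming a target stream named events-of-interest and a hypothetical eventType field to filter on:

    import base64
    import json

    import boto3

    kinesis = boto3.client("kinesis")

    # Hypothetical name of the "event of interest" stream.
    TARGET_STREAM = "events-of-interest"

    def handler(event, context):
        """Consumes the source stream and forwards only matching records."""
        for record in event["Records"]:
            # Kinesis delivers record data base64-encoded.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            # Assumed predicate; replace with whatever marks your events.
            if payload.get("eventType") == "INTERESTING":
                kinesis.put_record(
                    StreamName=TARGET_STREAM,
                    Data=json.dumps(payload),
                    PartitionKey=record["kinesis"]["partitionKey"],
                )

You then point your existing Lambda at events-of-interest instead of the source stream.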

Related

aws transcribe callback function

I want to call AWS transcribe function from an AWS Lambda.
In that Lambda handler, I want to start the transcription job but not wait for it to finish in a while loop, since that would not be cost-efficient. I don't see any way for the finished transcription job to call another Lambda, or something like that, to store the transcription information in an S3 bucket, for example.
Any idea how to solve this?
See Using Amazon EventBridge with Amazon Transcribe.
With Amazon EventBridge, you can respond to state changes in your Amazon Transcribe jobs by initiating events in other AWS services. When a transcription job changes state, EventBridge automatically sends an event to an event stream. You create rules that define the events that you want to monitor in the event stream and the action that EventBridge should take when those events occur. For example, routing the event to another service (or target), which can then take an action. You could, for example, configure a rule to route an event to an AWS Lambda function when a transcription job has completed successfully.
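Putting that together with boto3 (the rule name and Lambda ARN are placeholders; the event pattern is the one documented for Transcribe state changes):

    import json

    import boto3

    events = boto3.client("events")

    RULE_NAME = "transcribe-job-completed"  # placeholder name
    LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:on-transcribe-done"

    # Match Transcribe jobs that finished successfully.
    events.put_rule(
        Name=RULE_NAME,
        EventPattern=json.dumps({
            "source": ["aws.transcribe"],
            "detail-type": ["Transcribe Job State Change"],
            "detail": {"TranscriptionJobStatus": ["COMPLETED"]},
        }),
    )

    # Route matching events to the Lambda function. The function also
    # needs a resource-based permission allowing events.amazonaws.com
    # to invoke it (lambda add_permission).
    events.put_targets(Rule=RULE_NAME, Targets=[{"Id": "1", "Arn": LAMBDA_ARN}])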
Another alternative is:
when you call StartTranscriptionJob, you supply an S3 bucket name and S3 object key that will receive the transcribed results
you can use the Amazon S3 Event Notifications feature to notify you or to automatically trigger a Lambda function
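A minimal sketch of that call (the job name, media URI, and buckets are assumptions):

    import boto3

    transcribe = boto3.client("transcribe")

    transcribe.start_transcription_job(
        TranscriptionJobName="my-job",
        Media={"MediaFileUri": "s3://my-input-bucket/audio.mp3"},
        MediaFormat="mp3",
        LanguageCode="en-US",
        # The transcript lands in this bucket, so an S3 event
        # notification on it can trigger the follow-up Lambda.
        OutputBucketName="my-transcripts-bucket",
    )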

How to handle Kinesis Data Stream without EC2

I want to handle my Kinesis streaming data without using an EC2 instance.
Is there a possibility to accomplish this, i.e. through Lambda functions etc.?
Yes, you can use the Lambda service to process Kinesis streaming data. What you need to do is create a Lambda function to process the data (the data will be available through the event parameter, the first parameter of the function).
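For example, a minimal handler sketch (Kinesis delivers record data base64-encoded):

    import base64
    import json

    def handler(event, context):
        # The polled batch arrives in the first parameter, `event`.
        for record in event["Records"]:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            print(payload)  # process the record here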
In the case of streaming data, your Lambda function isn't invoked in response to some event. Instead, the Lambda service periodically checks Kinesis for available data and then invokes your function.
For this to happen, you need to create an event source mapping between your Lambda function and the Kinesis stream, where you can also specify the size of the batch that the Lambda will process and its starting position.
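For example, with boto3 (the stream ARN and function name are placeholders):

    import boto3

    lambda_client = boto3.client("lambda")

    lambda_client.create_event_source_mapping(
        EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
        FunctionName="my-processing-function",
        BatchSize=100,                    # records handed to each invocation
        StartingPosition="TRIM_HORIZON",  # start from the oldest records
    )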
Don't forget to create a proper role for your Lambda function; it needs access to the Kinesis service, so you need something like the AWSLambdaKinesisExecutionRole permissions.
Another thing to consider is the batch size and how complicated your processing algorithm is. Lambda can run only for a limited time (currently 15 minutes is the maximum you can specify); after that, it is automatically terminated by AWS. In such a case, you will need to use something other than Lambda or split your Lambda function into a few smaller ones.

DynamoDB triggering a Lambda function in another Account

I have a DynamoDB table in Account A and an AWS Lambda function in Account B. I need to trigger the Lambda function when there are changes in the DynamoDB table.
I came across aws lambda - Is it possible to access AWS DynamoDB streams across accounts? - Stack Overflow, which says it is not possible. But then I found amazon web services - Cross account role for an AWS Lambda function - Stack Overflow, which says it is possible. I am not sure which one is correct.
Has somebody tried the same scenario as I am trying to achieve?
The first link is correct: stream-based event triggers for Lambda are limited to the same AWS account and the same region.
However, there is a way you will be able to achieve your goal.
Prerequisite: I assume you already have a DynamoDB (DDB) table (let's call it Table_A) created in AWS account A, and a processing Lambda (let's call it Processing_Lambda) in AWS account B.
Steps:
Create a new proxy Lambda (let's call it Proxy_Lambda) in account A. This Lambda will broadcast the events it processes.
Enable a DynamoDB stream on table Table_A. This stream will contain all update/insert/delete events on the table.
Create a Lambda trigger for Proxy_Lambda to read events from the DynamoDB stream of Table_A.
Create a new SNS topic (let's call it AuditEventFromTableA) in AWS account A.
Add code in Proxy_Lambda to publish the events read from the stream to the SNS topic AuditEventFromTableA (sketched below).
Create an AWS SQS queue (it can also be a FIFO queue if your use case requires sequential events) in AWS account B. Let's call this queue AuditEventQueue-TableA-AccountA.
Subscribe the SQS queue AuditEventQueue-TableA-AccountA in AWS account B to the SNS topic AuditEventFromTableA in AWS account A. This allows all SNS events from account A to be received in the SQS queue of account B.
Create a trigger for Processing_Lambda in AWS account B to consume messages from the SQS queue AuditEventQueue-TableA-AccountA.
Result: this way you will be able to trigger the Lambda in account B based on changes in the DynamoDB table of account A.
Note: if your use case demands strict tracking of sequential events, you may prefer publishing update events from Proxy_Lambda directly to an AWS Kinesis stream in account B instead of the SNS-SQS path.
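A sketch of Proxy_Lambda (steps 3-5 above), assuming the topic ARN below:

    import json

    import boto3

    sns = boto3.client("sns")

    # Assumed ARN of the AuditEventFromTableA topic in account A.
    TOPIC_ARN = "arn:aws:sns:us-east-1:111111111111:AuditEventFromTableA"

    def handler(event, context):
        """Relays DynamoDB stream records to the SNS topic."""
        for record in event["Records"]:
            # record["eventName"] is INSERT, MODIFY, or REMOVE;
            # record["dynamodb"] carries the keys and (if enabled) images.
            sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(record, default=str))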
Simple!
Create a proxy lambda A in account A and permit A to call the target lambda B in account B.
The DDB stream triggers lambda A, and lambda A calls lambda B.
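A sketch of lambda A, assuming lambda B's resource policy already grants invoke permission to account A:

    import json

    import boto3

    lambda_client = boto3.client("lambda")

    def handler(event, context):
        """Forwards the DDB stream event to lambda B in account B."""
        lambda_client.invoke(
            # Placeholder ARN of the target function in account B.
            FunctionName="arn:aws:lambda:us-east-1:222222222222:function:B",
            InvocationType="Event",  # asynchronous, fire-and-forget
            Payload=json.dumps(event),
        )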

how should i architect aws lambda to support parallel process in batch model?

I have an AWS Lambda function that does some statistics on over 1k stock tickers after market close. One option I have is below:
Set up a cron job on an EC2 instance that submits 1k HTTP requests asynchronously (e.g. http://xxxxx.lambdafunction.xxxx?ticker=) to trigger the AWS Lambda function, or submit 1k requests to SNS and let Lambda pick them up.
I think it would run fine, but I would much appreciate it if there is a serverless/PaaS approach to trigger the task.
Off the top of my head, here are a couple of ways to achieve what you need:
Option 1: [Cost-effective]
Post all the ticks to an AWS FIFO SQS queue (as in the sketch after this option).
Define a trigger on this queue to invoke the Lambda function.
Result: since you are posting all the events to a FIFO queue that maintains order, the events will be polled sequentially. Moreover, the SQS-to-Lambda trigger will scale automatically based on the number of messages in the queue.
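A minimal sketch of the producer side, assuming a queue named tickers.fifo and a single message group to preserve ordering:

    import json

    import boto3

    sqs = boto3.client("sqs")

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/tickers.fifo"

    for ticker in ["AAPL", "MSFT", "GOOG"]:  # ... the full 1k list
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"ticker": ticker}),
            MessageGroupId="tickers",       # one group keeps strict ordering
            MessageDeduplicationId=ticker,  # or enable content-based dedup
        )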
Option 2: [Costly, and scales easily for real-time processing]
Same as above, but instead of posting to a FIFO queue, post to a Kinesis stream (as in the sketch after this option).
Enable the Kinesis stream to trigger the Lambda function.
Result: Kinesis will preserve the order of events arriving in the stream, and the Lambda function will be invoked based on the number of shards in the stream. This implementation scales significantly. If you have any future use case for real-time processing of tickers, this could be a great solution.
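The producer side for this option, assuming a stream named ticker-stream:

    import json

    import boto3

    kinesis = boto3.client("kinesis")

    for ticker in ["AAPL", "MSFT", "GOOG"]:  # ... the full 1k list
        kinesis.put_record(
            StreamName="ticker-stream",
            Data=json.dumps({"ticker": ticker}),
            # Records sharing a partition key land on the same shard,
            # which is what preserves per-key ordering.
            PartitionKey=ticker,
        )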
Option 3: [Cost-effective, alternative to Option 1]
Collect all ticker events (1k or whatever) and put them into a file.
Upload this file to an AWS S3 bucket.
Enable S3 event notifications to trigger a proxy Lambda function.
This proxy Lambda function reads the S3 file and, based on the total number of events in the file, spawns n parallel actor Lambda functions (sketched after this option).
Each actor Lambda function processes one event.
Result: easy to implement, cost-effective, and provides easy scaling based on your custom algorithm for distributing the load in the proxy Lambda function.
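A sketch of the proxy Lambda, assuming one ticker per line in the uploaded file and an actor function named ticker-actor:

    import json

    import boto3

    s3 = boto3.client("s3")
    lambda_client = boto3.client("lambda")

    ACTOR_FUNCTION = "ticker-actor"  # assumed name of the actor Lambda

    def handler(event, context):
        """Reads the uploaded ticker file and fans out one async actor
        invocation per event."""
        for s3_record in event["Records"]:
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            for line in body.decode("utf-8").splitlines():
                lambda_client.invoke(
                    FunctionName=ACTOR_FUNCTION,
                    InvocationType="Event",  # asynchronous fan-out
                    Payload=json.dumps({"ticker": line.strip()}),
                )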
Option 4: [All serverless]
Write a Lambda function that gets the list of tickers from some web server.
Define an AWS CloudWatch rule that generates events on a cron schedule.
Add a trigger to this CloudWatch rule to invoke the proxy Lambda function.
The proxy Lambda function then uses any combination of the options above [1, 2, or 3] to trigger the actor Lambda functions for processing the records.
Result: everything can be configured via the AWS console and is easy to use. Alternatively, you can also write an AWS CloudFormation template to create all the required resources in a single go.
Having said that, I will now leave it up to you to choose the right solution based on your business/cost requirements.
You can use the Lambda fan-out pattern.
You can follow these steps to process 1k or more tickers using a serverless approach:
1. Store all the stock tickers in an S3 file.
2. Create a master Lambda that reads the S3 file and splits the stocks into groups of 10.
3. Create a child Lambda that makes the async call to the external HTTP service and fetches the details.
4. In the master Lambda, loop through these groups, invoke the 100 child Lambdas (one per group), and return the results to the master Lambda.
5. Collect all the information returned from the child Lambdas and continue with your processing there.
You can then trigger this master Lambda at market close every day using a CloudWatch time-based rule scheduler.
This is a completely serverless approach; a sketch of the master Lambda follows.
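This sketch assumes the file location, the child function name, and that each child returns a JSON list of results:

    import json
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    s3 = boto3.client("s3")
    lambda_client = boto3.client("lambda")

    CHILD_FUNCTION = "fetch-ticker-details"       # assumed child Lambda name
    BUCKET, KEY = "ticker-bucket", "tickers.txt"  # assumed S3 file location

    def invoke_child(group):
        # Synchronous invoke so the child's return value comes back to us.
        response = lambda_client.invoke(
            FunctionName=CHILD_FUNCTION,
            InvocationType="RequestResponse",
            Payload=json.dumps({"tickers": group}),
        )
        return json.loads(response["Payload"].read())

    def handler(event, context):
        """Reads the ticker file, splits it into groups of 10, invokes
        the children in parallel, and collects their results."""
        body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
        tickers = body.decode("utf-8").split()
        groups = [tickers[i:i + 10] for i in range(0, len(tickers), 10)]
        with ThreadPoolExecutor(max_workers=20) as pool:
            results = [r for batch in pool.map(invoke_child, groups) for r in batch]
        # ... continue processing the collected results here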

Can I limit concurrent invocations of an AWS Lambda?

I have a Lambda function that’s triggered by a PUT to an S3 bucket.
I want to limit this Lambda function so that it’s only running one instance at a time – I don’t want two instances running concurrently.
I’ve had a look through the Lambda configuration and docs, but I can’t see anything obvious. I can about writing my own locking system, but it would be nice if this was already a solved problem.
How can I limit the number of concurrent invocations of a Lambda?
AWS Lambda now supports concurrency limits on individual functions:
https://aws.amazon.com/about-aws/whats-new/2017/11/set-concurrency-limits-on-individual-aws-lambda-functions/
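With that feature, a reserved concurrency of 1 caps the function at a single running instance. A one-call sketch with boto3 (the function name is a placeholder):

    import boto3

    lambda_client = boto3.client("lambda")

    # Only one instance of this function can run at a time; extra
    # invocations are throttled (and retried, for async sources like S3).
    lambda_client.put_function_concurrency(
        FunctionName="my-s3-triggered-function",
        ReservedConcurrentExecutions=1,
    )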
I would suggest you use Kinesis Streams (or alternatively DynamoDB + DynamoDB Streams, which have essentially the same behavior).
You can see a Kinesis Stream as a queue. The good part is that you can use a Kinesis Stream as a trigger for your Lambda function, so anything that gets inserted into this queue will automatically be passed over to your function, in order. You will then be able to process those S3 events one by one, one Lambda execution after the other (one instance at a time).
In order to do that, you'll need to create a Lambda function with the simple purpose of getting S3 events and putting them into a Kinesis Stream. Then you'll configure that Kinesis Stream as your Lambda trigger.
When you configure the Kinesis Stream as your Lambda trigger, I suggest the following configuration:
Batch size: 1
This means that your Lambda will be called with only one event from Kinesis at a time. You can select a higher number to get a list of events of that size (for example, if you want to process the last 10 events in one Lambda execution instead of in 10 consecutive Lambda executions).
Starting position: Trim horizon
This means it'll behave as a queue (FIFO)
A bit more info on AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda.
I hope this helps anyone with a similar problem.
P.S.: bear in mind that Kinesis Streams have their own pricing. Using DynamoDB + DynamoDB Streams might be cheaper (or even free, given the non-expiring free tier of DynamoDB).
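Putting the whole suggestion together, a sketch of the forwarder function and the trigger configuration (the stream name, ARN, and function names are placeholders):

    import json

    import boto3

    kinesis = boto3.client("kinesis")

    STREAM_NAME = "s3-event-queue"  # assumed single-shard stream

    def handler(event, context):
        """Forwarder Lambda: pushes each S3 event into the stream.
        A fixed partition key keeps everything on one shard, in order."""
        for record in event["Records"]:
            kinesis.put_record(
                StreamName=STREAM_NAME,
                Data=json.dumps(record),
                PartitionKey="s3-events",
            )

    # One-time setup: wire the stream to the processing function with
    # the configuration suggested above (batch size 1, trim horizon).
    boto3.client("lambda").create_event_source_mapping(
        EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/s3-event-queue",
        FunctionName="my-processing-function",
        BatchSize=1,
        StartingPosition="TRIM_HORIZON",
    )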
No, this is one of the things I'd really like to see Lambda support, but it currently does not. One of the problems is that if there were a lot of S3 PUT operations happening, AWS would have to queue up all the Lambda invocations somehow, and there is currently no support for that.
If you built a locking mechanism into your Lambda function, what would you do with the requests you don't process due to a lock? Would you just throw those S3 notifications away?
The solution most people recommend is to have S3 send the notifications to an SQS queue, and then have your Lambda function scheduled to run periodically, like once a minute, and check if there is an item in the queue that needs to be processed.
Alternatively, have S3 send the notifications to SQS and just have a t2.nano EC2 instance with a single-threaded service polling the queue.
I know this is an old thread, but I ran across it trying to figure out how to make sure my time-sequenced SQS messages were processed in order coming out of a FIFO queue, rather than being processed simultaneously/out of order by multiple Lambda threads.
Per the documentation:
For FIFO queues, Lambda sends messages to your function in the order that it receives them. When you send a message to a FIFO queue, you specify a message group ID. Amazon SQS ensures that messages in the same group are delivered to Lambda in order. Lambda sorts the messages into groups and sends only one batch at a time for a group. If your function returns an error, the function attempts all retries on the affected messages before Lambda receives additional messages from the same group.
Your function can scale in concurrency to the number of active message groups.
Link: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
So essentially, as long as you use a FIFO queue and submit the messages that need to stay in sequence with the same MessageGroupId, SQS/Lambda automatically handles the sequencing without any additional settings.
Have the S3 "Put events" cause a message to be placed on the queue (instead of involving a lambda function). The message should contain a reference to the S3 object. Then SCHEDULE a lambda to "SHORT POLL the entire queue".
P.S.: S3 events cannot trigger a Kinesis Stream directly, only SQS, SNS, or Lambda (see http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#supported-notification-destinations). Kinesis Streams are expensive and are used for real-time event handling.