Writing to a Kinesis stream using an AWS Lambda function

Can we create a Lambda function that is executed when we write a record to a DynamoDB table, and that then writes that record to a Kinesis stream? Basically, can we write to a Kinesis stream from a Lambda function? If yes, please share sample code and explain how it works. Thank you.

Yes. You can create a DynamoDB trigger backed by a Lambda function, and have that Lambda function write to a Kinesis stream.
Here is a walkthrough that shows how to create a trigger:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html
In the body of your Lambda function you can then call the Kinesis PutRecord operation. Here's info on PutRecord:
http://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html
If you are implementing your Lambda function in Node.js, here's a link to the SDK docs for Kinesis:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Kinesis.html#putRecord-property
Similarly, here is a link for the Java SDK (if you are using Java):
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/AmazonKinesis.html#putRecord(com.amazonaws.services.kinesis.model.PutRecordRequest)
And a link to the Boto docs (if you are using Python):
http://boto.cloudhackers.com/en/latest/ref/kinesis.html
The doc links should have all the info you need.
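To make this concrete, here is a minimal Python sketch (not part of the original answer) of a Lambda handler triggered by a DynamoDB stream that forwards each new item to Kinesis with put_record. The stream name and the partition key attribute are assumptions; adjust them to your setup.

import json
import boto3

# Assumption: a Kinesis stream named "my-ddb-changes" already exists in this region.
kinesis = boto3.client("kinesis")
STREAM_NAME = "my-ddb-changes"  # hypothetical stream name

def lambda_handler(event, context):
    # The DynamoDB trigger delivers batches of stream records in event["Records"].
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        # NewImage holds the item as written to the table (DynamoDB JSON format).
        new_image = record["dynamodb"]["NewImage"]
        kinesis.put_record(
            StreamName=STREAM_NAME,
            Data=json.dumps(new_image).encode("utf-8"),
            # Assumption: the table has a string attribute "id" usable as a partition key.
            PartitionKey=new_image["id"]["S"],
        )
    return {"recordsForwarded": len(event["Records"])}

Each DynamoDB stream record already carries the new item image, so the handler only has to serialize it and choose a partition key.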

This doesn't answer the original question directly, but here is a more recent solution to this part of it:
"when we write a record to Dynamo DB table & that record is written to Kinesis stream"
You can now enable a Kinesis data stream directly on a DynamoDB table: https://aws.amazon.com/about-aws/whats-new/2020/11/now-you-can-use-amazon-kinesis-data-streams-to-capture-item-level-changes-in-your-amazon-dynamodb-table/
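As a small sketch of how this can be enabled programmatically (the table name and stream ARN below are placeholders; the console works just as well), assuming boto3:

import boto3

dynamodb = boto3.client("dynamodb")

# Assumption: the table and the Kinesis stream already exist; the ARN is a placeholder.
dynamodb.enable_kinesis_streaming_destination(
    TableName="my-table",
    StreamArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-ddb-changes",
)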

Related

Need recommendation to create an API by aggregating data from multiple source APIs

Before I start doing this I wanted to get advice from the community on the best and most efficient manner to go about doing it.
Here is what I want to do:
Ingest data from multiple APIs that return JSON
Store it in either S3 or DynamoDB
Modify the data to use my JSON structure
Pipe out the aggregate data as an API
The data will be updated twice a day, so I would pull in the data from the source APIs and put it through my pipeline twice a day.
So basically I want to create an API by aggregating data from multiple source APIs.
I've started playing with Lambda and created the following function using Python.
# https://stackoverflow.com/a/41765656
import requests
import json

def lambda_handler(event, context):
    # https://www.nylas.com/blog/use-python-requests-module-rest-apis/ USEFUL!!!
    # https://stackoverflow.com/a/65896274
    response = requests.get("https://remoteok.com/api")
    # print(response.json())
    return {
        'statusCode': 200,
        'body': response.json()
    }
# https://stackoverflow.com/questions/63733410/using-lambda-to-add-json-to-dynamodb DYNAMODB
This works and returns a JSON response.
Here are my questions:
Should I store the data on S3 or DynamoDB?
Which AWS service should I use to aggregate the data into my JSON structure?
Which service should I use to publish the aggregate data as an API, API Gateway?
However, before I go further I would like to know what is the best way to go about doing this.
If you have experience with this I would love to hear from you.
The answer will vary depending on the quantity of data you're planning to mine. Lambdas are designed for short-duration, high-frequency workloads and thus might not be suitable.
I would recommend looking into AWS Glue, as this seems like a fairly typical ETL (Extract, Transform, Load) problem. You can set up Glue jobs to run on a schedule, and as for data aggregation, that's the T in ETL (a minimal job sketch follows below).
It's simple to write out the Glue DynamicFrame (the result of a transformation) as S3 files, which can then be queried directly by Amazon Athena (as if they were database content).
As for exposing that data via an API, the Serverless Framework or SST are great tools for taking the sting out of spinning up a serverless API and the associated resources.
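For orientation, here is a minimal Glue job sketch in Python, not from the original answer. The bucket paths and field names are placeholder assumptions; it only illustrates the read-transform-write shape of such a job.

import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw JSON pulled from the source APIs (assumed to have been landed in S3 earlier).
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-raw-bucket/source-apis/"]},  # hypothetical bucket
    format="json",
)

# Transform: keep only the fields needed for the aggregate JSON structure (hypothetical fields).
aggregated = raw.select_fields(["id", "title", "url"])

# Write the aggregate back to S3, where Athena or an API-backing Lambda can read it.
glue_context.write_dynamic_frame.from_options(
    frame=aggregated,
    connection_type="s3",
    connection_options={"path": "s3://my-aggregate-bucket/output/"},  # hypothetical bucket
    format="json",
)
job.commit()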

AWS Step Function manual approval process

I am working on a requirement where the data entered in a form needs to be validated manually; once it is validated, an approval mail should be sent out, and then the data will be stored in the database. I plan to use AWS Step Functions with a task token for this.
https://aws.amazon.com/blogs/compute/implementing-serverless-manual-approval-steps-in-aws-step-functions-and-amazon-api-gateway/
I plan to use a design similar to the one in the link above. However, is there a way to avoid using API Gateway for sending the task token back to Step Functions to resume processing? Has anybody worked on a similar requirement, and how was the functionality achieved? Thank you.
A Step Functions state machine can be started by an AWS Lambda function as well.
Once the form data is validated and stored in the database, you can trigger the Lambda function from the database events (for example, DynamoDB Streams if DynamoDB is used), and that Lambda can start the Step Functions execution.
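Regarding resuming a waitForTaskToken step without API Gateway: a Lambda function can call Step Functions directly. Below is a minimal Python sketch, not from the original answer; the event shape carrying the token and the approval decision is an assumption (it could come from an SES/SNS approval flow or an internal tool).

import json
import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    # Assumption: the triggering event carries the task token and the reviewer's decision.
    task_token = event["taskToken"]
    approved = event.get("approved", False)

    if approved:
        # Resume the waiting state machine execution with a success result.
        sfn.send_task_success(
            taskToken=task_token,
            output=json.dumps({"status": "approved"}),
        )
    else:
        # Fail the waiting task so the state machine can handle the rejection path.
        sfn.send_task_failure(
            taskToken=task_token,
            error="Rejected",
            cause="Manual approval was rejected",
        )
    return {"handled": True}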

AWS-CDK - DynamoDB Initial Data

I'm using the AWS CDK for a serverless project, but I've hit a sticking point. My project deploys a DynamoDB table which I need to populate with data before my Lambda function executes.
The data that needs to be loaded is generated by making API calls and isn't static data that can be loaded by a .json file or something simple.
Any ideas on how to approach this requirement for a production workload?
You can use AwsCustomResource in order to make a PutItem call to the table.
AwsSdkCall initializeData = AwsSdkCall.builder()
        .service("DynamoDB")
        .action("putItem")
        .physicalResourceId(PhysicalResourceId.of(tableName + "_initialization"))
        .parameters(Map.ofEntries(
                Map.entry("TableName", tableName),
                Map.entry("Item", Map.ofEntries(
                        Map.entry("id", Map.of("S", "0")),
                        Map.entry("data", Map.of("S", data))
                )),
                Map.entry("ConditionExpression", "attribute_not_exists(id)")
        ))
        .build();

AwsCustomResource tableInitializationResource = AwsCustomResource.Builder.create(this, "TableInitializationResource")
        .policy(AwsCustomResourcePolicy.fromStatements(List.of(
                PolicyStatement.Builder.create()
                        .effect(Effect.ALLOW)
                        .actions(List.of("dynamodb:PutItem"))
                        .resources(List.of(table.getTableArn()))
                        .build()
        )))
        .onCreate(initializeData)
        .onUpdate(initializeData)
        .build();

tableInitializationResource.getNode().addDependency(table);
The PutItem operation will be triggered when the stack is created or when the table is updated (tableName is expected to be different in that case). If that doesn't work for some reason, you can set physicalResourceId to a random value, e.g. a UUID, to trigger the operation on every stack update (the operation is idempotent thanks to the ConditionExpression).
CustomResource allows you to write custom provisioning logic. In this case you could use something like an AWS Lambda function in a Custom Resource to read in the custom JSON and update DynamoDB.
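For that Lambda-backed route, a minimal Python sketch of the handler (used with the CDK Provider framework) might look like the following. The environment variables, source API URL, item shape, and physical resource id are all assumptions for illustration.

import os
import json
import urllib.request
import boto3

dynamodb = boto3.resource("dynamodb")

def on_event(event, context):
    # The CDK Provider framework sends Create/Update/Delete lifecycle events.
    if event["RequestType"] in ("Create", "Update"):
        seed_table()
    # Nothing to clean up on Delete in this sketch.
    return {"PhysicalResourceId": "table-seed"}  # hypothetical fixed id

def seed_table():
    # Assumption: TABLE_NAME and SOURCE_API_URL are passed in as environment variables.
    table = dynamodb.Table(os.environ["TABLE_NAME"])
    with urllib.request.urlopen(os.environ["SOURCE_API_URL"]) as resp:
        items = json.loads(resp.read())
    # batch_writer batches PutItem calls and retries unprocessed items.
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)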

Send S3 document to Textract using Go

I'm trying to use Go to send objects in a S3 bucket to Textract and collect the response.
I'm using the AWS Go SDK package and am able to connect to my S3 bucket and list all the objects it contains. So far so good. I now need to be able to send one of those objects (a .pdf file) to Textract and collect the response(s).
The AWS Go SDK content for interacting with Textract seems to be quite extensive, but I cannot find a good example of how to do this.
I would be very grateful for a sample or advice on how to do this.
To start a job, you invoke StartDocumentTextDetection, using a DocumentLocation to specify the file, and you specify an SNS topic where Textract will publish a notification when it has finished processing your job.
You have now two possibilities:
Subscribe to the SNS topic, and when you receive a message retrieve the result
Create a lambda function triggered by the SNS topic, which retrieves the result.
The second option is, in my opinion, better because it uses less compute time (nothing runs until the job has finished).
To retrieve the job's results, you use GetDocumentTextDetection.
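The question asks about Go, but purely to make the call sequence concrete, here is a short sketch of the same flow in Python/boto3 (the Go SDK exposes the same StartDocumentTextDetection/GetDocumentTextDetection operations); the bucket, key, topic, and role names are placeholders.

import boto3

textract = boto3.client("textract")

# Start an asynchronous text-detection job for a PDF stored in S3.
start = textract.start_document_text_detection(
    DocumentLocation={"S3Object": {"Bucket": "my-bucket", "Name": "docs/sample.pdf"}},
    NotificationChannel={
        # Textract publishes a completion message to this SNS topic.
        "SNSTopicArn": "arn:aws:sns:us-east-1:123456789012:textract-done",
        "RoleArn": "arn:aws:iam::123456789012:role/textract-sns-publish",
    },
)
job_id = start["JobId"]

# Later (e.g. in the Lambda triggered by the SNS notification), page through the result.
pages = []
next_token = None
while True:
    kwargs = {"JobId": job_id}
    if next_token:
        kwargs["NextToken"] = next_token
    result = textract.get_document_text_detection(**kwargs)
    pages.append(result)
    next_token = result.get("NextToken")
    if not next_token:
        break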
If anyone else reaches this site searching for an answer:
I understood the documentation as saying I could just call the StartDocumentAnalysis function through the Textract SDK, but what was in fact missing is that you need to create a new Session first and make the calls based on that session:
https://docs.aws.amazon.com/sdk-for-go/api/service/textract/#New

How can we efficiently push data from csv file to dynamodb without using aws pipeline?

Considering the fact that there is no Data Pipeline available in the Singapore region, are there any alternatives for efficiently pushing CSV data to DynamoDB?
If it were me, I would set up an S3 event notification on a bucket that fires a Lambda function each time a CSV file is dropped into it.
The notification lets Lambda know that a new file is available, and the Lambda function would be responsible for loading the data into DynamoDB.
This works best (because of Lambda's limits) if the CSV files are not huge, so they can be processed in a reasonable amount of time. The bonus is that, once it is working, the only work needed is to drop new files into the right bucket - no server required.
Here is a GitHub repository that has a CSV-to-DynamoDB loader written in Java - it might help get you started.
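As a rough sketch of that S3-triggered loader in Python (not from the original answer; the table name is a placeholder, and the CSV is assumed to have a header row and fit in Lambda memory):

import csv
import io
import urllib.parse
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
TABLE_NAME = "my-table"  # hypothetical table name

def lambda_handler(event, context):
    table = dynamodb.Table(TABLE_NAME)
    # The S3 event notification may contain several records (one per uploaded object).
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read the CSV object and parse it with the header row as field names.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = csv.DictReader(io.StringIO(body))

        # batch_writer batches PutItem calls and retries unprocessed items.
        with table.batch_writer() as batch:
            for row in rows:
                batch.put_item(Item=row)
    return {"status": "done"}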