Making network/API calls from a Kinesis Firehose transformation Lambda

I have a use-case in which I need to make an API call for the payload sent to my Kinesis Firehose stream before storing it in S3.
The flow would be: Kinesis Data Stream -> Kinesis Firehose -> Transformation Lambda -> API call to get additional data relating to current records -> Kinesis Firehose -> S3.
Basically, for each record consumed by my Kinesis Firehose stream, I need to call another backend service to get additional data related to that record before storing it in S3, where our EMR jobs consume it and run queries on it.
My question is: is it possible to make network calls from a Kinesis Firehose transformation Lambda? I think it should be, since it's just another Lambda function. I would also like to understand whether making API calls in a Kinesis Firehose transformation Lambda goes against best practices.
Any insight is appreciated!

I don't think there is any problem with making network calls in a transformation Lambda function. The only thing you need to make sure of is that, after transformation, you return every recordId back to Firehose in the format Firehose accepts, as shown in the AWS documentation.
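A minimal sketch of such a transformation Lambda in Python, assuming the incoming records are JSON and that the backend exposes a POST endpoint that returns JSON. The URL `https://example.internal/enrich`, the `fetch_enrichment` helper and the `enrichment` field are hypothetical placeholders, not part of any AWS API. The part that matters is the contract Firehose expects: every input `recordId` comes back with a `result` (`Ok`, `Dropped` or `ProcessingFailed`) and base64-encoded `data`.

```python
import base64
import json
import urllib.request

# Hypothetical enrichment endpoint; replace with your own backend service.
ENRICHMENT_URL = "https://example.internal/enrich"


def fetch_enrichment(payload, timeout=2.0):
    """POST one record's JSON payload to the (hypothetical) enrichment API."""
    req = urllib.request.Request(
        ENRICHMENT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def lambda_handler(event, context, enrich=fetch_enrichment):
    """Firehose transformation handler: enrich each record, return all recordIds."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["enrichment"] = enrich(payload)
            # Newline-delimited JSON keeps the S3 objects easy to query from EMR.
            data = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("ascii"),
            })
        except Exception:
            # Return the original data marked as failed so Firehose treats it
            # as a failed transformation instead of losing the record.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Because Firehose hands the function a batch of records per invocation, one API call per record adds up; if your backend supports a bulk lookup, one call per invocation (or a cache for repeated keys) keeps you further from the function's timeout. Records marked `ProcessingFailed` are treated by Firehose as failed transformations (for an S3 destination they are written under an error prefix) rather than silently dropped.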

Related

How do I export aws lambda logs(Prints) to Kinesis Data Streams?

I have been stuck on how to send Lambda logs (print output) directly to Amazon Kinesis Data Streams. I have found a way to send logs from CloudWatch, but I would like to send every single print to Kinesis Data Streams. I am not sure whether, if I send data from CloudWatch, the print records are streamed to Kinesis in real time. In this case I would like to use Lambda as the producer and, through Kinesis Data Streams, S3 as the consumer.
Below I have attached a workflow diagram of my setup.
You can also look at Lambda extensions, which let you send logs directly to custom destinations. This is helpful if you want to avoid CloudWatch costs:
https://aws.amazon.com/blogs/compute/using-aws-lambda-extensions-to-send-logs-to-custom-destinations/
You have to create a CloudWatch Logs subscription filter on the log group of the Lambda function whose logs you want to save to S3. So you would have:
CW Logs subscription ---> Firehose ---> S3
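In that setup the records Firehose receives are not plain text: CloudWatch Logs sends each batch of log events as gzip-compressed JSON, and a Lambda function's prints show up as `message` fields inside `logEvents`, in the log group `/aws/lambda/<function-name>`. A minimal Python sketch for unpacking one such record, for example inside a Firehose transformation Lambda, where each record's `data` field is base64-encoded; the function name `decode_subscription_record` is illustrative.

```python
import base64
import gzip
import json


def decode_subscription_record(b64_data):
    """Return (log_group, [messages]) for one base64-encoded, gzip-compressed
    CloudWatch Logs subscription record."""
    doc = json.loads(gzip.decompress(base64.b64decode(b64_data)))
    if doc.get("messageType") != "DATA_MESSAGE":
        # CONTROL_MESSAGE records are only used to check that the
        # destination is reachable; they carry no log lines.
        return doc.get("logGroup"), []
    # Each log event's "message" is one line the function wrote (e.g. a print).
    return doc["logGroup"], [event["message"] for event in doc["logEvents"]]
```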

Writing to S3 via Kinesis Stream or Firehose

I have a continuous flow of events that I need to put into S3. I am trying to evaluate whether I should use Kinesis Data Streams or Firehose. I also want to wait a few minutes before writing to S3 so that each object is fairly full.
Based on my reading about Kinesis Data Streams, I would have to create an analytics app, which would then invoke a Lambda function, and then use that Lambda function to write to S3. Or can Kinesis Data Streams write directly to Lambda somehow? I could not find anything indicating that.
Firehose is not charged by the hour (while Data Streams is). So is Firehose the better option for me?
Or can Kinesis Data Streams write directly to Lambda somehow?
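Data Streams can invoke a Lambda function directly as a consumer, but they can't write directly to S3. Firehose, on the other hand, can; the documentation describes it as a service for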
delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), Splunk, and any custom HTTP endpoint or HTTP endpoints owned by supported third-party service providers, including Datadog, MongoDB, and New Relic.
What's more, Firehose lets you buffer records before writing them to S3. A write is triggered by buffer size or by buffer time, whichever limit is reached first. In addition, you can process the records with a Lambda function before they are written to S3.
Thus, collectively, Firehose seems better suited to your use case than Data Streams.
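To make the "buffer size or time" rule concrete: Firehose writes a buffer out when either threshold is reached, whichever comes first. The toy Python model below only illustrates that rule; `ToyBuffer` is a made-up class, not how Firehose is implemented, and its byte/second thresholds are illustrative (the real ones are the delivery stream's buffering hints, in MiB and seconds).

```python
class ToyBuffer:
    """Toy model of size-or-time buffering; not Firehose's implementation."""

    def __init__(self, max_bytes, max_seconds):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None  # arrival time of the first record in the batch
        self.flushed = []      # each entry stands for one S3 object

    def add(self, record, now):
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        self._maybe_flush(now)

    def tick(self, now):
        """Called periodically so a quiet stream still flushes on time."""
        self._maybe_flush(now)

    def _maybe_flush(self, now):
        if not self.records:
            return
        size_hit = self.size >= self.max_bytes
        time_hit = now - self.opened_at >= self.max_seconds
        if size_hit or time_hit:  # whichever limit is reached first
            self.flushed.append(self.records)
            self.records, self.size, self.opened_at = [], 0, None
```

Usage: with `ToyBuffer(max_bytes=10, max_seconds=60)`, adding 4 bytes and then 6 bytes flushes immediately on size, while a single 1-byte record sits in the buffer until 60 seconds after it arrived. Waiting "a few minutes so the object is fairly full" maps onto choosing those two limits.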

When do I need to use Kinesis Data Streams together with Kinesis Firehose?

I want to build a real-time analytics use case. I am not sure when it is necessary to put Kinesis Data Streams in front of Kinesis Firehose. The documentation says that Kinesis Firehose can get data from Kinesis Data Streams, but the use cases are not clear to me.
https://aws.amazon.com/kinesis/data-firehose/faqs/?nc=sn&loc=5
So the benefit of having Kinesis Firehose receive data from Kinesis Data Streams is that Firehose integrates directly with the following services: S3, Redshift, Elasticsearch Service and Splunk.
If you want your streamed data delivered to any of those endpoints, passing it to Firehose lets Firehose do that work for you.
Otherwise you'd write your own consumer, which is another piece of code to develop, and to maintain when it breaks. With Firehose, you can rely on AWS to do this part for you.
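For the Data Streams to Firehose setup, the wiring is a delivery stream whose source is the Kinesis data stream. A sketch of the boto3 `create_delivery_stream` parameters, assuming a single IAM role that can both read the stream and write to the bucket; the stream name, ARNs, the `events/` prefix and the 5 MiB / 300 s buffering values are placeholders, and the `client` parameter exists only so the call can be exercised without AWS.

```python
def kinesis_to_s3_config(name, stream_arn, bucket_arn, role_arn):
    """Build create_delivery_stream parameters for a Firehose delivery stream
    that reads from a Kinesis data stream and writes to S3.

    All names and ARNs are placeholders supplied by the caller; one role is
    used for source and destination only to keep the sketch short.
    """
    return {
        "DeliveryStreamName": name,
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": stream_arn,
            "RoleARN": role_arn,
        },
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "Prefix": "events/",  # hypothetical key prefix
            # Flush at 5 MiB or 300 seconds, whichever is reached first.
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
            "CompressionFormat": "GZIP",
        },
    }


def create_delivery_stream(config, client=None):
    """Create the delivery stream; pass a client to avoid touching AWS."""
    if client is None:
        import boto3  # needs AWS credentials allowed to create delivery streams
        client = boto3.client("firehose")
    return client.create_delivery_stream(**config)
```

Everything the custom consumer would otherwise do (reading shards, batching, retrying, writing objects) is then expressed as configuration on the delivery stream.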

Using Amazon Firehose to transform data from Kinesis and post it to a custom endpoint

I am trying to transform data from Kinesis and stream it to my own system through a REST endpoint. I can do this using a Lambda function, but I am wondering whether Amazon Firehose can help here. I have not been able to figure out whether Firehose can write to third-party REST services.
Apart from its AWS destinations, the only service Kinesis Firehose could write to when this was answered was Splunk; it has since added custom HTTP endpoints as a destination. If you want a Lambda function to be triggered on delivery, the destination would need to be S3, which would then trigger the Lambda.
Alternatively, use Kinesis Data Streams with a Lambda function acting as the consumer that pushes your content to the third party. The transformation would then happen inside that Lambda function.
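A minimal sketch of that consumer Lambda in Python, assuming JSON records and a third-party endpoint that accepts a JSON POST. `TARGET_URL`, `post_json` and `transform` are hypothetical placeholders. It also assumes `ReportBatchItemFailures` is enabled on the event source mapping, so returning a failed sequence number makes Lambda retry from that record; without that setting the return value is ignored, and you should re-raise instead, or the failed records are lost.

```python
import base64
import json
import urllib.request

# Hypothetical third-party endpoint.
TARGET_URL = "https://api.example.com/ingest"


def post_json(doc, timeout=5.0):
    """POST one transformed document to the (hypothetical) REST endpoint."""
    req = urllib.request.Request(
        TARGET_URL,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def transform(doc):
    """Placeholder transformation; replace with your own mapping."""
    return {"id": doc.get("id"), "body": doc}


def lambda_handler(event, context, send=post_json):
    """Kinesis Data Streams consumer: decode, transform, push each record."""
    for record in event["Records"]:
        kin = record["kinesis"]
        try:
            doc = json.loads(base64.b64decode(kin["data"]))
            send(transform(doc))
        except Exception:
            # Stop at the first failure so per-shard order is kept; Lambda
            # retries from this sequence number (ReportBatchItemFailures).
            return {"batchItemFailures": [{"itemIdentifier": kin["sequenceNumber"]}]}
    return {"batchItemFailures": []}
```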
Additional Links
https://aws.amazon.com/kinesis/data-firehose/
https://aws.amazon.com/kinesis/data-streams/

Call Kinesis Firehose vs Kinesis Stream directly from Lambda

I need to push some data to S3 from Lambda. The data coming into Lambda is from DynamoDB Streams. For pushing to an S3 bucket, Firehose is considered the best option because it batches and buffers the data before pushing it to S3 and also provides a retry strategy. So I am using Firehose instead of pushing directly to S3.
But I see a lot of people push data from Lambda to a Kinesis stream, from which it is then pushed to Kinesis Firehose, instead of pushing directly to Firehose from AWS Lambda. Is there any reason for doing it this way? Any benefits? What are the drawbacks of pushing to Kinesis Firehose directly?
If Amazon Kinesis Data Firehose meets your needs, then definitely use it! It takes care of most of the work for you, compared to normal Kinesis Streams.
The only time you would not use Firehose is when you have a different destination (e.g., you want to process the data on Amazon EC2 instances) or you want more control over the streams and shards (e.g., to send certain producers to specific shards to retain ordering on a per-shard basis).
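A sketch of the direct Lambda -> Firehose path from the question, assuming a delivery stream named `my-delivery-stream` (hypothetical) and a stream view type that includes new images. `PutRecordBatch` accepts up to 500 records per call and can partially fail, so the sketch checks `FailedPutCount` and resends only the entries whose response carries an `ErrorCode`. It splits batches by count only; a call is also capped at 4 MiB in total, so large items would need size-aware batching, and a production version would add backoff between retries.

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_firehose_records(event):
    """Turn DynamoDB Streams records into newline-delimited JSON Firehose records."""
    out = []
    for rec in event["Records"]:
        doc = {
            "eventName": rec["eventName"],           # INSERT, MODIFY or REMOVE
            "keys": rec["dynamodb"].get("Keys"),
            "newImage": rec["dynamodb"].get("NewImage"),  # absent for REMOVE
        }
        out.append({"Data": (json.dumps(doc) + "\n").encode("utf-8")})
    return out


def put_all(client, stream_name, records, max_attempts=3):
    """Send records with PutRecordBatch, resending only the entries that failed."""
    for start in range(0, len(records), MAX_BATCH):
        pending = records[start:start + MAX_BATCH]
        for _ in range(max_attempts):
            resp = client.put_record_batch(DeliveryStreamName=stream_name, Records=pending)
            if resp["FailedPutCount"] == 0:
                break
            # Responses line up with the request order; failed ones have ErrorCode.
            pending = [r for r, res in zip(pending, resp["RequestResponses"])
                       if "ErrorCode" in res]
        else:
            raise RuntimeError(f"{len(pending)} records still failing after {max_attempts} attempts")


def lambda_handler(event, context, client=None):
    if client is None:
        import boto3  # real AWS client; tests pass a fake instead
        client = boto3.client("firehose")
    put_all(client, "my-delivery-stream", to_firehose_records(event))
```

Raising after the last attempt lets the DynamoDB Streams event source mapping retry the batch instead of silently dropping records.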