Is Kinesis Firehose a replacement for Kinesis Streams?

According to the AWS blogs, Kinesis Firehose and Kinesis Streams are both used to load streaming data. Firehose has no concept of shards and needs no maintenance. Given that, is Kinesis Firehose a replacement for Kinesis Streams?

Amazon Kinesis Firehose is an easy way to create a stream where data is sent to one of:
Amazon S3
Amazon Redshift
Amazon Elasticsearch Service
You can also create a Lambda function that can manipulate the data on the way through.
If the above suits your needs, then Firehose could be considered a replacement for Kinesis Streams. However, Kinesis Streams offers more flexibility so it is not an exact replacement.
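To make the "Lambda function that can manipulate the data on the way through" concrete, here is a minimal sketch of a Firehose transformation handler. It assumes the incoming records are JSON objects; the added `processed` field is purely hypothetical. The event shape (`records`, `recordId`, base64 `data`) and the `Ok` / `ProcessingFailed` result values follow the Firehose data-transformation contract.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation sketch: tag every JSON record with an extra field."""
    output = []
    for record in event["records"]:
        try:
            # Firehose hands each record to the function base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical enrichment step
            # Newline-delimit the output so objects in S3 are easy to split later.
            data = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": data}
            )
        except (ValueError, TypeError):
            # Not valid JSON (or not a JSON object): report the record as failed
            # and hand the original data back unchanged.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Every incoming `recordId` has to appear in the response, which is why failed records are returned rather than silently dropped.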

Kinesis Firehose is not a replacement for Kinesis Streams, although Firehose has taken over several use cases since its introduction.
Kinesis Streams buffers streaming data from producers and streams it into custom applications for processing and analysis; those applications consume the temporarily buffered data.
Data producers push data to Kinesis Streams -> Applications read the data from the stream and process it.
Kinesis Firehose is used to capture and load streaming data into other Amazon services such as S3 and Redshift so that analysis can take place later on.
Data producers push data to Kinesis Firehose -> Data Transformation using Lambda -> Store in S3 or Redshift.
The two can also be used in combination: Kinesis Streams can stream the data into Kinesis Firehose so that it is persisted after processing.
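From the producer's side the two flows look almost the same. Below is a rough sketch, assuming the clients passed in are boto3 clients (`boto3.client("kinesis")` and `boto3.client("firehose")`); the helper names and the stream names are hypothetical.

```python
import json


def send_to_stream(kinesis_client, stream_name, event, partition_key):
    """Put one event on a Kinesis data stream; a consumer application reads it later."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # The partition key decides which shard receives the record.
        PartitionKey=partition_key,
    )


def send_to_firehose(firehose_client, delivery_stream_name, event):
    """Put one event on a Firehose delivery stream; Firehose delivers it to the destination."""
    return firehose_client.put_record(
        DeliveryStreamName=delivery_stream_name,
        # No partition key here: Firehose has no shards to route to.
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )
```

With real clients this would be called as, for example, `send_to_stream(boto3.client("kinesis"), "orders", {"id": 1}, "customer-1")`. The difference is in what happens afterwards: with Streams you still have to write the consumer, while Firehose handles delivery itself.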

One thing to take into account when choosing which service to use is the limits and scalability of each solution.
AWS Firehose has a default limit of 5 MB/sec or 5,000 records/sec, although it can be increased by contacting AWS through a request form.
On the other hand, AWS Kinesis can be scaled easily by increasing the number of shards for each Stream (up to 500 shards by default). The main issue here is that each shard has its own cost, and a single resharding operation can at most double or halve the current number of shards.
As Ashan said, these services serve different purposes, but you can use each one on its own or combine them according to your needs. The main advantage of a Kinesis Stream is that it can be fed by many producers and consumed by many consumers. A Firehose delivery stream, on the other hand, acts as a consumer for another source of data (such as a Kinesis Stream) and can output data to only one destination (S3, Redshift, Elasticsearch, Splunk).
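To put rough numbers on the shard scaling discussed above, here is a small back-of-the-envelope sketch. It assumes the commonly documented per-shard write capacity of 1 MB/sec or 1,000 records/sec, and that one resize operation can at most double or halve the shard count; check the current quotas for your region before relying on either figure.

```python
import math

# Assumed per-shard write limits; verify against the current Kinesis quotas.
SHARD_MB_PER_SEC = 1.0
SHARD_RECORDS_PER_SEC = 1000


def shards_needed(mb_per_sec, records_per_sec):
    """Smallest shard count that covers both the byte rate and the record rate."""
    return max(
        1,
        math.ceil(mb_per_sec / SHARD_MB_PER_SEC),
        math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC),
    )


def resharding_steps(current, target):
    """Shard counts after each resize, if one resize can at most double or halve."""
    steps = []
    while current != target:
        if target > current:
            current = min(target, current * 2)
        else:
            current = max(target, math.ceil(current / 2))
        steps.append(current)
    return steps
```

For example, matching the Firehose default of 5 MB/sec or 5,000 records/sec takes `shards_needed(5, 5000)`, which is 5 shards, and growing a stream from 2 to 10 shards takes three resizes: `resharding_steps(2, 10)` gives `[4, 8, 10]`.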

Not sure how it would be a replacement, since Kinesis Firehose does not persist the data itself. Unless you mean that you have no need for data persistence, or it is an issue of cost; in that case your option is to handle the data as soon as it comes in with Kinesis Firehose and eventually store it in S3 or an Elasticsearch cluster.

No, just different purposes.
With Kinesis Streams, you build applications that use the Kinesis Producer Library to put the data into a stream, process it with an application that uses the Kinesis Client Library, and then use the Kinesis Connector Library to send the processed data to S3, Redshift, DynamoDB or Elasticsearch.
With Kinesis Firehose it's a bit simpler: you create the delivery stream and send the data (using the Kinesis Agent or the API) directly to S3, Redshift or Elasticsearch, where it is stored.
Kinesis Streams, on the other hand, retains the data in the stream itself: 24 hours by default, extendable to 7 days (newer versions of the service allow up to 365 days).
Use Kinesis Streams if you want to do custom processing on the streaming data. With Kinesis Firehose you are simply ingesting it into S3, Redshift or Elasticsearch.
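For a feel of what "custom processing" on a stream involves, here is a deliberately simplified consumer sketch. Note that it uses the low-level `get_shard_iterator` / `get_records` API rather than the Kinesis Client Library mentioned above, so it has none of the KCL's checkpointing or shard coordination. The client is assumed to be `boto3.client("kinesis")`; `read_shard`, `handle`, and `max_batches` are hypothetical names.

```python
def read_shard(kinesis_client, stream_name, shard_id, handle, max_batches=10):
    """Read up to max_batches batches from one shard, passing each payload to handle()."""
    iterator = kinesis_client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        # TRIM_HORIZON starts at the oldest record still retained in the shard.
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    processed = 0
    for _ in range(max_batches):
        if iterator is None:
            break  # the shard has been closed and fully read
        response = kinesis_client.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            handle(record["Data"])  # custom processing goes here
            processed += 1
        iterator = response.get("NextShardIterator")
    return processed
```

Because the stream retains the records, the same shard can be read again later or by a second application, which is something a Firehose delivery stream does not offer.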

Related

Writing to S3 via Kinesis Stream or Firehose

I have events that keep coming in, which I need to put into S3. I am trying to evaluate whether I must use Kinesis Stream or Firehose. I also want to wait a few minutes before writing to S3 so that the object is fairly full.
Based on my reading about Kinesis Data Streams, I have to create an analytics app which will then be used to invoke a Lambda, and then use the Lambda to write to S3. Or can Kinesis Data Streams write directly to Lambda somehow? I could not find anything indicating that.
Firehose is not charged by the hour (while a stream is). So is Firehose a better option for me?
Or Kinesis Data Streams can directly write to lambda somehow?
Data Streams can't write directly to S3. Firehose can, however; it is built for:
delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), Splunk, and any custom HTTP endpoint or HTTP endpoints owned by supported third-party service providers, including Datadog, MongoDB, and New Relic.
What's more, Firehose lets you buffer the records before writing them to S3. A write is triggered either by buffer size or by elapsed time. In addition, you can process the records with a Lambda function before they are written to S3.
Thus, collectively, it seems that Firehose is better suited to your use case than Data Streams.
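The "wait a few minutes so the object is fairly full" requirement maps onto Firehose's buffering hints. Here is a sketch of how that could be configured, assuming a boto3 Firehose client; the helper names, ARNs, and default values are placeholders, and the allowed ranges for the two hints should be checked in the current documentation.

```python
def s3_destination_config(bucket_arn, role_arn, size_mb=64, interval_seconds=300):
    """Build an S3 destination config; Firehose flushes when either hint is reached first."""
    return {
        "BucketARN": bucket_arn,
        "RoleARN": role_arn,
        "BufferingHints": {
            "SizeInMBs": size_mb,                  # flush once this much data is buffered...
            "IntervalInSeconds": interval_seconds,  # ...or once this much time has passed
        },
    }


def create_buffered_delivery_stream(firehose_client, name, bucket_arn, role_arn,
                                    size_mb=64, interval_seconds=300):
    """Create a delivery stream that producers write to directly (DirectPut)."""
    return firehose_client.create_delivery_stream(
        DeliveryStreamName=name,
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration=s3_destination_config(
            bucket_arn, role_arn, size_mb, interval_seconds
        ),
    )
```

With the defaults used here, an object would be written after 5 minutes or 64 MB, whichever comes first.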

When do I need to use Kinesis Data Streams together with Kinesis Firehose?

I want to build a real-time analytics use case. I am not sure when it is necessary to put Kinesis Data Streams in front of Kinesis Firehose. The documentation says that Kinesis Firehose can get its data from Kinesis Data Streams, but the use cases are not clear.
https://aws.amazon.com/kinesis/data-firehose/faqs/?nc=sn&loc=5
The benefit of passing data from Kinesis Data Streams on to Kinesis Firehose is that Firehose integrates directly with the following services: S3, Redshift, Elasticsearch Service, Splunk.
If you want your streamed data delivered to any of those endpoints, you can pass it to Firehose and have it do the work for you.
Traditionally you'd write your own consumer, which is another piece of code to develop and to fix when it breaks. With Firehose you can rely on AWS to do this part for you.
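Wiring Firehose behind an existing data stream is mostly configuration. A minimal sketch, assuming a boto3 Firehose client and an IAM role that can read the stream and write to the bucket; the function name and all ARNs are hypothetical.

```python
def create_stream_backed_delivery_stream(firehose_client, name, source_stream_arn,
                                         bucket_arn, role_arn):
    """Create a delivery stream that consumes a Kinesis data stream and writes to S3."""
    return firehose_client.create_delivery_stream(
        DeliveryStreamName=name,
        # Firehose reads from the data stream instead of accepting direct puts.
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": source_stream_arn,
            "RoleARN": role_arn,
        },
        ExtendedS3DestinationConfiguration={
            "BucketARN": bucket_arn,
            "RoleARN": role_arn,
        },
    )
```

Firehose then acts as one more consumer of the stream, so other applications can keep reading the same data in real time while Firehose takes care of persisting it.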

What is the difference/use case for Kinesis services of Firehose, pipeline, data stream

I am confused by the different Kinesis services. I've read the following terms:
Kinesis streaming data platform
Kinesis Data Stream
Kinesis Data Firehose
Kinesis Video Stream
Kinesis Data Analytics
Kinesis Data Pipeline
Can anyone shed some light on what each of these services is, or whether some of them are just nicknames? What are their use cases?
Thanks.
There are 4 flavors of Kinesis. Some of the other ones you've presented seem to be aliases, yes. You can confirm this under "Amazon Kinesis capabilities" at https://aws.amazon.com/kinesis/. I've pulled the descriptions from the FAQs.
Data Streams:
Amazon Kinesis Data Streams enables you to build custom applications that process or analyze streaming data for specialized needs.
Data Analytics:
Amazon Kinesis Data Analytics is the easiest way to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time. (TL;DR, you can process data, in near-realtime using SQL application code)
Video Streams:
Amazon Kinesis Video Streams makes it easy to securely stream video from connected devices to AWS for analytics, machine learning (ML), and other processing.
Data Firehose:
Amazon Kinesis Data Firehose is the easiest way to load streaming data into data stores and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today.
Firehose and Data Streams are very similar. The biggest difference is that Firehose scales for you, whereas Data Streams gives you control over the number of "shards" your stream has. Shards determine how much throughput your stream gets.

Call Kinesis Firehose vs Kinesis Stream directly from Lambda

I need to push some data to S3 from Lambda. The data arrives in Lambda from DynamoDB Streams. Firehose is considered the best way to push to an S3 bucket, since it batches and buffers the data before writing to S3 and also provides a retry strategy, so I am using Firehose instead of pushing directly to S3.
But I see a lot of people push data from Lambda to a Kinesis Stream, from which the data is then pushed to Kinesis Firehose, instead of pushing directly to Firehose from AWS Lambda. Is there any reason for doing it this way? Any benefits? What are the drawbacks of pushing to Kinesis Firehose directly?
If Amazon Kinesis Data Firehose meets your needs, then definitely use it! It takes care of most of the work for you, compared to normal Kinesis Streams.
The only time you would not use Firehose is when you have a different destination (eg you want to process the data on Amazon EC2 instances) or you want more control of the streams and shards (eg to process certain producers on specific shards to retain ordering on a per-shard basis).
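The per-shard ordering point comes down to how partition keys are routed: Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below illustrates the idea under two simplifying assumptions: the shards split the hash space evenly (true for a freshly created stream, not necessarily after resharding), and the digest is read as a big-endian integer. `shard_for_key` is a hypothetical helper, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Index of the shard a key lands on, assuming evenly split hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Scale the 128-bit hash down to one of shard_count equal ranges.
    return hash_value * shard_count // HASH_SPACE
```

All records that share a partition key (say, one producer's ID) land on the same shard and are therefore read back in order, which is the control you give up with Firehose.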

Kinesis Stream to S3 Backup using Firehose

I am using an AWS Kinesis Stream that contains customer transactions. I want to back up the transactions to S3 before I start processing them. How can I use bare Kinesis Firehose to back up the transactions from the Kinesis Stream without running any Lambda or other compute component?
You can reverse the order of your Kinesis building blocks:
Instead of writing into the Kinesis Stream, write into a Kinesis Firehose that is directed to S3.
Run a Kinesis Analytics (KA) application to read the events from your Kinesis Firehose (KF) and write them to a Kinesis Stream (KS). You can use the functionality of KA to do some of the filtering, aggregation and joins that you would otherwise run in your own code (Lambda or KCL).