I am investigating the use of Amazon DynamoDB and would like to stream query results to my remote clients.
I cannot find any Amazon DynamoDB documentation that shows it supports said streams.
All I can find are Amazon DynamoDB stream endpoints for live streams as data is changed within the database.
These are not the streams I am interested in.
I wish to query the Amazon DynamoDB and retrieve the results as a stream to enable me to transmit the streamed data to my remote clients via HTTP.
Does Amazon DynamoDB support this type of streaming of results?
I am looking to deploy code that resembles this on my server:
private StreamingOutput getStreams() {
    return new StreamingOutput() {
        @Override
        public void write(final OutputStream outputStream) throws IOException, WebApplicationException {
            // getArticles() is assumed to return the serialized query results as a byte[]
            outputStream.write(getArticles());
            outputStream.flush();
            outputStream.close();
        }
    };
}
and my client uses Retrofit:
@Streaming
@GET
Call<ResponseBody> fetchData();
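For context, a DynamoDB Query or Scan returns its results one page at a time (each response carries a LastEvaluatedKey until the last page is reached), so the write method above could forward each page as it arrives instead of building one large byte array. The sketch below shows only that loop, using plain java.io. `Page`, `PageSource`, and `PagedStreamer` are hypothetical names, and the actual DynamoDB call is assumed to sit behind the `PageSource` interface.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical page of results: the items plus a token for the next page (null when done). */
final class Page {
    final List<String> items;  // items already serialized, for example as JSON strings
    final String nextToken;    // stands in for DynamoDB's LastEvaluatedKey

    Page(List<String> items, String nextToken) {
        this.items = items;
        this.nextToken = nextToken;
    }
}

/** Hypothetical abstraction over a paginated query; a real one would call DynamoDB. */
interface PageSource {
    Page fetch(String startToken) throws IOException;
}

final class PagedStreamer {
    /** Writes every item as one line, flushing after each page; returns the item count. */
    static int streamAll(PageSource source, OutputStream out) throws IOException {
        int written = 0;
        String token = null; // null means "start from the first page"
        do {
            Page page = source.fetch(token);
            for (String item : page.items) {
                out.write(item.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
                written++;
            }
            out.flush(); // push this page to the client before fetching the next one
            token = page.nextToken;
        } while (token != null);
        return written;
    }
}
```

Inside `StreamingOutput.write`, this would be called as `PagedStreamer.streamAll(source, outputStream)`, with `source` wrapping whatever query the server runs.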
Unless I misunderstood what you're trying to do, DynamoDB does allow streaming. From AWS Docs:
DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table, and stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near real time.
...
To read and process a stream, your application will need to connect to a DynamoDB Streams endpoint and issue API requests.
Amazon DynamoDB Streams actions
Related
I am working in the IoT space with two databases: Amazon Timestream and Amazon DynamoDB.
My sensor data comes into Timestream via AWS IoT Core and MQTT. I set up a rule that grants permission to transfer the incoming data directly into Timestream.
What I need to do now is run some operations on the data and save the results of these operations to DynamoDB.
I know DynamoDB has a feature called DynamoDB Streams. Is there something like Streams in Timestream as well? Or does anybody have an idea how I can automatically transfer the results of the operations from Timestream to DynamoDB?
Timestream does not have Change Data Capture capabilities.
The best thing to do is to write the data into DynamoDB from wherever you are doing your operations on Timestream. For example, if you are using AWS Glue to analyze your Timestream data, you can sink the results directly from Glue using the DynamoDB sink.
Timestream has the concept of a Scheduled Query. When a query has run, you can be notified via an SNS topic. You could connect a Lambda function to that SNS topic to retrieve the query result and store it in DynamoDB.
I have events that keep coming in, which I need to put into S3. I am trying to evaluate whether I should use Kinesis Data Streams or Firehose. I also want to wait a few minutes before writing to S3 so that the object is fairly full.
Based on my reading about Kinesis Data Streams, I have to create an analytics app, which is then used to invoke a Lambda function. I would then have to use the Lambda function to write to S3. Or can Kinesis Data Streams write directly to Lambda somehow? I could not find anything indicating that.
Firehose is not charged by the hour (while Data Streams is). So is Firehose a better option for me?
Or can Kinesis Data Streams write directly to Lambda somehow?
Data Streams can invoke a Lambda function directly (through a Lambda event source mapping), but it can't write directly to S3. Firehose can do this instead:
delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), Splunk, and any custom HTTP endpoint or HTTP endpoints owned by supported third-party service providers, including Datadog, MongoDB, and New Relic.
What's more, Firehose allows you to buffer the records before writing them to S3. The write can be triggered by buffer size or by time. In addition, you can process the records with a Lambda function before they are written to S3.
Thus, collectively, it seems that Firehose is better suited to your use case than Data Streams.
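To make the buffering behaviour concrete, here is a minimal in-memory sketch of "flush when the buffer reaches a size threshold or has been open for a time threshold, whichever comes first". It is not Firehose code, only an illustration of the idea. `SizeOrTimeBuffer` and its parameters are hypothetical, and the caller passes the current time in so the logic stays easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative buffer: flushes when either a size or an age threshold is reached. */
final class SizeOrTimeBuffer {
    private final int maxBytes;
    private final long maxAgeMillis;
    private final List<String> records = new ArrayList<>();
    private int bufferedBytes = 0;
    private long firstRecordAt = -1; // arrival time of the oldest buffered record

    SizeOrTimeBuffer(int maxBytes, long maxAgeMillis) {
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
    }

    /**
     * Adds a record at time nowMillis. Returns the batch to write out if a
     * threshold was reached, or an empty list if the record was only buffered.
     */
    List<String> add(String record, long nowMillis) {
        if (records.isEmpty()) {
            firstRecordAt = nowMillis;
        }
        records.add(record);
        bufferedBytes += record.length(); // approximation: one byte per character
        boolean full = bufferedBytes >= maxBytes;
        boolean old = nowMillis - firstRecordAt >= maxAgeMillis;
        return (full || old) ? drain() : new ArrayList<String>();
    }

    /** Returns everything buffered so far and resets the buffer. */
    List<String> drain() {
        List<String> batch = new ArrayList<>(records);
        records.clear();
        bufferedBytes = 0;
        firstRecordAt = -1;
        return batch;
    }
}
```

In this sketch the age check only runs when a new record arrives; a real delivery service would also flush on a timer, so a quiet stream still gets written out.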
I was assuming I would:
1. create a table and enable its stream, which gives me an ARN
2. create a Kinesis stream
3. configure something somewhere to tell the DynamoDB stream to write to the Kinesis stream
I was looking at working with https://github.com/harlow/kinesis-consumer, but this reads from Kinesis. Or can I take the ARN and use it to read directly from the DynamoDB stream?
The more I look, the more I think I have to write a Lambda function that reads from DynamoDB and writes to Kinesis. Is that correct?
thanks
Hey, can you provide a bit more information about your target setup? Do you plan to have some sort of ETL process for your DynamoDB table? AFAIK, when you bind a Kinesis stream to a DynamoDB table, every time you add, remove, or update rows in the table, a new event is published to the associated Kinesis stream, which you can consume from and use in whatever way you want.
Maybe worth checking this one:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.KCLAdapter.Walkthrough.html
DynamoDB now supports Kinesis Data Streams natively:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/kds.html
You can choose either DynamoDB Streams or Kinesis Data Streams for your Change Data Capture (CDC).
| Properties | Kinesis Data Streams for DynamoDB | DynamoDB Streams |
| --- | --- | --- |
| Data retention | Up to 1 year. | 24 hours. |
| Kinesis Client Library (KCL) support | Supports KCL versions 1.X and 2.X. | Supports KCL version 1.X. |
| Number of consumers | Up to 5 simultaneous consumers per shard, or up to 20 simultaneous consumers per shard with enhanced fan-out. | Up to 2 simultaneous consumers per shard. |
| Throughput quotas | Unlimited. | Subject to throughput quotas by DynamoDB table and AWS Region. |
| Record delivery model | Pull model over HTTP using GetRecords and, with enhanced fan-out, Kinesis Data Streams pushes the records over HTTP/2 by using SubscribeToShard. | Pull model over HTTP using GetRecords. |
| Ordering of records | The timestamp attribute on each stream record can be used to identify the actual order in which changes occurred in the DynamoDB table. | For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item. |
| Duplicate records | Duplicate records might occasionally appear in the stream. | No duplicate records appear in the stream. |
| Stream processing options | Process stream records using AWS Lambda, Kinesis Data Analytics, Kinesis Data Firehose, or AWS Glue streaming ETL. | Process stream records using AWS Lambda or DynamoDB Streams Kinesis adapter. |
| Durability level | Availability zones to provide automatic failover without interruption. | Availability zones to provide automatic failover without interruption. |
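Since the Kinesis Data Streams option may deliver duplicates and relies on a timestamp for ordering, a consumer generally has to handle both. Here is a minimal sketch, assuming each record carries a unique ID and a creation timestamp. `ChangeRecord`, `eventId`, `createdAtMillis`, and `ChangeBatch` are hypothetical names for illustration, not the actual record schema.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified change record; a real stream record has more fields. */
final class ChangeRecord {
    final String eventId;       // assumed to be unique per change
    final long createdAtMillis; // assumed timestamp of the change

    ChangeRecord(String eventId, long createdAtMillis) {
        this.eventId = eventId;
        this.createdAtMillis = createdAtMillis;
    }
}

final class ChangeBatch {
    /** Drops records whose eventId was already seen, then sorts the rest by timestamp. */
    static List<ChangeRecord> dedupeAndOrder(List<ChangeRecord> incoming) {
        Set<String> seen = new HashSet<>();
        List<ChangeRecord> unique = new ArrayList<>();
        for (ChangeRecord r : incoming) {
            if (seen.add(r.eventId)) { // add() returns false for a duplicate ID
                unique.add(r);
            }
        }
        unique.sort(Comparator.comparingLong((ChangeRecord r) -> r.createdAtMillis));
        return unique;
    }
}
```

The `seen` set here only covers a single batch. A long-running consumer would need to remember IDs across batches, or make its writes idempotent, to get the same effect.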
You can use Amazon Kinesis Data Streams to capture changes to Amazon DynamoDB. According to the AWS documentation:
Kinesis Data Streams captures item-level modifications in any DynamoDB table and replicates them to a Kinesis data stream. Your applications can access this stream and view item-level changes in near-real time. You can continuously capture and store terabytes of data per hour. You can take advantage of longer data retention time—and with enhanced fan-out capability, you can simultaneously reach two or more downstream applications. Other benefits include additional audit and security transparency.
You can also enable streaming to Kinesis from your DynamoDB table.
I would like to be notified when a DynamoDB table changes, the same way as Google Firebase Realtime Database.
I am consuming this service in a frontend JavaScript application.
DynamoDB doesn't have a real-time notification or trigger for updates to a table.
But in this case you can try using DynamoDB Streams to capture table activity.
Here are some example use cases:
An application in one AWS region modifies the data in a DynamoDB table. A second application in another AWS region reads these data modifications and writes the data to another table, creating a replica that stays in sync with the original table.
A popular mobile app modifies data in a DynamoDB table, at the rate of thousands of updates per second. Another application captures and stores data about these updates, providing near real time usage metrics for the mobile app.
A global multi-player game has a multi-master topology, storing data in multiple AWS regions. Each master stays in sync by consuming and replaying the changes that occur in the remote regions.
An application automatically sends notifications to the mobile devices of all friends in a group as soon as one friend uploads a new picture.
A new customer adds data to a DynamoDB table. This event invokes another application that sends a welcome email to the new customer.
More details are in the DynamoDB Streams documentation.
And here is how you can use DynamoDB Streams with the AWS JavaScript SDK:
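// Assumes the AWS SDK for JavaScript (v2) is already loaded as AWS
var dynamodbstreams = new AWS.DynamoDBStreams();
// StreamArn is a placeholder; use the ARN of your own table's stream
var params = { StreamArn: 'YOUR_STREAM_ARN' };
dynamodbstreams.describeStream(params, function (err, data) {
    if (err) console.log(err, err.stack); // an error occurred
    else console.log(data); // successful response
});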
DynamoDB Streams reports the following event types:
eventName (String): The type of data modification that was performed on the DynamoDB table:
INSERT - a new item was added to the table.
MODIFY - one or more of an existing item's attributes were modified.
REMOVE - the item was deleted from the table.
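To illustrate how a consumer might react to these three event types, here is a small sketch of the dispatch step, written in Java. `ChangeNotifier`, `describe`, and the message wording are hypothetical; only the three eventName values come from the list above.

```java
/** Illustrative only: turns a stream eventName into a message for clients. */
final class ChangeNotifier {
    /** Maps an eventName (INSERT, MODIFY, REMOVE) and an item key to a client message. */
    static String describe(String eventName, String itemKey) {
        switch (eventName) {
            case "INSERT":
                return "added " + itemKey;
            case "MODIFY":
                return "updated " + itemKey;
            case "REMOVE":
                return "deleted " + itemKey;
            default:
                // Fail loudly rather than silently ignoring an unexpected value
                throw new IllegalArgumentException("unknown eventName: " + eventName);
        }
    }
}
```

The returned message is what would then be pushed to the frontend by whatever channel you choose.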
By the way, if you want to notify your client in some other way instead of reading DynamoDB Streams directly, you can try using a Lambda function, as described in this article.
Hope this helps you solve your issue.
DynamoDB and Firebase/Firestore are really different.
Firebase/Firestore is a realtime database where you can subscribe to changes on the client.
DynamoDB is a NoSQL database for storing key/value pairs.
More suitable for a similar use case is "AWS AppSync" which provides live updates like Firebase/Firestore does.
If you want to use DynamoDB nonetheless, have a look at DynamoDB Streams to trigger an event when the table is updated.
The question is then how you get the update to the client.
You could send a message to an SNS Topic, sending Push Notifications to the client if necessary.
But in the end you will build with DynamoDB Streams and SNS and maybe Lambda what Firebase/Firestore or "AWS AppSync" provides out of the box.
I normally see the pattern DynamoDB -> SNS topic (with a custom Lambda in between).
If your application is for mobile, have you taken a look at AWS SNS Mobile Push to see whether it would be a better fit for your architecture?
Background
I have found that Amazon Kinesis Data Analytics can be used for streaming data as well as data present in an S3 bucket.
However, there are some parts of the Kinesis documentation that make me question whether Amazon Kinesis Analytics can be used for a huge amount of existing data in an S3 bucket:
Authoring Application Code
We recommend the following:
In your SQL statement, don't specify a time-based window that is longer than one hour for the following reasons:
Sometimes an application needs to be restarted, either because you updated the application or for Kinesis Data Analytics internal reasons. When it restarts, all data included in the window must be read again from the streaming data source. This takes time before Kinesis Data Analytics can emit output for that window.
Kinesis Data Analytics must maintain everything related to the application's state, including relevant data, for the duration. This consumes significant Kinesis Data Analytics processing units.
Question
Will Amazon Kinesis Analytics be good for this task?
The primary use case for Amazon Kinesis Analytics is stream data processing. For this reason, you attach an Amazon Kinesis Analytics application to a streaming data source. You can optionally include reference data from S3, which is limited in size to 1 GB at this time. We will load data from an S3 object into a SQL table that you can use to enrich the incoming stream.
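As a rough picture of what "enriching the incoming stream" with reference data means, the sketch below joins each incoming key against an in-memory lookup table. It is plain Java rather than Kinesis Analytics SQL, and `Enricher`, the comma-separated output format, and the "unknown" fallback are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: joins stream keys against a small reference table. */
final class Enricher {
    /** Appends the reference value for each key, or "unknown" when there is no match. */
    static List<String> enrich(List<String> keys, Map<String, String> reference) {
        List<String> enriched = new ArrayList<>();
        for (String key : keys) {
            enriched.add(key + "," + reference.getOrDefault(key, "unknown"));
        }
        return enriched;
    }
}
```

The reference table stays small and fixed while the stream keeps flowing, which is why this fits a size-limited reference object but not a large body of historical data.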
It sounds like you want a more general-purpose tool for querying data from S3, not a stream data processing solution. I would recommend looking at Presto and Amazon EMR instead of using Amazon Kinesis Analytics.
Disclaimer: I work for the Amazon Kinesis team.