Why does the DynamoDB stream trigger but the entry is not in DynamoDB?

I am using DynamoDB as my database and have DynamoDB Streams set up to do some extra work when an item is saved to DynamoDB.
My problem is that when I write to the table, the stream is triggered, but when I try to fetch the data from the Lambda that the stream invokes, the item is not present in the DynamoDB table.
I'm really not sure why this is happening. Is it something that happens when the table gets big and a lot of data is written to DynamoDB at the same time?
The table has a composite key: the partition key is the userId and the sort key is the timestamp. Could it be something to do with that?
Or could it be that a lot of different services are trying to write to DynamoDB at the same time?
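For reference, this is roughly the setup being described: a Lambda function invoked by the stream that reads the item back by its userId/timestamp key. A minimal sketch, assuming boto3, string-typed key attributes, and a hypothetical table name:

```python
import boto3

# Hypothetical table name; the real table name is not given in the question.
TABLE = boto3.resource("dynamodb").Table("my-table")

def handler(event, context):
    # One invocation can carry several stream records.
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]
        user_id = keys["userId"]["S"]        # assuming string-typed keys
        timestamp = keys["timestamp"]["S"]

        # Read the item back from the table using the same composite key
        # that produced the stream record.
        response = TABLE.get_item(Key={"userId": user_id, "timestamp": timestamp})
        item = response.get("Item")  # may be None, which is the symptom described
        print(item)
```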

Related

Streaming data from DynamoDB to Redshift with Kinesis - backfilling history?

I'm looking at the diagram here, and from my understanding, a DynamoDB stream into a Redshift table via Kinesis Firehose will send the updates as Redshift commands to the table (i.e. update, insert, etc.), so this will keep a Redshift version of a DynamoDB table in sync.
But how do you deal with the historical data? Is there a good process for filling the Redshift table with the data to date, which can then be kept in sync via a DynamoDB stream? It's not trivial, because depending on the timing, some updates may be lost if I manually copy the data into a Redshift table and then switch on a DynamoDB stream.
Regarding the diagram, it shows Kinesis Firehose delivering data to S3, queryable by Athena. I feel like I'm missing something, because if the data going to S3 is only updates and new records, it doesn't seem like something that works well for Athena (a partitioned snapshot of the entire table makes more sense).
So if I have a DynamoDB table that is currently receiving data, and I want to create a new Redshift table that contains all the same data up to a given time and then receives all subsequent updates via a DynamoDB stream, how do I go about doing that?

AWS Lambda function data consistency with DynamoDB

Using DynamoDB Streams I am triggering a Lambda function. In the function I retrieve data from DynamoDB based on the primary key; if the key matches, the row is updated, and if it doesn't, a new entry is created in DynamoDB.
The Lambda function can scale out according to the number of shards in the stream.
If records with the same primary key land in different shards, multiple instances of the Lambda function will try to get and update the same row at the same time, which can eventually produce wrong data in the database (one write overwriting another).
One solution I am considering is to use a UUID/version attribute and write to DynamoDB with a condition, so the write fails if the item was already updated by another instance. But then all steps need to be executed again for the failed records. (A sketch of this idea is below.)
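A minimal sketch of that conditional-write idea, assuming boto3; the table name, the "data" and "version" attribute names, and the helper name are hypothetical:

```python
import uuid
import boto3
from botocore.exceptions import ClientError

# Hypothetical table name for illustration; adjust to the real table.
table = boto3.resource("dynamodb").Table("my-table")

def upsert_with_version(key, new_data, expected_version):
    """Update the item only if its 'version' attribute still matches the
    value read earlier; otherwise the write is rejected and the caller
    can re-read and retry."""
    try:
        table.update_item(
            Key=key,
            UpdateExpression="SET #d = :d, #v = :new_version",
            ConditionExpression="attribute_not_exists(#v) OR #v = :expected",
            ExpressionAttributeNames={"#d": "data", "#v": "version"},
            ExpressionAttributeValues={
                ":d": new_data,
                ":new_version": str(uuid.uuid4()),
                ":expected": expected_version,
            },
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another instance updated the row first
        raise
```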
Another option is to set the Lambda function's "reservedConcurrentExecutions" property to 1 so that the function does not scale out. I am not sure whether this throws an exception when more than one shard is created in the DynamoDB stream.
I would like to know how I can implement this scenario.

AWS DynamoDB Trigger

I have one DynamoDB table. Any data inserted into the table triggers a Lambda function. When I write data to the table in a loop, the trigger sometimes does not fire for one or two rows.
What is the solution for the trigger not firing when writing in a loop?

Sync AWS DynamoDB data with local DynamoDB instance

If I am not wrong, for local DynamoDB the data is stored in the shared-local-instance.db file. Is there a way to sync data from DynamoDB on AWS to my local DynamoDB (shared-local-instance.db)?
Also, if a new table is created on AWS DynamoDB, can I pull that table along with its records? I don't want to manually create the table or enter the records to keep my local DynamoDB in sync. I'm hoping there is an easy way to do this. Thanks in advance.
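One possible approach (sketched only as an illustration, not a confirmed answer) is to scan the table in AWS and batch-write the items into the local endpoint. The table name, region, and local port below are assumptions, and the sketch assumes the table already exists locally with the same key schema:

```python
import boto3

# Source: the table in AWS; destination: DynamoDB Local.
aws_table = boto3.resource("dynamodb", region_name="us-east-1").Table("my-table")
local = boto3.resource("dynamodb", endpoint_url="http://localhost:8000")
local_table = local.Table("my-table")  # assumes the table already exists locally

def copy_all_items():
    """Scan every item from the AWS table and batch-write it into the
    local table, paginating through the Scan results."""
    scan_kwargs = {}
    with local_table.batch_writer() as batch:
        while True:
            page = aws_table.scan(**scan_kwargs)
            for item in page["Items"]:
                batch.put_item(Item=item)
            if "LastEvaluatedKey" not in page:
                break
            scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```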

Copying only new records from AWS DynamoDB to AWS Redshift

I see there are tons of examples and documentation for copying data from DynamoDB to Redshift, but we are looking at an incremental copy process where only the new rows are copied from DynamoDB to Redshift. We will run this copy process every day, so there is no need to replace the entire Redshift table each day. Does anybody have any experience or thoughts on this topic?
DynamoDB has a feature (currently in preview) called Streams:
Amazon DynamoDB Streams maintains a time ordered sequence of item level changes in any DynamoDB table in a log for a duration of 24 hours. Using the Streams APIs, developers can query the updates, receive the item level data before and after the changes, and use it to build creative extensions to their applications built on top of DynamoDB.
This feature will allow you to process new updates as they come in and do what you want with them, rather than design an exporting system on top of DynamoDB.
You can see more information about how the processing works in the Reading and Processing DynamoDB Streams documentation.
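One common way to consume a stream is through a Lambda trigger rather than the low-level Streams API; as an illustration of the processing model, here is a minimal sketch of such a handler (the Lambda trigger itself is an assumption, not part of the answer above):

```python
import json

def handler(event, context):
    """Minimal sketch of processing DynamoDB stream records. Each record
    carries the item data before and/or after the change (OldImage /
    NewImage), depending on the stream view type."""
    for record in event["Records"]:
        event_name = record["eventName"]                 # INSERT, MODIFY, or REMOVE
        new_image = record["dynamodb"].get("NewImage")
        old_image = record["dynamodb"].get("OldImage")
        # Do whatever export/processing you need with the change here.
        print(json.dumps({"event": event_name, "new": new_image, "old": old_image}))
```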
Redshift's COPY from DynamoDB can only copy the entire table. There are several ways to achieve an incremental copy:
Using an AWS EMR cluster and Hive: if you set up an EMR cluster, you can use Hive tables to run queries over the DynamoDB data and move it to S3. From S3 the data can easily be loaded into Redshift.
You can store your DynamoDB data based on access patterns, for example one table per time period (see http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.TimeSeriesDataAccessPatterns). If the data is stored this way, the old DynamoDB tables can be dropped after they are copied to Redshift.
This can be solved with a secondary DynamoDB table that tracks only the keys that were changed since the last backup. This table has to be updated whenever the initial DynamoDB table is updated (add, update, delete). At the end of the backup process you delete these keys, or you delete each key as soon as its row has been backed up. (A sketch of this pattern follows.)
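A minimal sketch of that dual-write pattern, assuming boto3 and hypothetical table and attribute names:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
# Table names are assumptions for illustration.
main_table = dynamodb.Table("orders")
changed_keys_table = dynamodb.Table("orders-changed-keys")

def save_order(item):
    """Write the item to the main table and record its key in the
    change-tracking table, so the next incremental backup knows which
    rows to copy to Redshift."""
    main_table.put_item(Item=item)
    changed_keys_table.put_item(Item={"orderId": item["orderId"]})

def mark_backed_up(order_id):
    # Remove the key once the row has been copied to Redshift.
    changed_keys_table.delete_item(Key={"orderId": order_id})
```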
If your DynamoDB table has a timestamp attribute, or a binary flag attribute that conveys data freshness, then you can write a Hive query to export only the current day's (or otherwise fresh) data to S3, and then copy that incremental S3 data to Redshift with 'KEEP_EXISTING'.
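If the Hive route isn't available, the same idea can be sketched with a plain boto3 filtered scan that stages fresh items in S3 for a subsequent Redshift COPY; this is an alternative to the Hive query suggested above, and the table, bucket, and attribute names are assumptions:

```python
import json
import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

# Table, bucket, and attribute names are assumptions for illustration.
table = dynamodb.Table("events")

def export_fresh_items(cutoff_timestamp, bucket, key):
    """Scan only items newer than the cutoff and stage them in S3 as
    JSON lines, ready to be loaded into Redshift."""
    items = []
    scan_kwargs = {"FilterExpression": Attr("timestamp").gt(cutoff_timestamp)}
    while True:
        page = table.scan(**scan_kwargs)
        items.extend(page["Items"])
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

    body = "\n".join(json.dumps(item, default=str) for item in items)
    s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
```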