Is a DynamoDB item available for querying immediately? - amazon-web-services

I added some items to a DynamoDB table using DynamoDBMapper.save, then queried an item immediately afterwards. Will I definitely get the saved item, or should I put a Thread.sleep() before querying it? In a SQL database we use transactions, which guarantee that we will get the record once it has been inserted. But for DynamoDB I am not sure. I checked the AWS DynamoDB documentation but didn't find related information.

DynamoDB reads are eventually consistent by default. However, DynamoDB does allow you to request strongly consistent reads by setting the ConsistentRead parameter on read operations (see the sketch below). This comes at a cost, however: strongly consistent reads consume twice as many read capacity units.
See: Read consistency in DynamoDB
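The question uses the Java DynamoDBMapper, but the same option exists across the SDKs. As a minimal boto3 (Python) sketch, where the table name "Orders" and the key attribute "orderId" are placeholders for illustration:

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("Orders")  # placeholder table name

    table.put_item(Item={"orderId": "1234", "status": "NEW"})

    # A strongly consistent read reflects every write that succeeded
    # before the read started, at twice the read-capacity cost.
    response = table.get_item(
        Key={"orderId": "1234"},
        ConsistentRead=True,
    )
    item = response.get("Item")

Note that strongly consistent reads are not supported on global secondary indexes.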

Related

Fetching large amounts of data from a DynamoDB table using the primary key

I am quite new to DynamoDB. I have a requirement in which I need to fetch around 120 million rows from a DynamoDB table. The criteria for fetching are based on the PK (basically I need to fetch all the rows matching the CAR_********* primary key pattern). The only way I can figure out is to perform get operations, but that consumes a lot of time. I also looked at the option of a bulk get, but that has a limit of 100 rows or 16 MB of data.
So, can someone suggest a better and faster approach to extracting this data?
First off, DynamoDB is optimized for storing and retrieving single data objects by primary key. If you need to regularly retrieve or update millions of rows, you should look at an alternative datastore.
With that out of the way, if this is a one-time task I recommend spinning up a Redshift database and using the COPY command to retrieve the data from Dynamo. You can then download that data using a single SQL statement.
If you don't want to do this, or are expecting to retrieve the data more than once, you need to use the Scan API. This returns at most 1 MB per call, so you'll need to call it in a loop (see the sketch below).
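A rough sketch of that loop in boto3 (Python), with "CarTable" as a placeholder name:

    import boto3

    client = boto3.client("dynamodb")

    kwargs = {"TableName": "CarTable"}  # placeholder table name
    items = []
    while True:
        # Each Scan call returns at most 1 MB of data.
        response = client.scan(**kwargs)
        items.extend(response["Items"])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key

A parallel scan (the Segment and TotalSegments parameters) can speed this up, at the cost of consuming read capacity faster.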
Regardless, you will almost certainly need to increase your read capacity to handle this task.

Best practice for using a DynamoDB table that needs to be periodically updated

In my use case, I need to periodically update a DynamoDB table (roughly once per day). Considering that lots of entries need to be inserted, deleted, or modified each time, I plan to drop the old table and create a new one.
How can I keep the table queryable while I recreate it? Which API should I use? It's fine for queries to keep hitting the old table during the rebuild, so that customers won't experience any outage.
Is it possible to have something like a version number for the table, so that I could roll back quickly?
I would suggest using table names with a common suffix scheme (some people use a date, others use a version number).
Store the name of the currently active DynamoDB table in a configuration store (if you are not already using one, you could use Secrets Manager, SSM Parameter Store, another DynamoDB table, a Redis cluster, or a third-party solution such as Consul).
Automate the creation of a new DynamoDB table and the insertion of data into it, then update the config store with the name of the newly created table (see the sketch below). Allow enough time for the switchover, then remove the previous table.
You could automate the final part with Step Functions, using a Wait state of a few hours to ensure that nothing is still using the old table; in fact, you could even add a Lambda function that validates whether any traffic is still hitting the old DynamoDB table.
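An illustrative sketch of the switchover, assuming SSM Parameter Store as the config store (the parameter name and table names are hypothetical):

    import boto3

    ssm = boto3.client("ssm")
    PARAM = "/myapp/active-dynamodb-table"  # hypothetical parameter name

    def active_table_name():
        # Readers resolve the current table name from the config store.
        return ssm.get_parameter(Name=PARAM)["Parameter"]["Value"]

    def promote(new_table_name):
        # Once the new table is fully loaded, point traffic at it.
        ssm.put_parameter(Name=PARAM, Value=new_table_name,
                          Type="String", Overwrite=True)

    # e.g. promote("events-v42") after "events-v42" is created and loaded;
    # delete the previous table only after a safe waiting period.

This also answers the rollback question: pointing the parameter back at the previous table name is an instant rollback, as long as the old table hasn't been deleted yet.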

Handling DynamoDB read and write units

I am using DynamoDB as the back-end database in my project. I am storing items in a table, each of size 80 KB or more (they contain nested JSON), and my partition key is a unique-valued column (unique for each item). Now I want to perform pagination on this table: my UI will provide a start integer, a limit integer, and two string constants for the type, and my API should retrieve items from DynamoDB based on those query parameters. I am using the scan method from boto3 (the Python SDK), but scan reads all the items from my table before applying my filters, causing provisioned-throughput errors, and I cannot afford to either increase my table's throughput or opt for auto-scaling. Is there any way my problem can be solved? Please give your suggestions.
Do you have a limit set on your scan call? If not, DynamoDB will return up to 1 MB of data per call. You could try using Limit together with some kind of sleep or delay in your code, so that you work through your table at a slower rate and stay within your provisioned read capacity. If you do this, you'll have to use the LastEvaluatedKey returned to you to page through the table (see the sketch below).
Keep in mind that reading a single one of your 80 KB items already consumes 10 read capacity units for an eventually consistent read (20 for a strongly consistent one), and more if your items are larger.
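A rough boto3 sketch of that throttled loop; the table name, Limit, and sleep interval are illustrative and would need tuning against your provisioned capacity:

    import time

    import boto3

    client = boto3.client("dynamodb")

    kwargs = {"TableName": "my-table", "Limit": 10}  # placeholder name; ~10 items * 80 KB per page
    while True:
        page = client.scan(**kwargs)
        for item in page["Items"]:
            ...  # process each item here
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key
        time.sleep(1)  # spread reads out to stay under provisioned capacity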

AWS hosted data storage for storing simple entities

I need to choose data storage for a simple system. The main purpose of the system is storing events: simple entities with a timestamp, user ID, and type. No joins. Just a single table.
Stored data will be fetched rarely (compared with writes). I expect the following read operations:
get latest events for a list of users
get latest events of a type for a list of users
I expect about 0.5-1 million writes a day. Data older than 2 years can be removed.
I'm looking for the best-fitting service provided by AWS. I wonder if using Redshift would be like taking a sledgehammer to crack a nut?
For your requirements you can use AWS DynamoDB, and you can define TTL values to remove older items automatically. You get the following advantages.
Fully managed data storage
Able to scale with the need for write throughput (though it can be costly)
Use a sort key with a timestamp to query the latest items (see the sketch below)
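An illustrative sketch only: it assumes a table named "events" with partition key "userId" and a numeric sort key "eventTime" (epoch seconds), with TTL enabled on an "expiresAt" attribute; all of these names are placeholders.

    import time

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("events")  # placeholder table name

    now = int(time.time())
    table.put_item(Item={
        "userId": "user-1",
        "eventTime": now,
        "type": "LOGIN",
        "expiresAt": now + 2 * 365 * 24 * 3600,  # TTL: expire after ~2 years
    })

    # Latest 10 events for one user: read the sort key in descending order.
    response = table.query(
        KeyConditionExpression=Key("userId").eq("user-1"),
        ScanIndexForward=False,
        Limit=10,
    )
    latest = response["Items"]

For the "latest events of a type" query, you could add a GSI keyed by a combined user-and-type attribute, or filter the query results by type in application code.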
I would also suggest looking at AWS SimpleDB, as at first glance it seems a better fit for your requirements.
Please refer to this article, which explains some practical user experience:
http://www.masonzhang.com/2013/06/2-reasons-why-we-select-simpledb.html

AWS DynamoDB - a way to get when an item was last updated?

Is there a way to check when a DynamoDB item was last updated without adding a separate attribute for this? (For example, are there any built-in attributes or metadata that could provide this information?)
No. This is not a built-in feature of the DynamoDB API.
You have to implement it yourself, by adding an attribute to each item that you set to the current time on every update (see the sketch below).
"For example, are there any built-in attributes or metadata that could provide this information?" No, there are not.
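A minimal boto3 sketch of that approach; the table name, key, and attribute names are placeholders:

    from datetime import datetime, timezone

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("MyTable")  # placeholder table name

    # Stamp an "updatedAt" attribute on every write, alongside the real change.
    table.update_item(
        Key={"id": "item-1"},
        UpdateExpression="SET #s = :s, updatedAt = :t",
        ExpressionAttributeNames={"#s": "status"},  # "status" is a DynamoDB reserved word
        ExpressionAttributeValues={
            ":s": "PROCESSED",
            ":t": datetime.now(timezone.utc).isoformat(),
        },
    )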
There are multiple approaches to implementing this in DynamoDB.
Use a sort key, a GSI, or an LSI with a timestamp attribute to query the most recently updated items.
When adding an item to the table, keep track of the last updated time in your backend.
Using DynamoDB Streams, create a Lambda function that executes when an item is added or modified, to track the last updated time.
Note: If you go with the last two approaches, you can still use a separate DynamoDB table to store metadata such as a last-updated attribute.
I don't think there is an out-of-the-box solution for that, but you can use DynamoDB Streams with a basic Lambda function to keep track of which items are updated; you can then store this information somewhere else, like S3 (through Kinesis Data Firehose), or you can update the same table (see the sketch below).
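A hypothetical sketch of such a Streams-triggered Lambda function, assuming a separate metadata table named "item-metadata" and a simple string key "id" (both placeholders):

    from datetime import datetime, timezone

    import boto3

    dynamodb = boto3.resource("dynamodb")
    metadata_table = dynamodb.Table("item-metadata")  # placeholder name

    def handler(event, context):
        # Invoked by a DynamoDB stream; each record describes one change.
        for record in event["Records"]:
            if record["eventName"] in ("INSERT", "MODIFY"):
                keys = record["dynamodb"]["Keys"]
                item_id = keys["id"]["S"]  # stream keys arrive in DynamoDB JSON
                # Approximate: the time the stream record is processed,
                # which closely trails the actual write time.
                metadata_table.put_item(Item={
                    "id": item_id,
                    "lastUpdatedAt": datetime.now(timezone.utc).isoformat(),
                })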
It may be possible when using Global Tables, with the automatically created aws:rep:updatetime attribute.
See https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/globaltables_HowItWorks.html
It's not clear if this functionality remains with the latest version though - I'll update this answer if I find out concretely.