So I have a table defined with one partition key and one global secondary index.
I then create an item and query it both by partition key and by GSI. Both work fine; in each case I can get my item successfully right after it is created in the table.
Now I have added DAX between my application and the DynamoDB table, and I am using the DAX SDK client to retrieve data just as in the manual: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.client.run-application-dotnet.html
But I always run into the negative caching scenario for my GSI query.
Meanwhile, the query by partition key works fine through DAX too.
I receive a negative (empty) response right up until the cached item's TTL expires; right after that delay I am able to get my item by GSI. I have tried delaying my GSI query to avoid negative caching, but even a 120-second delay makes no difference.
I can't find any information about this case in the documentation and would be glad for any useful pointers.
I could change my table schema to avoid the GSI, but I believe there is a solution.
I'd suggest including a Minimal, Reproducible Example of your code.
Interestingly, I can find nothing in the documentation that indicates whether DAX works with queries via a GSI or LSI. But since DAX sits in front of the entire DynamoDB service (not just a specific table), and since a GSI is just a special type of table, I'll assume DAX should indeed cache query results from a GSI.
If you write to DynamoDB and then immediately try querying the GSI through DAX, it wouldn't surprise me to see nothing returned; GSIs are eventually consistent, after all. The same applies if, for instance, you query the GSI through DAX to see whether something exists before writing the item.
And given the way DAX works, you'd continue to get nothing back until the DAX query TTL ends.
You should rethink your application design if you're relying on a GSI being strongly consistent.
For testing purposes, if you:
1. write to DynamoDB
2. query the GSI directly until you get the item back
3. query the GSI through DAX
I would expect the query through DAX to always return your data. If that's not the case, I'd open a ticket with AWS.
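The polling in step 2 can be sketched as below. `poll_until` and the fake query are illustrative helpers, not part of any AWS SDK; in real code the callable would wrap a DynamoDB (or DAX) `Query` call against the GSI.

```python
import time

def poll_until(query, timeout_s=30.0, interval_s=0.5):
    """Repeatedly run `query` (a zero-arg callable returning a list of items)
    until it returns a non-empty result or the timeout elapses. Run this
    against the plain DynamoDB client first, then issue the same query
    through DAX once the GSI has caught up."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        items = query()
        if items:
            return items
        time.sleep(interval_s)
    raise TimeoutError("GSI never returned the item within the timeout")

# Stand-in for an eventually consistent GSI: empty for the first two polls.
calls = {"n": 0}
def fake_gsi_query():
    calls["n"] += 1
    return ["item"] if calls["n"] >= 3 else []

result = poll_until(fake_gsi_query, timeout_s=5.0, interval_s=0.01)
```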
I've got a CSV with nearly 100,000 records in it, and I'm trying to query this data in AWS DynamoDB, but I don't think I've set it up correctly.
When creating the table I added a key column, airport_id, and then wrote a small console app to import all the records, setting a new GUID for the key column of each record. I've only ever used relational databases before, so I don't know whether this is supposed to work differently.
The problem came when querying the data: picking a record somewhere in the middle and querying for it using the AWS SDK in .NET produced no results at all. I can only put this down to bad DB design on my part, as it did return some results depending on the query.
What I'd like is to be able to query the data, for example:
Get all the records that have iso_country of JP (Japan) - doesn't find many
Get all the records that have municipality of Inverness - doesn't find any; the Inverness records are halfway through the set
The records are there, but I reckon the table doesn't have the right design to retrieve them in a timely fashion.
How should I create the DynamoDB table based on the below screenshot?
So I'm working with DynamoDB. My primary key is a unique identifier, and one of the other columns is a timestamp column. I'd like to pull just the first N events and then continue to pull later events if the client requests them.
I don't know if this is possible in DynamoDB. Is it? How would I go about it? I can redesign the table if needed.
Right now, it seems like the only way to do this would be to query all the data in the table and then sort the results returned.
I have created a DynamoDB table named "sample". It has the columns below; CreatedDate holds the creation time of any record inserted into the table.
Itemid
ItemName
ItemDescription
CreatedDate
UpdatedDate
I am creating a Python Flask-based REST API which always fetches the last 100 records inserted into this table. The API (Flask function) does not take any input parameters; it should just return the most recently inserted records.
Question 1
What should the partition key for this table be? I am using the boto3 library to fetch records from DynamoDB. I prefer not to use the scan operation because it may cause performance issues, but if I use the query function it asks for a partition key, and since this REST API does not accept any input I am not sure how to provide one.
Question 2
Has anyone faced a similar situation? What did you do to fix it?
Note: I am pretty much a newbie to DynamoDB, NoSQL, and Boto.
To query your table by CreatedDate without knowing the ItemId, you can use global secondary index write sharding: add an attribute (e.g., ShardId) containing a (0-N) value to every item, and use it as the global secondary index partition key.
Depending on how your items are distributed across CreatedDate, you can choose the ShardId so that access patterns are likely to be evenly distributed, for example YYYY, YYYYMM, or YYYYMMDD. Then create a global secondary index with ShardId as the index partition key and CreatedDate as the index sort key.
Since you know the primary key of your GSI (the ShardId value is derived from CreatedDate), you can query the index for the 100 most recent items using the query's Limit parameter (and LastEvaluatedKey if your item set is larger than 1 MB of data).
See Using Global Secondary Index Write Sharding for Selective Table Queries.
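A minimal sketch of the idea, assuming day-granularity shards and the attribute names ShardId/CreatedDate from above. The returned dict mirrors the arguments you would pass to a boto3 `Table.query` call; the call itself is omitted since it needs AWS credentials and a live table.

```python
from datetime import datetime

def shard_id_for(created: datetime) -> str:
    """Derive the GSI partition key (ShardId) from CreatedDate.
    Sharding here is by day (YYYYMMDD); use YYYY or YYYYMM for
    lower write volumes."""
    return created.strftime("%Y%m%d")

def recent_query_params(index_name: str, now: datetime, limit: int = 100) -> dict:
    """Build keyword arguments for a boto3 Table.query(**params) call that
    returns the most recent items in the current shard, newest first.
    The index and attribute names are assumptions; adjust to your schema."""
    return {
        "IndexName": index_name,
        "KeyConditionExpression": "ShardId = :s",
        "ExpressionAttributeValues": {":s": shard_id_for(now)},
        "ScanIndexForward": False,  # sort key (CreatedDate) descending
        "Limit": limit,
    }
```

Note that if the current shard holds fewer than 100 items, you would repeat the query against the previous shard(s) until you have collected enough results.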
According to the DynamoDB DAX documentation, DAX maintains two separate caches: one for items and one for queries. Which is OK, I guess.
Trouble is, if you change an item and the changed value should affect a result stored in the query cache, there appears to be no way to inform DAX about it, meaning that the query cache will be stale until its TTL expires.
This is rather limiting and there doesn't appear to be any easy way to work around it.
Someone tell me I don't know what I'm talking about and there is a way to advise DAX to evict query cache values.
I wish there were a better answer, but unfortunately there is currently no way to update the query cache values except through TTL expiry. The item cache values are immediately updated by any Put or Update requests made through DAX, but not by changes made directly to DynamoDB.
However, keep in mind that the key for the query cache is the full request, so changing any field in the request triggers a cache miss. Obviously this is not a real solution, but it could be an option (a hack) to work around the current limitation.
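To see why changing any field busts the cache, here is a toy model that keys results on the full serialized request. This is only an illustration of the idea, not DAX's actual implementation, and the request fields are made up.

```python
import json

class QueryCache:
    """Toy model of a query cache keyed by the entire request: two requests
    hit the same entry only if every field matches exactly."""
    def __init__(self):
        self._store = {}
        self.misses = 0

    def get_or_fetch(self, request: dict, fetch):
        key = json.dumps(request, sort_keys=True)  # full request is the key
        if key not in self._store:
            self.misses += 1
            self._store[key] = fetch(request)
        return self._store[key]

cache = QueryCache()
fetch = lambda req: "result"  # stand-in for the real Query round trip
base = {"TableName": "sample", "KeyConditionExpression": "pk = :v", "Limit": 100}
cache.get_or_fetch(base, fetch)                     # miss: first time seen
cache.get_or_fetch(base, fetch)                     # hit: identical request
cache.get_or_fetch({**base, "Limit": 101}, fetch)   # miss: one field changed
```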
As per the DynamoDB documentation, you have to pass your updates through DAX.
DAX supports the following write operations: PutItem, UpdateItem, DeleteItem, and BatchWriteItem. When you send one of these requests to DAX, it does the following:
1. DAX sends the request to DynamoDB.
2. DynamoDB replies to DAX, confirming that the write succeeded.
3. DAX writes the item to its item cache.
4. DAX returns success to the requester.
If a write to DynamoDB fails for any reason, including throttling, then the item will not be cached in DAX and the exception for the failure will be returned to the requester. This ensures that data is not written to the DAX cache unless it is first written successfully to DynamoDB.
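The four steps can be modeled with a toy write-through wrapper. `WriteThroughDax` and `FakeDb` are stand-ins for illustration only; real code would use the Amazon DAX client, which implements this behavior internally.

```python
class WriteThroughDax:
    """Toy sketch of DAX's write-through behavior: the item is cached only
    after DynamoDB has confirmed the write, so a failed write never
    pollutes the cache."""
    def __init__(self, dynamodb):
        self._db = dynamodb
        self.item_cache = {}

    def put_item(self, key, item):
        self._db.put_item(key, item)   # steps 1-2: forward to DynamoDB; raises on failure
        self.item_cache[key] = item    # step 3: cache only a confirmed write
        return "OK"                    # step 4: report success to the requester

class FakeDb:
    """Stand-in for DynamoDB that can be told to fail (e.g., throttling)."""
    def __init__(self, fail=False):
        self.fail, self.items = fail, {}
    def put_item(self, key, item):
        if self.fail:
            raise RuntimeError("ProvisionedThroughputExceededException")
        self.items[key] = item

dax = WriteThroughDax(FakeDb())
dax.put_item("id-1", {"ItemName": "widget"})  # cached after DynamoDB accepts it
```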
So instead of calling DynamoDB's update method directly, send the UpdateItem through the DAX client.
To dig deeper, you can refer to this link.
I want to query a DynamoDB table based on an attribute UpdateTime so that I get the records updated in the last 24 hours. But this attribute is not an index on the table. I understand that I need to make this column an index, but I do not know how to write the query expression for it.
I saw this question, but the problem is that I do not know the name of the table I want to query before runtime.
To find out the table names in your DynamoDB instance, you can use the "ListTables" API: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ListTables.html.
Another way to view tables and their data is via the DynamoDB Console: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ConsoleDynamoDB.html.
Once you know the table name, you can either create an index with the UpdateTime attribute as a key or scan the whole table to get the results you want. Keep in mind that scanning a table is a costly operation.
Alternatively you can create a DynamoDB Stream that captures all of the changes to your tables: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html.
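For reference, the "last 24 hours" cutoff logic looks like the sketch below. It filters client-side over plain dicts (which is what you would do inside a Scan's results, or as a stopgap before the index exists); with UpdateTime as an index key, you would express the same comparison in a KeyConditionExpression instead. The attribute name comes from the question; the ISO-8601 timestamp format is an assumption.

```python
from datetime import datetime, timedelta, timezone

def updated_last_24h(items, now=None):
    """Return the records whose UpdateTime (an ISO-8601 string) falls within
    the last 24 hours. Pass `now` with the same timezone-awareness as the
    stored timestamps to keep the comparison valid."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    return [i for i in items
            if datetime.fromisoformat(i["UpdateTime"]) >= cutoff]
```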