Can I overwrite an entry in a DynamoDB index be it global or local?
I do not want duplicate entries, but want DynamoDB to overwrite if an entry with same pk+sk exist in the index.
That's not how it works. Local and Global Secondary Indexes explicitly make no guarantees about uniqueness.
From the docs for Local Secondary Indexes:
In a DynamoDB table, the combined partition key value and sort key value for each item must be unique. However, in a local secondary index, the sort key value does not need to be unique for a given partition key value.
— docs
From the docs for Global Secondary Indexes:
In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.
— docs
Only the base table enforces uniqueness for the primary key - it's nothing that can be enforced by the system for LSIs or GSIs. If you need uniqueness there, you have to design your app and data model to ensure collisions can't happen.
Given that you don't (and can't) write to a GSI/LSI explicitly...
You'll need to ensure that you're updating the appropriate table record whose attributes used for LSI/GSI are changing..
In other words, for there to be only a single record in the GSI/LSI there can be only a single record in the with attribute values used in the GSI/LSI
Related
I have trouble understanding the meaning of the ProjectionType property in DynamoDb's GlobalSecondaryIndex configuration.
For example, if I set it to key, will I only be able to retrieve the key values when querying the table based on the secondary index? Why would that be the case, in my understanding an index would reference a certain row in the database table (from a technical point of view), thus by querying on the index it should be easily possible to retrieve the full datapoint of the index?
What am I missing here?
From what's stated here:
Every secondary index is associated with exactly one table, from which it obtains its data. This is called the base table for the index. When you create an index, you define an alternate key for the index (partition key and sort key). You also define the attributes that you want to be projected, or copied, from the base table into the index. DynamoDB copies these attributes into the index, along with the primary key attributes from the base table. You can then query or scan the index just as you would query or scan a table.
It seems that indexes in DynamoDB are not just pointers / references to items, but a stand-alone self-sufficient storage holding the projected attributes. If it is so, it seems reasonable that when querying the index you are limited to the attributes stored in it.
I'm reading the AWS docs about secondary indices and I don't understand the following statement:
The index key does not need to have any of the key attributes from the
table
From what I understand GSI allows me to create a primary or sort key on an attirubte in my table after its creation.
I would like to make sure I understand the statement above, does it mean exactly that I can create a primary or sort key on an attribute that is different from the current table's primary/hash key?
Yes, that is exactly what it means. Let's suppose that you have a table with a composite primary key that consists of bundle_id as the partition key and item_id as the sort key. Let's suppose you also have in that table an attribute called client_id.
You can then create a GSI, let's call it client_id-index with client_id as its partition key and you can include some other attributes in the GSI too.
Then you can query the GSI like this (code sample using Python and Boto3)
table.query(
IndexName='client_id-index',
KeyConditionExpression=Key('client_id').eq("123456")
)
Please note that even if you specify ProjectionType as INCLUDE in your GSI and your include some non-key attributes, the key attributes from the table will be also included in your GSI.
What I never understood about DynamoDB is how to design a table to effectively get all data with one particular field lying in some range. For example, time range - we would like to get data created from timestamp1 up to timestamp2. According to keys design, we can use only sort key for such a purpose. However, it automatically means that the primary key should be the same for all data. But according to documentation, it is an anti-pattern of DynamoDB usage. How to deal with the situation? Could be creating evenly distributed primary key and then a secondary key which primary part is the same for all items but sort part is different for all of them be a better solution?
You can use Global Secondary Index which in essence is
A global secondary index contains a selection of attributes from the base table, but they are organized by a primary key that is different from that of the table.
So you can query on other attributes that are unique.
I.e. as it might not be clear what I meant, is that you can choose something else as primary key that is possible to be unique and use a repetetive ID as GSI on which you are going to base your query.
NOTE: One of the widest applications of NoSQL DBs is to store timeseries, which you cannot expect to have a unique identifier as PK, unless you specify the timestamp.
I was trying to get information on how DynamoDB resolves sorting order for global secondary index when two items' hash key - range key values are the same. Does it refer to the original table's sort key?
Thank you.
The order appears to be undefined.
In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.
[...]
Only the items with the specified key values appear in the response; within that set of data, the items are in no particular order.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
Is it possible to Query a DynamoDB table using both the hash & range key AND a local secondary index?
I have three attributes I want to compare against in my query. Two are the main hash and range keys and the third is the range key of the local secondary index.
No, but that shouldn't be necessary based on your description of what you are trying to accomplish.
If you are trying to access an object based on the hash and range key (of the main table) as well as an additional attribute, selecting on only the hash and range of the main table (which is required to return a single record by definition) will return that record.
If your concern is that the third attribute may be a value that you want to ignore the entire record you can use a query filter to have that item filtered out by DynamoDB or you can use logic in your application to ignore that object.