I have trouble understanding the meaning of the ProjectionType property in DynamoDB's GlobalSecondaryIndex configuration.
For example, if I set it to KEYS_ONLY, will I only be able to retrieve the key values when querying the table through the secondary index? Why would that be the case? In my understanding, an index references a certain row in the database table (from a technical point of view), so querying on the index should make it easy to retrieve the full item.
What am I missing here?
From what's stated here:
Every secondary index is associated with exactly one table, from which it obtains its data. This is called the base table for the index. When you create an index, you define an alternate key for the index (partition key and sort key). You also define the attributes that you want to be projected, or copied, from the base table into the index. DynamoDB copies these attributes into the index, along with the primary key attributes from the base table. You can then query or scan the index just as you would query or scan a table.
It seems that indexes in DynamoDB are not just pointers / references to items, but stand-alone, self-sufficient storage holding the projected attributes. If so, it is reasonable that when querying the index you are limited to the attributes stored in it.
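To make the projection choice concrete, here is a minimal sketch of how one entry of CreateTable's GlobalSecondaryIndexes list could be built for each of the three ProjectionType values (KEYS_ONLY, INCLUDE, ALL). The helper gsi_definition and all index and attribute names are made up for illustration; throughput settings are left out.

```python
def gsi_definition(index_name, partition_key, sort_key=None,
                   projection_type="KEYS_ONLY", non_key_attributes=None):
    """Build one entry for CreateTable's GlobalSecondaryIndexes list.

    Throughput settings are left out; a provisioned-mode table would also
    need ProvisionedThroughput on each index.
    """
    if projection_type not in ("KEYS_ONLY", "INCLUDE", "ALL"):
        raise ValueError(f"unknown ProjectionType: {projection_type}")
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    projection = {"ProjectionType": projection_type}
    if projection_type == "INCLUDE":
        # NonKeyAttributes is only meaningful together with INCLUDE.
        projection["NonKeyAttributes"] = list(non_key_attributes or [])
    return {"IndexName": index_name, "KeySchema": key_schema,
            "Projection": projection}


# Hypothetical index and attribute names, for illustration only.
keys_only = gsi_definition("ByCustomer", "CustomerId", "OrderDate")
with_total = gsi_definition("ByCustomerWithTotal", "CustomerId", "OrderDate",
                            projection_type="INCLUDE",
                            non_key_attributes=["Total"])
everything = gsi_definition("ByCustomerFull", "CustomerId", "OrderDate",
                            projection_type="ALL")
```

With KEYS_ONLY the index holds only the index keys and the base table's primary key; INCLUDE adds the listed attributes; ALL copies every attribute, at the cost of more index storage.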
Can I overwrite an entry in a DynamoDB index be it global or local?
I do not want duplicate entries, but want DynamoDB to overwrite if an entry with the same pk+sk exists in the index.
That's not how it works. Local and Global Secondary Indexes explicitly make no guarantees about uniqueness.
From the docs for Local Secondary Indexes:
In a DynamoDB table, the combined partition key value and sort key value for each item must be unique. However, in a local secondary index, the sort key value does not need to be unique for a given partition key value.
— docs
From the docs for Global Secondary Indexes:
In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.
— docs
Only the base table enforces uniqueness for the primary key - it's nothing that can be enforced by the system for LSIs or GSIs. If you need uniqueness there, you have to design your app and data model to ensure collisions can't happen.
Given that you don't (and can't) write to a GSI/LSI explicitly...
You'll need to ensure that you're updating the appropriate table record whose attributes used for the LSI/GSI are changing.
In other words, for there to be only a single record in the GSI/LSI, there can be only a single record in the table with the attribute values used in the GSI/LSI.
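A toy in-memory model of the point above (not how DynamoDB is implemented, just the visible behaviour): writes go to the base table only, the base table overwrites on the same pk+sk, and the index view simply groups whatever base items share an index key. The attribute names pk, sk, email and status are made up for this sketch.

```python
from collections import defaultdict


def put_item(table, item, pk="pk", sk="sk"):
    """Base-table write: a second put with the same pk+sk replaces the first."""
    table[(item[pk], item[sk])] = item


def index_view(table, index_pk, index_sk):
    """Toy stand-in for a secondary index: group base items by the index key."""
    view = defaultdict(list)
    for item in table.values():
        # Only items that carry both index key attributes show up in the index.
        if index_pk in item and index_sk in item:
            view[(item[index_pk], item[index_sk])].append(item)
    return view


table = {}
put_item(table, {"pk": "user#1", "sk": "profile",
                 "email": "a@example.com", "status": "active"})
put_item(table, {"pk": "user#2", "sk": "profile",
                 "email": "a@example.com", "status": "active"})

# Two different base items share the same index key, so both appear.
duplicates = index_view(table, "email", "status")[("a@example.com", "active")]

# "Overwriting" means updating the base item whose indexed attributes change.
put_item(table, {"pk": "user#2", "sk": "profile",
                 "email": "b@example.com", "status": "active"})
by_email = index_view(table, "email", "status")
```

After the last put, the index key ("a@example.com", "active") maps to a single item again, because only one base record still carries those values.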
I'd like to list records from my DDB table ordered by creation date.
My table has an attribute DateCreated.
All examples I can find describe ordering within some partition.
But I want global ordering.
Am I supposed to create an artificial attribute which will have the same value across all records, just to use it as a partition key? E.g. add new attribute GlobalPartition with value 1 to every record in the table, and create a GSI with partition key GlobalPartition and sort key DateCreated. Isn't there a better way?
Thx!
As you noticed, DynamoDB indeed does not have an option to sort items "globally". In other words, there is no way to Scan the database in sorted partition-key order. You can only sort items inside one partition, sorted by the "sort key".
When you have a small amount of data, you can indeed do what you said: Have a single partition with everything in this partition. However, it's not clear how practical this approach remains as your single partition grows to gigabytes or terabytes, or how well DynamoDB can load-balance when you have just a single partition (I have never seen DynamoDB documentation that answers this question).
So another option is not to have a single partition but rather have a number of them. For example, consider that you want to sort items by date. Now, instead of having a single partition, have a partition per month, i.e., the partition key is the month number. Now, if you want to sort everything within a month, you can do it directly, but if you want to get a sorted list of a full year, you need to Query twelve partitions, in order, getting a sorted list in each one and combining it to a sorted list for the full year. So-called time-series databases are often modeled this way.
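The per-month approach could be sketched roughly like this. The "YYYY-MM" key format is an assumption of this sketch (it keeps months from different years apart), and query_partition is a hypothetical callable standing in for one Query per partition key that returns items already sorted by the sort key.

```python
from datetime import date


def month_keys(start, end):
    """Return 'YYYY-MM' partition keys from start to end, in ascending order."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


def items_in_order(start, end, query_partition):
    """Yield items in global date order by querying the months one after another.

    query_partition(key) is a hypothetical helper: it should return the items
    of one partition sorted by the sort key (e.g. the result of one Query).
    """
    for key in month_keys(start, end):
        yield from query_partition(key)


winter = month_keys(date(2023, 11, 15), date(2024, 2, 1))
```

Because the months are visited in ascending order and each month comes back sorted, plain concatenation is enough; no merge step is needed.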
To sort data in DynamoDB, the attribute you sort on must be the sort key of the table or of an index. If the attribute is not the table's sort key, or the table has no sort key, create a GSI with that attribute as its sort key. An LSI works too. Any attribute that maps to the sort key of the table, an LSI, or a GSI can be used.
For more details, see the ScanIndexForward parameter of the Query request.
If ScanIndexForward is true, DynamoDB returns the results in the order in which they are stored (by sort key value). This is the default behavior. If ScanIndexForward is false, DynamoDB reads the results in reverse order by sort key value, and then returns the results to the client.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#API_Query_RequestSyntax
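As a rough sketch, the request parameters for a newest-first Query could be assembled like this. The helper name, the table and index names, and the assumption that the partition key is a string attribute are all mine; the resulting dict is what you would pass to the low-level Query call.

```python
def newest_first_query(table_name, index_name, pk_name, pk_value, limit=None):
    """Build Query parameters that return one partition in descending sort-key order."""
    params = {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": pk_name},
        # Assumes the partition key is a string attribute (type S).
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
        # False reverses the default ascending sort-key order.
        "ScanIndexForward": False,
    }
    if limit is not None:
        params["Limit"] = limit
    return params


# Hypothetical table, index and attribute names.
params = newest_first_query("Records", "ByDateCreated", "GlobalPartition", "1", limit=20)
```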
The UI also has a checkbox for this.
A "global sort" is not possible: "global" would mean a Scan operation, which just runs through all rows in the table and applies filters, and Scan has no sorting option. A Query on an attribute mapped to a sort key has the ScanIndexForward option to change the sort direction.
What I never understood about DynamoDB is how to design a table to efficiently get all data with one particular field lying in some range. For example, a time range: we would like to get data created from timestamp1 up to timestamp2. According to the key design, we can use only the sort key for such a purpose. However, that automatically means the partition part of the primary key has to be the same for all data. But according to the documentation, this is an anti-pattern of DynamoDB usage. How do I deal with this situation? Would it be a better solution to create an evenly distributed primary key and then a secondary key whose partition part is the same for all items but whose sort part is different for each of them?
You can use a Global Secondary Index, which in essence is this:
A global secondary index contains a selection of attributes from the base table, but they are organized by a primary key that is different from that of the table.
So you can query on other attributes that do not have to be unique.
In case it was not clear what I meant: you can choose something else that can be unique as the primary key, and use a repetitive ID as the GSI key on which you are going to base your query.
NOTE: One of the widest applications of NoSQL DBs is storing time series, for which you cannot expect to have a unique identifier as PK unless you include the timestamp.
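For the time-range part of the question, a Query against such a GSI might look roughly like the sketch below: an equality condition on the index partition key plus BETWEEN on the index sort key. The helper name and all table, index and attribute names are hypothetical, and the timestamp is assumed to be stored as a number.

```python
def time_range_query(table_name, index_name, pk_name, pk_value,
                     ts_name, start_ts, end_ts):
    """Build Query parameters for items whose sort key lies in [start_ts, end_ts]."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # Equality on the partition key, range condition on the sort key.
        "KeyConditionExpression": "#pk = :pk AND #ts BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#pk": pk_name, "#ts": ts_name},
        "ExpressionAttributeValues": {
            ":pk": {"S": pk_value},
            # Numbers are sent as strings in the low-level attribute format.
            ":start": {"N": str(start_ts)},
            ":end": {"N": str(end_ts)},
        },
    }


# Hypothetical names; timestamps are epoch seconds in this sketch.
range_params = time_range_query("Events", "ByDevice", "DeviceId", "sensor-7",
                                "CreatedAt", 1700000000, 1700086400)
```

BETWEEN is inclusive on both ends, so items created exactly at timestamp1 or timestamp2 are returned as well.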
I was trying to get information on how DynamoDB resolves sorting order for global secondary index when two items' hash key - range key values are the same. Does it refer to the original table's sort key?
Thank you.
The order appears to be undefined.
In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.
[...]
Only the items with the specified key values appear in the response; within that set of data, the items are in no particular order.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
Is it possible to Query a DynamoDB table using both the hash & range key AND a local secondary index?
I have three attributes I want to compare against in my query. Two are the main hash and range keys and the third is the range key of the local secondary index.
No, but that shouldn't be necessary based on your description of what you are trying to accomplish.
If you are trying to access an object based on the hash and range key (of the main table) as well as an additional attribute, selecting on only the hash and range of the main table (which is required to return a single record by definition) will return that record.
If your concern is that the third attribute may hold a value for which you want to ignore the entire record, you can use a query filter to have DynamoDB filter that item out, or you can use logic in your application to ignore that object.
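A minimal sketch of the filter approach: the key condition pins down the hash and range key of the main table, and a FilterExpression checks the third attribute. The helper name and every table and attribute name here are hypothetical, and all three values are assumed to be strings.

```python
def query_with_filter(table_name, hash_name, hash_value,
                      range_name, range_value, third_name, third_value):
    """Build Query parameters: key condition on the table keys, filter on a third attribute."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#h = :h AND #r = :r",
        # The filter runs after the item has been read, so it does not
        # reduce the read capacity consumed by the query.
        "FilterExpression": "#t = :t",
        "ExpressionAttributeNames": {"#h": hash_name, "#r": range_name,
                                     "#t": third_name},
        # Assumes all three attributes are strings (type S).
        "ExpressionAttributeValues": {":h": {"S": hash_value},
                                      ":r": {"S": range_value},
                                      ":t": {"S": third_value}},
    }


# Hypothetical names, for illustration only.
filter_params = query_with_filter("Orders", "CustomerId", "c-42",
                                  "OrderId", "o-1001", "Status", "SHIPPED")
```

If the third attribute does not match, the query comes back with no items instead of the record.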