Do DynamoDB secondary indexes contain actual table rows?

In the SQL world, when you create a nonclustered index, it creates a separate data structure that lets you find pointers to table rows based on a key that is not the primary key of the table.
From the DynamoDB docs it seems as though creating a secondary index creates a separate data structure that holds a copy of the actual table rows, not just a pointer to those rows.
Is that right?

That's partially correct. A global secondary index is effectively a second table that DynamoDB maintains for you and updates asynchronously as the base table changes. That's why you can only do eventually consistent reads on this index.
A local secondary index is most likely stored in the same table: it shares the base table's partition key and, unlike a global secondary index, supports strongly consistent reads.
There is a talk from re:Invent 2018 that explains the underlying data structures, which I can highly recommend:
AWS re:Invent 2018: Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321)

Related

AWS DynamoDB sorting without partition key

I have a DynamoDB table with a partition key (a UUID) and a few attributes (name, email, created date, etc.). Created date is one of the attributes of the item, and its format is YYYY-MM-DD. Now there is a requirement change: I have to sort by created date and fetch the entire data set, that is, not just the data in a particular partition but all the data from all partitions, in sorted order. I know this might take time, as DynamoDB has to fetch data from all the partitions and sort it afterwards. My questions are:
Is this query possible with the current design? I can see that a partition key is required in a query, which is why I am confused: I cannot provide a partition key here.
Is there a better way to redesign the table for such a use case?
Thanks in advance.
You can't change the key schema of an existing table, so the table itself will keep UUID as its partition key.
You can, however, create a global secondary index (GSI) for your DynamoDB table.
By using a GSI you can rearrange your data representation so that the creation date is the partition key of the index instead.
Partition keys matter because data in DynamoDB is distributed across multiple nodes, with all items that share a partition key value stored in the same partition. A query that targets a single partition is more efficient, as there is no need to wait on other partitions returning results.
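As a rough sketch of what querying such an index could look like, the snippet below builds the parameters for one Query per date, oldest first. The table name "Users", the index name "created_date-index" and the attribute name "created_date" are assumptions made up for illustration; the parameter names follow the low-level Query API.

```python
from datetime import date, timedelta


def build_gsi_query(table_name, index_name, created_date):
    """Build keyword arguments for a low-level DynamoDB Query on a GSI.

    Assumes the index uses a string attribute named `created_date`
    as its partition key (a hypothetical name).
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#d = :d",
        "ExpressionAttributeNames": {"#d": "created_date"},
        "ExpressionAttributeValues": {":d": {"S": created_date}},
    }


def dates_in_order(start, end):
    """Yield ISO dates (YYYY-MM-DD) from start to end inclusive, oldest first."""
    current = start
    while current <= end:
        yield current.isoformat()
        current += timedelta(days=1)


# One Query per date, issued oldest first, returns items grouped in date order.
requests = [
    build_gsi_query("Users", "created_date-index", d)
    for d in dates_in_order(date(2021, 1, 1), date(2021, 1, 3))
]
# With boto3, each request could be sent as:
#   boto3.client("dynamodb").query(**request)
```

Each Query still targets a single partition key value (one date), so reading everything means one request per date, and large result sets are paginated via LastEvaluatedKey.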

How does data get partitioned in a DynamoDB table if we have both a partition key and a secondary key (GSI)?

I am new to AWS. While reading the DynamoDB docs I learned that we can have a GSI and a partition key on the same table.
How does DynamoDB keep the data in the same table on the basis of two keys (partition and secondary)?
Thanks
DynamoDB actually replicates data changes across to the GSI. In fact, AWS recommends that the write capacity of a GSI be equal to or higher than that of the base table:
To avoid potential throttling, the provisioned write capacity for a global secondary index should be equal or greater than the write capacity of the base table because new updates write to both the base table and global secondary index.
The GSI could be stored on completely different infrastructure from the base table:
A global secondary index is stored in its own partition space away from the base table and scales separately from the base table.
Your main table has at least a partition key and, optionally, a sort key. When you add a GSI, DynamoDB effectively keeps a replica of the main table that uses the GSI's key attribute as the new partition key.
When you update your main table, the changes are propagated to the GSI. There is always a short propagation delay between a write to the parent table and the time when the written data appears in the index. In other words, your applications should take into account that the global secondary index replica is only eventually consistent with the parent table.
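The propagation delay described above can be pictured with a toy in-memory model. This is not how DynamoDB is implemented; the class and method names are invented for illustration, and the delay is simulated by an explicit propagate() call.

```python
from collections import defaultdict, deque


class ToyTableWithGsi:
    """Toy model: a base table plus one index that is updated later."""

    def __init__(self, index_key):
        self.index_key = index_key
        self.base = {}                  # partition key -> item
        self.index = defaultdict(dict)  # index key value -> {partition key: item copy}
        self.pending = deque()          # changes not yet applied to the index

    def put(self, pk, item):
        """Write to the base table immediately; queue the index update."""
        old = self.base.get(pk)
        self.base[pk] = dict(item)
        self.pending.append((pk, old, dict(item)))

    def propagate(self):
        """Apply queued changes to the index (stands in for the async delay)."""
        while self.pending:
            pk, old, new = self.pending.popleft()
            if old is not None and self.index_key in old:
                self.index[old[self.index_key]].pop(pk, None)
            if self.index_key in new:
                self.index[new[self.index_key]][pk] = new

    def query_index(self, value):
        """Return the item copies currently visible in the index."""
        return list(self.index.get(value, {}).values())


table = ToyTableWithGsi("email")
table.put("u1", {"name": "Ann", "email": "ann@example.com"})
stale = table.query_index("ann@example.com")  # index has not caught up yet
table.propagate()
fresh = table.query_index("ann@example.com")  # now the copy is visible
```

The index holds its own copy of the item, keyed by a different attribute, which is why a read against it can briefly lag behind the base table.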

how to swap two tables in dynamodb?

I want to maintain two tables in DynamoDB: a primary table and a secondary table.
The primary table will hold the recent data. The secondary table will store the previous version of the data.
I want to swap between the primary and secondary tables so that the API layer always accesses the recent data. How can I do this in AWS DynamoDB?
I would advise using streams. Read and write in the primary table, and use DynamoDB Streams to have a Lambda function copy the old data to a second table whenever there is a write event on the primary table.
The documentation goes into some detail about the process: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
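A minimal sketch of such a Lambda function is below. It assumes the stream is configured with a view type that includes old images (OLD_IMAGE or NEW_AND_OLD_IMAGES) and that the second table is called "SecondaryTable", which is a made-up name. The put_item parameter is injected only so the sketch can be exercised without AWS.

```python
def old_images(event):
    """Collect the previous version of each modified or removed item."""
    images = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("MODIFY", "REMOVE"):
            continue  # INSERT events carry no previous version
        old = record.get("dynamodb", {}).get("OldImage")
        if old is not None:
            images.append(old)
    return images


def handler(event, context, put_item=None):
    """Lambda entry point: copy old item versions into the secondary table."""
    if put_item is None:
        import boto3  # only needed when actually running against AWS
        put_item = boto3.client("dynamodb").put_item
    images = old_images(event)
    for image in images:
        # OldImage is already in DynamoDB's typed JSON, as put_item expects.
        put_item(TableName="SecondaryTable", Item=image)
    return len(images)
```

With this approach there is no swapping at all: the API layer always reads the primary table, and the secondary table simply trails it by one version.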

DynamoDB table within table

Is there any way to create a table inside a table with DynamoDB? I have a table that I expect to hold a lot of other information, and another table inside could be useful.
A DynamoDB table can have a list of maps as an attribute, so you could store your JSON objects as native lists/maps within the table. However, if you're appending frequently, keep in mind that the maximum item size in DynamoDB is 400 KB, so you may be better off having a separate table and "joining" on it.
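To make the nested layout concrete, here is a small sketch of an item holding a list of maps, with a guard before appending. The attribute names are invented, and the JSON length is only a rough stand-in for how DynamoDB actually measures item size.

```python
import json

MAX_ITEM_BYTES = 400 * 1024  # DynamoDB's maximum item size


def append_entry(item, entry, list_attr="entries"):
    """Return a copy of `item` with `entry` appended to its nested list.

    Raises ValueError when the estimated size exceeds the limit. The
    estimate uses the JSON encoding, which is an approximation only.
    """
    updated = dict(item)
    updated[list_attr] = list(item.get(list_attr, [])) + [entry]
    approx_size = len(json.dumps(updated).encode("utf-8"))
    if approx_size > MAX_ITEM_BYTES:
        raise ValueError("estimated item size exceeds 400 KB")
    return updated


order = {"order_id": "o-1", "entries": []}
order = append_entry(order, {"sku": "A1", "qty": 2})
```

If appends start hitting the guard, that is the point where a separate table keyed by the parent item's ID becomes the better fit.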

Can you add a global secondary index to dynamodb after table has been created?

With an existing DynamoDB table, is it possible to modify the table to add a global secondary index? From the DynamoDB control panel, it looks like I have to delete the table and create a new one with the global index.
Edit (January 2015):
Yes, you can add a global secondary index to a DynamoDB table after its creation; see here, under "Global Secondary Indexes on the Fly".
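For reference, adding the index goes through the UpdateTable API. The sketch below only builds the request parameters; the table, index and attribute names are placeholders, and the capacity values are arbitrary.

```python
def build_add_gsi_request(table_name, index_name, key_attr, key_type="S",
                          read_units=5, write_units=5):
    """Build keyword arguments for UpdateTable that create one GSI.

    All names passed in are placeholders chosen by the caller.
    ProvisionedThroughput applies to provisioned-capacity tables.
    """
    return {
        "TableName": table_name,
        # The new key attribute must be declared alongside the index.
        "AttributeDefinitions": [
            {"AttributeName": key_attr, "AttributeType": key_type},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": key_attr, "KeyType": "HASH"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": read_units,
                        "WriteCapacityUnits": write_units,
                    },
                }
            }
        ],
    }


request = build_add_gsi_request("Users", "email-index", "email")
# With boto3 this could be sent as:
#   boto3.client("dynamodb").update_table(**request)
```

The table stays available while the index is being backfilled, and the index can only be queried once its status becomes ACTIVE.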
Old Answer (no longer strictly correct):
No, the hash key, range key, and indexes of the table cannot be modified after the table has been created. You can easily add elements that are not hash keys, range keys, or indexed elements after table creation, though.
From the UpdateTable API docs:
You cannot add, modify or delete indexes using UpdateTable. Indexes can only be defined at table creation time.
To the extent possible, you should really try to anticipate current and future query requirements and design the table and indexes accordingly.
You could always migrate the data to a new table if need be.
Just got an email from Amazon:
Dear Amazon DynamoDB Customer,
Global Secondary Indexes (GSI) enable you to perform more efficient
queries. Now, you can add or delete GSIs from your table at any time,
instead of just during table creation. GSIs can be added via the
DynamoDB console or a simple API call. While the GSI is being added or
deleted, the DynamoDB table can still handle live traffic and provide
continuous service at the provisioned throughput level. To learn more
about Online Indexing, please read our blog or visit the documentation
page for more technical and operational details.
If you have any questions or feedback about Online Indexing, please
email us.
Sincerely, The Amazon DynamoDB Team
According to the latest news from AWS, GSI support for existing tables will be added soon.
Official statement on AWS forum