DynamoDB table within table - amazon-web-services

Is there any way to create a table inside a table with DynamoDB? I have a table that I expect to hold a lot of other information, and another table inside could be useful.

A DynamoDB table can have a list of maps as an attribute, so you could store your JSON objects as native lists/maps within the table. However, if you're appending frequently, keep in mind that the maximum item size in DynamoDB is 400 KB, so you may be better off having a separate table and "joining" on it.

Related

Getting unique attributes from dynamoDB table

I am working on a backfill issue where I need to fetch all the unique values for an attribute in a dynamo db table and call a service to add these to the storage of that service. I am thinking of creating a temporary dynamo db table. I can read the original table in a lambda function and write only the unique values in the temp table. Is there any other approach possible?
The dynamo DB table has approximately 1,400,000 rows.
1,400,000 records is not that many. Probably you can just read the table.
You can improve the read by making your attribute a global secondary key. It need not be unique. Then you can read only the attribute in question or check uniqueness.
If the records in your table are constantly updated, you can listen to the DynamoDB update stream and just update your temporary table with the new values.
Using the single table pattern https://www.youtube.com/watch?v=EOQqi6Yun7g - your "temporary" table can be just a different primary key prefix.
If you have to scan the table and the process is too long, you can split it to multiple lambda calls by passing around the LastEvaluatedKey value (e.g. with a step machine).
You can scan the whole table, using projection expression fetch only the relevant columns and extract unique values.
One more approach can be, you can take a backup of DynamoDB table to S3 and then process the S3 file to extract unique column values.

Do DynamoDB secondary indexes contain actual table rows?

In the SQL world when you create a non clustered index it creates a separate data structure that allows you to find pointers to table rows based on a key that is not the primary key of the table.
From the DynamoDB docs it seems as though creating a secondary index creates a separate data structure that holds a copy of the actual table rows, not just a pointer to those rows.
Is that right?
That's partially correct - for a global secondary index, it will definitely create a second table and update that asynchronously based on the changes in the primary table. That's why you can only do eventually consistent reads on this index.
For local secondary indexes it's most likely the same table.
There is a talk from re:invent 2018, where they explain the underlying data structures, which I can highly recommend:
AWS re:Invent 2018: Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321)

AWS DynamoDB sorting without partition key

I have a DynamoDB table with a partition key (UUID) with a few attributes (like name, email, created date etc). Created date is one of the attribute in the item and its format is YYYY-MM-DD. But now there is a requirement change - I have to sort it based on created date and bring the entire data (that is, I cannot just bring the data on a particular partition, but the whole data from all the partitions in a sorted fashion. I know this might take time as DynamoDB to fetch data from all the partitions and sort it after. My question is:
Is the querying possible with the current design? I can see that partition key is required in the query, this is why I am confused because I cannot give a partition key here.
Is there a better way to redesign the table for such a use case?
Thanks in advance.
As your table exists you couldn't change the structure now, and even if you wanted to you would be reliant on UUID as your partition key.
There is functionality however to create a global secondary index for your DynamoDB table.
By using a GSI you can rearrange your data representation to include the creation date as the partition key of your table instead.
The reason why partition keys are important is that data in DynamoDB data is distributed across multiple nodes, with each partition sharing the same node. By performing a query it is more efficient to only communicate with one partition, as there is no need to wait on the other partitions returning results.

DynamoDB dynamic schema

I'd like to use AWS DynamoDB as a datastore for a data-collection application, where the data schema may vary over time.
For example, initially an Item may represent attributes of people e.g. {name, age}. However, later the schema may be modified to contain {name, age, gender}.
Each schema modification will be tracked and versioned and older data won't need to be migrated - but it may still need to be queried alongside newer data.
Is it an acceptable pattern to store each data-schema change in its own table? Is there a straightforward mechanism to query aggregated data across tables?
Schemas for DynamoDB tables are dynamic in nature. The only thing that needs to be set up upfront is the key name and type. You can add global indexes any time too (indexes with a different partition key). Local indexes, however, those with the same partition key but different sort key, they are added at table creation table. Because of this dynamic schema, you can add new fields, or stop adding them any time.
You need to design tables knowing how would you query them. Queries are quite restricted, you can filter but that's not a fast/cheap operation. Fast queries rely on existing indexes. Queries can fetch from a single table. Joins/unions aren't available.
A table scan is done without any criteria, only filters are available. With filters, data is fetched from disk but can be removed from the returned set. It's an expensive operation in both cost and time. Queries passing a key are faster because they fetch data from a single partition. So you might want to design a key with both a partition (userId for instance) and sort key (item id). It is usual to have compound keys on DynamoDB.
Also it is important to avoid hot spots inside a table. That is, data needs to be fairly distributed inside partition keys.
Reference: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html

How to query a Dynamo DB table without knowing the table name before runtime?

I want to query a Dynamo DB table based on an attribute UpdateTime such that I get the records which are updated in the last 24 hours. But this attribute is not an index in the table. I understand that I need to make this column as an index. But I do not know how do I write a query expression for this.
I saw this question but the problem is I do not know the table name on which I want to query before runtime.
To find out the table names in your DynamoDB instance, you can use the "ListTables" API: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ListTables.html.
Another way to view tables and their data is via the DynamoDB Console: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ConsoleDynamoDB.html.
Once you know the table name, you can either create an index with the UpdateTime attribute as a key or scan the whole table to get the results you want. Keep in mind that scanning a table is a costly operation.
Alternatively you can create a DynamoDB Stream that captures all of the changes to your tables: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html.