Dynamo DB - Table Design for Lists


I am new to DynamoDB.
I have a table User that stores user info and each user's music playlist.
A playlist contains a number of mp3 URLs.
How do I store it in DynamoDB efficiently?
My User JSON design (the key is userid):
{
"userid": "",
"shuffle": "",
"folderid": "",
"lastsong": "",
"playlist": ["url1", "url2", "url3", ... 100s of urls]
}
Thanks,
Gagan

To begin with, I would split the playlist out into a separate table. In this type of database it is better to store more information than to penalize read speed when running a query or a scan.
If your playlist info consists only of URLs, the design you have shown may be fine, because you can treat it as a List in any programming language. But if you later have to include more info per song, it will become more complex.
If you do need to add more info in the future, the hash key of the playlist table should be the userid.
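A minimal sketch of that split, assuming two tables (one for users, one for playlists) that both use userid as the hash key. The helper name `splitUserRecord` and the `urls` attribute are hypothetical, not part of the question. Keep in mind that DynamoDB caps a single item at 400 KB, which is one more reason to keep a long URL list out of the user item.

```javascript
// Hypothetical helper: split one incoming user record into two items,
// one for the user table and one for a separate playlist table.
// Both items share the same hash key value (userid).
function splitUserRecord(record) {
  const { playlist = [], ...userAttributes } = record;
  return {
    userItem: userAttributes,
    playlistItem: { userid: record.userid, urls: playlist },
  };
}

const { userItem, playlistItem } = splitUserRecord({
  userid: "u1",
  shuffle: "on",
  folderid: "f1",
  lastsong: "url2",
  playlist: ["url1", "url2", "url3"],
});
```

Reading the user's settings then never has to load the playlist, and the playlist item can later grow extra attributes without touching the user item.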

Related

Best way to create a personal feed in DynamoDB?

I'm building a social network app with the usual user-follower-relationship. My DynamoDB table is structured like this:
Item    PK       SK         GSI1PK     GSI1SK    GSI2PK     GSI2SK
User    USER#id  #META#id   -          -         -          -
Follow  USER#id  FOLLOW#id  FOLLOW#id  USER#id   -          -
Post    USER#id  POST#ulid  POST#ulid  #META#id  post[0-9]  POST#ulid
I now want to create a feed for the user's front page.
The feed should include the 10 latest posts of the users the user is following.
My current approach is to first query for all FOLLOW items of the user, then loop through the followed users and get their posts. But this approach could lead to a lot of read operations as the list of followed users grows. Imagine a user following thousands of other users.
Is there maybe a more efficient approach?
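For reference, the current approach can be sketched as request parameters for the DynamoDB Query operation. This is only an illustration of the read pattern, not a recommendation: the table name `SocialTable` is hypothetical, the values are written in DocumentClient style (plain strings), and it assumes the id in a Follow item's SK is the followed user's id. Since ULIDs sort lexicographically by creation time, reading the sort key in descending order returns the newest posts first.

```javascript
const TABLE = "SocialTable"; // hypothetical table name

// Step 1: all FOLLOW items of one user (PK = USER#id, SK begins with FOLLOW#).
function followsQuery(userId) {
  return {
    TableName: TABLE,
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues: { ":pk": `USER#${userId}`, ":prefix": "FOLLOW#" },
  };
}

// Step 2 (once per followed user): that user's newest posts.
function latestPostsQuery(followedId, limit = 10) {
  return {
    TableName: TABLE,
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues: { ":pk": `USER#${followedId}`, ":prefix": "POST#" },
    ScanIndexForward: false, // descending sort-key order, newest ULID first
    Limit: limit,
  };
}

// Step 3: merge the per-user results and keep the newest `limit` posts overall.
function mergeFeed(postLists, limit = 10) {
  return postLists
    .flat()
    .sort((a, b) => (a.SK < b.SK ? 1 : a.SK > b.SK ? -1 : 0))
    .slice(0, limit);
}
```

With N followed users this costs 1 + N Query calls per feed load, which is exactly the growth the question is worried about.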

Elastic search vs Dynamodb for Filtering

I am building a service that will hold millions of rows of data, and we want good search on it, e.g. searching by certain field values. Each row is structured as follows:
{
"field1" : "value1",
"field2" : "value2",
"field3" : {
"field4": "value4",
"field5": "value5"
}
}
Also, the structure of field3 can vary: field4 is sometimes present and sometimes not.
We want filters on the following fields: field1, field2 and field4. We can create indexes in DynamoDB to do that, but I am not sure whether we can easily create an index on field4 in DynamoDB without flattening the JSON.
Now, my question is: should we use Elasticsearch as the datastore, which as far as I know will create indexes on every field in the document so that one can search on every field? Is that right? Or should we use DynamoDB, or some other data store entirely?
Please provide some suggestions.

If search is a key requirement for your application, then use a search product, not a database. DynamoDB is great for a lot of things, but ad hoc search is not one of them: you are going to end up running lots of very expensive (slow) scans if you go with DynamoDB. This is what ES was built for.

I have decent working experience with DynamoDB and extensive working experience with Elasticsearch (ES).
Let's first understand the key difference between the two.
DynamoDB is described as:
"Amazon DynamoDB is a key-value and document database"
while Elasticsearch is described as:
"Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured data."
Now, coming to the question, let's discuss how these systems work internally and how that affects performance.
DynamoDB is great at fetching documents by key but not great at filtering and searching. Just as in a relational database you create indexes on columns to speed up these operations, in DynamoDB you have to create an index as well, because it is a database, not a search engine. Creating indexes on fields on the fly is a pain, and it is not cached in DynamoDB.
Elasticsearch stores data differently: it builds an inverted index for all indexed fields (every field by default, as the OP mentioned), and filtering on these fields is very fast if you use the filter context, which matches the use case here. The official ES docs explain this with an example: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html#filter-context. Also, because these filters are not used for score calculation and are cached by Elasticsearch, their performance (both read and write) is very fast compared to DynamoDB, and you can benchmark that as well.
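As a rough illustration of the filter context, the sketch below builds an Elasticsearch request body with a `bool` query whose `filter` clause holds one `term` query per requested field. The helper name `buildFilterQuery` is hypothetical, and it assumes the three fields are mapped as `keyword` (exact-match) fields; with a different mapping the field names or query type would need to change.

```javascript
// Build a search body that filters (without scoring) on whichever of
// field1, field2 and the nested field3.field4 the caller supplies.
function buildFilterQuery({ field1, field2, field4 } = {}) {
  const filter = [];
  if (field1 !== undefined) filter.push({ term: { field1 } });
  if (field2 !== undefined) filter.push({ term: { field2 } });
  // Nested object fields are addressed with dot notation.
  if (field4 !== undefined) filter.push({ term: { "field3.field4": field4 } });
  return { query: { bool: { filter } } };
}

const body = buildFilterQuery({ field1: "value1", field4: "value4" });
```

Documents that have no field4 at all simply do not match a filter on `field3.field4`, so the varying shape of field3 needs no special handling here.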

AWS Dynamo Smart Home Schema (Locations/Rooms)

I am trying to build out a scalable smart home infrastructure on AWS using iot core, lambda, and dynamodb along with the serverless framework and subsequent Android/iOS app.
I am implementing locations and rooms in dynamodb. A user can have many locations, and locations can have many rooms. I am used to using Firebase Firestore, so the use of partition keys and sort keys (hash and range?) and the combination to query are a little confusing. I implemented my own hash to use as a primary (partition? hash?) id. Here is the structure I am thinking of:
Location
- id
- name
- username
I also added a secondary index on username, so that a user can query all of their locations.
Room
- id
- name
- locationId
I also added a secondary index on locationId, so that a user can query all rooms for a given location.
Here is the code in which I create the ids:
const crypto = require('crypto');
// need a unique hash for the id: name + username + current timestamp
let hash = event.name + event.username + new Date().getTime();
let id = crypto.createHash('md5').update(hash).digest('hex');
let location = {
id: id,
name: event.name,
username: event.username
};
And for rooms:
// need a unique hash for the id
let hash = event.name + event.locationId + new Date().getTime();
Since I'm fairly new to Dynamo/AWS, I'm wondering if this is an acceptable solution. Obviously I would expand on this by adding multiple devices under rooms by associating via the roomId. I would also like to be able to share devices, so I'm not quite sure how that would work, as the association for a user is on location - so I assume I would have to share location, room(s), and device(s) (which I think is how Google Home does it)
Any suggestions would be greatly appreciated!
EDIT
The queries that I can think of would be:
Get Location by Id
Get all Locations by User
Get Room by Id
Get all Rooms by Location
However as the app expands in the future, I would want these queries to be flexible (share location, get shared locations, etc)

"I would want these queries to be flexible"
Then NoSQL in general, and DynamoDB specifically, may not be the right choice.
As @varnit alludes to, NoSQL DBs are very flexible in what you store, but very inflexible in how you can query that data.
DynamoDB, for instance, can only return a list of items (Query) if the table uses a sort key (SK), or if you do a full table scan (not recommended). Otherwise, it can only return a single record.
I don't understand what a "shared location" would entail.
But with multiple tenants in DynamoDB (each user only looks at their own data), the easy solution would be to use userID as the partition key (PK).
I'd use a composite sort key of location#room.
- Get Location by Id --> GetItem(PK = User, SK = location)
- Get all Locations by User --> Query(PK = User)
- Get all Rooms by Location --> Query(PK = User, SK starts with Location)
This one is a little trickier...
- Get Room by Id -->
If you really need to get a room without having the location, then you'd want to store room as a standalone attribute in addition to having it as part of the sort key. Then you can create a local secondary index over it and query (PK = User, Index SK = Room).
I suspect that finding a room via GetItem(PK = User, SK = location#room) might work for you instead.
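The access patterns above can be sketched as request parameters. This is a sketch under assumptions: the table name `SmartHome` and the attribute names `PK` and `SK` are hypothetical, and the values use DocumentClient style (plain strings) rather than the low-level typed format.

```javascript
const TABLE_NAME = "SmartHome"; // hypothetical table name

// Composite sort key for a room item: location#room
const roomSortKey = (location, room) => `${location}#${room}`;

// Get Location by Id --> GetItem(PK = User, SK = location)
function getLocationParams(userId, location) {
  return { TableName: TABLE_NAME, Key: { PK: userId, SK: location } };
}

// Get Room --> GetItem(PK = User, SK = location#room)
function getRoomParams(userId, location, room) {
  return { TableName: TABLE_NAME, Key: { PK: userId, SK: roomSortKey(location, room) } };
}

// Get everything for a user --> Query(PK = User)
function itemsForUserParams(userId) {
  return {
    TableName: TABLE_NAME,
    KeyConditionExpression: "PK = :pk",
    ExpressionAttributeValues: { ":pk": userId },
  };
}

// Get all Rooms by Location --> Query(PK = User, SK starts with "location#")
function roomsInLocationParams(userId, location) {
  return {
    TableName: TABLE_NAME,
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues: { ":pk": userId, ":prefix": `${location}#` },
  };
}
```

Note that a Query on the partition key alone returns room items as well as location items, so the caller would still have to pick out the locations. Including the trailing `#` in the prefix keeps "Home" from also matching a location named "Homestead".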
Key point: the partition key comparison is always equality. There is no "starts with", "ends with" or "contains" for the partition key comparison.
If you haven't seen them, take a look at the following videos
AWS re:Invent 2018: Building with AWS Databases: Match Your Workload to the Right Database (DAT301)
AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB (DAT401)
Also be sure to read the SaaS Storage Strategies - Building a Multitenant Storage Model on AWS whitepaper.
EDIT
"location" and "room" can be whatever makes the most sense for your application: a GUID or a natural key such as "Home". In a NoSQL DB, GUIDs are useful when multiple nodes are adding records. But a natural key is good when that is what the application user will have handy, since you don't want to have to look up a GUID by the natural key. RDBMS practices don't apply to NoSQL DBs.
So yes, I'd use "Home" as the location, meaning the user won't be able to have multiple "Home"s. But I don't see that as a big deal; I'd use "Home" and "Vacation House" in real life.
EDIT2
DynamoDB doesn't care whether it's a GUID or a natural key. It internally hashes whatever value you use for the partition key. All that matters is the number of distinct values. Distinct is distinct; it doesn't matter if the value is '0ae4ad25-5551-46a7-8e39-64619645bd58' or 'charles.wilt@mydomain.com'. If your authorization process returns a GUID, use that. Otherwise use the username.

DynamoDB table/index schema design for querying multi-valued attributes

I'm building a DynamoDB app that will eventually serve a large number (millions) of users. Currently the app's item schema is simple:
{
userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
email: "foo@foo.com",
... other attributes ...
}
When a new user signs up, or if a user wants to find another user by email address, we'll need to look up users by email instead of by userId. With the current schema that's easy: just use a global secondary index with email as the Partition Key.
But we want to enable multiple email addresses per user, and the DynamoDB Query operation doesn't support a List-typed KeyConditionExpression. So I'm weighing several options to avoid an expensive Scan operation every time a user signs up or wants to find another user by email address.
Below is what I'm planning to change to enable additional emails per user. Is this a good approach? Is there a better option?
Add a sort key column (e.g. itemTypeAndIndex) to allow multiple items per userId.
{
userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
itemTypeAndIndex: "main", // sort key
email: "foo@foo.com",
... other attributes ...
}
If the user adds a second, third, etc. email, then add a new item for each email, like this:
{
userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
itemTypeAndIndex: "Email-2", // sort key
email: "bar@bar.com"
// no more attributes
}
The same global secondary index (with email as the Partition Key) can still be used to find both primary and non-primary email addresses.
If a user wants to change their primary email address, we'd swap the email values in the "primary" and "non-primary" items. (Now that DynamoDB supports transactions, doing this will be safer than before!)
If we need to delete a user, we'd have to delete all the items for that userId. If we need to merge two users then we'd have to merge all items for that userId.
The same approach (new items with same userId but different sort keys) could be used for other 1-user-has-many-values data that needs to be Query-able
Is this a good way to do it? Is there a better way?
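A minimal sketch of the primary-email swap as a single transaction, assuming the schema above (partition key userId, sort key itemTypeAndIndex, primary item stored under "main"). The helper name `swapPrimaryEmailParams` and the table name are hypothetical; the returned object follows the parameter shape of the TransactWriteItems operation with DocumentClient-style values.

```javascript
// Build TransactWriteItems parameters that swap the email values of the
// "main" item and one secondary item. Each update is conditional on the
// item still holding the value we expect, so a concurrent change makes
// the whole transaction fail instead of half-applying.
function swapPrimaryEmailParams(tableName, userId, secondarySortKey, oldPrimary, newPrimary) {
  const update = (sortKey, from, to) => ({
    Update: {
      TableName: tableName,
      Key: { userId, itemTypeAndIndex: sortKey },
      UpdateExpression: "SET #email = :to",
      ConditionExpression: "#email = :from",
      ExpressionAttributeNames: { "#email": "email" },
      ExpressionAttributeValues: { ":to": to, ":from": from },
    },
  });
  return {
    TransactItems: [
      update("main", oldPrimary, newPrimary),
      update(secondarySortKey, newPrimary, oldPrimary),
    ],
  };
}

const swapParams = swapPrimaryEmailParams(
  "UserTable", // hypothetical table name
  "08074c7e0c0a4453b3c723685021d0b6",
  "Email-2",
  "foo@foo.com",
  "bar@bar.com"
);
```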

Justin, for searching on attributes I would strongly advise against using DynamoDB. I am not saying you can't achieve this. However, I see a few problems that will eventually come up if you go this route:
- Using a sort key per email will create duplicate records for the same user, i.e. if a user has registered 5 emails, that means 5 records in your table with the same schema and attributes except for the email attribute.
- What if a new use case comes up in the future, where you also want to search for a user by some other attribute (for example cell phone number, assuming a user may have more than one)?
- DynamoDB limits the number of secondary indexes per table: 5 local secondary indexes and, by default, 20 global secondary indexes (the global limit used to be 5).
Thus, as the number of search criteria grows, this solution will easily become a bottleneck for your system. As a result, your system may not scale well.
To the best of my knowledge, I can suggest a few options, using a combination of databases, that you may choose from based on your requirements and budget.
Option 1. DynamoDB as the primary store and AWS Elasticsearch as secondary storage [Preferred]
- Store the user records in a DynamoDB table (let's call it UserTable) as and when a user registers.
- Enable DynamoDB Streams on UserTable.
- Build an AWS Lambda function that reads from the table's stream and persists the records in AWS Elasticsearch.
- In your application, use DynamoDB to fetch user records by id. For all other search criteria (email, phone number, zip code, location, etc.) fetch the records from AWS Elasticsearch. AWS Elasticsearch indexes all the attributes of a record by default, so you can search on any field with millisecond latency.
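The core of the Lambda step can be sketched as a pure function that turns a DynamoDB Streams event into the newline-delimited body expected by the Elasticsearch bulk API. This is a sketch under assumptions: the index name `users` is hypothetical, the table's partition key is a string attribute named userId, the stream is configured to include new images, and the attribute converter below only covers a few types (the AWS SDK ships a complete one). Sending the body to the cluster is left out.

```javascript
// Simplified converter for DynamoDB attribute values.
// Covers only S, N, BOOL, NULL, L and M; other types throw.
function fromAttr(av) {
  if ("S" in av) return av.S;
  if ("N" in av) return Number(av.N);
  if ("BOOL" in av) return av.BOOL;
  if ("NULL" in av) return null;
  if ("L" in av) return av.L.map(fromAttr);
  if ("M" in av) return fromImage(av.M);
  throw new Error(`Unsupported attribute type: ${Object.keys(av)}`);
}

// Convert a whole item image ({ attr: { S: "..." }, ... }) to a plain object.
function fromImage(image) {
  return Object.fromEntries(Object.entries(image).map(([k, v]) => [k, fromAttr(v)]));
}

// Turn stream records into bulk-API lines: an action line, followed by
// the document itself for inserts and updates.
function toBulkBody(event, index = "users") {
  const lines = [];
  for (const record of event.Records) {
    const id = record.dynamodb.Keys.userId.S; // assumes a string key named userId
    if (record.eventName === "REMOVE") {
      lines.push(JSON.stringify({ delete: { _index: index, _id: id } }));
    } else {
      // INSERT or MODIFY: (re)index the new version of the item
      lines.push(JSON.stringify({ index: { _index: index, _id: id } }));
      lines.push(JSON.stringify(fromImage(record.dynamodb.NewImage)));
    }
  }
  return lines.length ? lines.join("\n") + "\n" : "";
}
```

Using the DynamoDB key as the document `_id` makes replays of the same stream record idempotent: re-indexing overwrites the same document instead of creating a duplicate.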
Option 2. Use AWS Aurora [Less preferred solution]
If your application has a relational use case where the data is related, you may consider this option. Just to call out, Aurora is a SQL database.
Since this is relational storage, you can organize the records in multiple tables and join them on the primary keys of those tables.
I would suggest the 1st option because:
- DynamoDB will provide durable, highly available, low-latency primary storage for your application.
- AWS Elasticsearch will act as secondary storage, which is also durable, scalable and low latency.
- With AWS Elasticsearch, you can run any search query on your data. You can also do analytics on it. The Kibana UI is provided out of the box, and you can use it to plot analytical data on a dashboard (how user growth is trending, how many users belong to a specific location, user distribution by city/state/country, etc.).
- With DynamoDB Streams and AWS Lambda, you will be syncing these two databases in near real time [within a few milliseconds].
- Your application will be scalable, and the search feature can be further enhanced to filter on multi-level attributes. [One such example: search all users who belong to a given city.]
Having said that, now I will leave this up to you to decide. 😊

Data Model in DynamoDB

When building a DynamoDB table with Mobile Hub (AWS), there is at some point the option to download the Data Model for the table. But we do not see this option (AFAIK) if we do not use Mobile Hub. So the question is: is there a way to get the Data Model for the table when not using Mobile Hub?

Just to clarify, DynamoDB doesn't have a full data model like an RDBMS. However, it does have the hash (partition) key, the range (sort) key (if defined) and all the index details.
You can get this information using the DescribeTable API. The API returns its output in JSON format. See the DescribeTable documentation for more information.
Please note that non-key attributes are not included in the data model. This is a basic concept of NoSQL databases, and it is what makes them flexible compared to an RDBMS:
- The item structure (non-key attributes) need not be defined when creating the table. In fact, DynamoDB doesn't allow you to define non-key attributes when creating the table.
- The non-key attributes in one item need not be the same as those in another item.
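As an illustration, the key and index details can be pulled out of a DescribeTable response with a small helper. The function name `summarizeKeys` is hypothetical; the fields it reads (`Table.KeySchema`, `Table.AttributeDefinitions`, `Table.GlobalSecondaryIndexes`, `Table.LocalSecondaryIndexes`) are part of the DescribeTable output. Fetching the response itself, via the SDK or CLI, is left out of the sketch.

```javascript
// Summarize the "data model" DynamoDB actually knows about:
// the table keys and the keys of each secondary index.
function summarizeKeys(describeTableOutput) {
  const table = describeTableOutput.Table;
  // Map attribute name -> type code ("S", "N" or "B")
  const types = Object.fromEntries(
    table.AttributeDefinitions.map((d) => [d.AttributeName, d.AttributeType])
  );
  const keyOf = (schema, keyType) => {
    const entry = schema.find((k) => k.KeyType === keyType);
    return entry ? { name: entry.AttributeName, type: types[entry.AttributeName] } : null;
  };
  const describeIndex = (ix) => ({
    name: ix.IndexName,
    partitionKey: keyOf(ix.KeySchema, "HASH"),
    sortKey: keyOf(ix.KeySchema, "RANGE"),
  });
  return {
    tableName: table.TableName,
    partitionKey: keyOf(table.KeySchema, "HASH"),
    sortKey: keyOf(table.KeySchema, "RANGE"),
    globalIndexes: (table.GlobalSecondaryIndexes || []).map(describeIndex),
    localIndexes: (table.LocalSecondaryIndexes || []).map(describeIndex),
  };
}
```

Non-key attributes never appear in this summary, because the table definition does not contain them.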
