DynamoDB Friends Query

I have a DynamoDB table with users and friends. The schema looks like below. Here user 1 (tom) and user 2 (bob) are friends.
+--------+--------+----------+
| PK     | SK     | UserName |
+--------+--------+----------+
| USER#1 | USER#1 | tom      |
| USER#2 | USER#2 | bob      |
| USER#3 | USER#3 | rob      |
| FRD#1  | USER#2 |          |
| FRD#2  | USER#1 |          |
+--------+--------+----------+
Is it possible to get the names of the friends of user 1 (tom) in a single query?
If not, what is an efficient way to query this?
Any help would be really appreciated.
What I am doing currently is:
Step 1: Get all friends of user 1.
let frdParams = {
  TableName: "TABLE_NAME",
  IndexName: "SK-PK-index",
  KeyConditionExpression: "SK = :userId AND begins_with(PK, :friend)",
  ExpressionAttributeValues: {
    ":userId": {S: userId},   // e.g. "USER#1"
    ":friend": {S: "FRD#"}    // prefix matching the FRD#<id> items above
  }
};
const frdRes = await ddb.query(frdParams).promise();
Step 2: Once I have all the friend records, I run more queries in a loop.
for (const record of frdRes.Items) {
  let recordX = aws.DynamoDB.Converter.unmarshall(record);
  let friendId = recordX.PK.replace("FRD", "USER");
  let userParams = {
    TableName: "TABLE_NAME",
    KeyConditionExpression: "PK = :userId AND SK = :userId",
    ExpressionAttributeValues: {
      ":userId": {S: friendId}
    }
  };
  const userRes = await ddb.query(userParams).promise();
}

Data modeling in DynamoDB requires a different mindset than one might use when working with SQL databases. To get the most out of DynamoDB, you need to consider your application's access patterns and store your data in a way that supports those use cases.
It sounds like your access pattern is "fetch friends by user id". There are many ways to implement this access pattern, but I'll give you a few ideas of how it might be achieved.
Idea 1: Denormalize Your Data
You could create a list attribute and store each user's friends list. This would make fetching friends by user super simple!
As with any access pattern, there are limitations with this approach. DynamoDB items have a maximum size of 400KB, so you'd be limited to a friends list that fits within that limit. Also, you will not be able to perform queries based on the values of this attribute, so it would not support additional access patterns. But, it's super simple!
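For illustration, a minimal sketch of Idea 1 with the DocumentClient, assuming the friends list is stored as a "Friends" attribute of id/name pairs on the user item (the attribute name and table name here are assumptions, not part of the question):
const AWS = require("aws-sdk");
const docClient = new AWS.DynamoDB.DocumentClient();

async function getFriends(userId) {
  // The user item carries a denormalized friends list, so one GetItem is enough.
  const res = await docClient.get({
    TableName: "TABLE_NAME",
    Key: { PK: userId, SK: userId },
    ProjectionExpression: "Friends"
  }).promise();
  // e.g. [{ id: "USER#2", name: "bob" }]
  return res.Item ? res.Item.Friends : [];
}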
Idea 2: Build an item collection, storing friends within the USER#<id> partition.
This is a typical pattern to represent one-to-many relationships in DynamoDB. Let's say you define friendships with a PK of USER#<user_id> and an SK of FRIEND#<friend_id>. Your table would look something like this:
+--------+----------+----------+
| PK     | SK       | UserName |
+--------+----------+----------+
| USER#1 | USER#1   | tom      |
| USER#1 | FRIEND#2 |          |
| USER#2 | USER#2   | bob      |
| USER#2 | FRIEND#1 |          |
| USER#3 | USER#3   | rob      |
+--------+----------+----------+
You could fetch the friends of a given user by querying the user's partition key for sort keys that begin with FRIEND, as in the sketch below.
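With the low-level client used in the question, that query could look roughly like this (a sketch; the table name and placeholder names are assumptions):
const friendParams = {
  TableName: "TABLE_NAME",
  KeyConditionExpression: "PK = :userId AND begins_with(SK, :friendPrefix)",
  ExpressionAttributeValues: {
    ":userId": {S: "USER#1"},
    ":friendPrefix": {S: "FRIEND#"}
  }
};
const friends = await ddb.query(friendParams).promise();
// friends.Items contains every FRIEND#<friend_id> item in user 1's partition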
These are just two ideas, and there are many more (and likely better) ways to model friendships in DynamoDB. The examples I've given treat the relationship as one-to-many (one user has many friends). What's more likely is that you'd have a many-to-many relationship to model, which can be tricky in DynamoDB (and another topic altogether!).
If many-to-many sounds like what you have, AWS has an article describing modeling many-to-many relationships that may prove a good starting point.

How to design a DynamoDB table schema

I am doing my best to understand DynamoDB data modeling but I am struggling. I am looking for some help to build off what I have now. I feel like I have fairly simple data, but it's not coming to me how I should fit it into DynamoDB.
I have two different types of data: a game object and a team stats object. A Game represents all of the data about a game that week, and team stats represents all of the stats about a given team per week.
A timeId is in the format of year-week (ex. 2020-9)
My Access patterns are
1) Retrieve all games per timeId
2) Retrieve all games per timeId and by TeamName
3) Retrieve all games per timeId and if value = true
4) Retrieve all teamStats per timeId
5) Retrieve all teamStats by timeId and TeamName
My attempt at modeling so far is:
PK: TeamName
SK: TimeId
This is leading me to have 2 copies of each game, since there is a copy for each team. It also only allows me to scan for all teamStats by TimeId. Would something like a GSI help here? I've thought about changing the PK to something like
PK: GA-${gameId} / TS-${teamId}
SK: TimeId
I'm just very confused and the docs aren't helping me much.
Looking at your access patterns, this is a possible table design. I'm not sure if it's going to really work with your TimeId, especially for the Local Secondary Index (see note below), but I hope it's a good starting point for you.
# Table
-------------------------------------------------------------------
pk       | sk                   | value | other attributes
-------------------------------------------------------------------
TimeId   | GAME#TEAM{teamname}  | true  | ...
TimeId   | STATS#TEAM{teamname} |       | ...
GameId   | GAME                 |       | general game data (*)
TeamName | TEAM                 |       | general team data (*)

# Local Secondary Index
-------------------------------------------------------------------------------
pk from Table as pk | value from Table as sk | sk from Table + other attributes
-------------------------------------------------------------------------------
TimeId              | true                   | GAME#TEAM{teamname} | ...
With this Table and Local Secondary Index you can satisfy all access patterns with the following queries:
Retrieve all games per timeId:
Query Table with pk: {timeId}
Retrieve all games per timeId and by TeamName:
Query Table with pk: {timeId}, sk: GAME#TEAM{teamname}
Retrieve all games per timeId and if value = true:
Query LSI with pk: {timeId}, sk: true
Retrieve all teamStats per timeId:
Query Table with pk: {timeId}, sk: begins_with 'STATS'
Retrieve all teamStats by timeId and TeamName:
Query Table with pk: {timeId}, sk: STATS#TEAM{teamname}
*: I've also added the following two items, as I assume that there are cases where you want to retrieve general information about a specific game or team as well. This is just an assumption based on my experience and might be unnecessary in your case:
Retrieve general game information
Query table with pk: {GameId}
Retrieve general team information
Query table with pk: {TeamName}
Note: I don't know what value = true stands for, but for the secondary index to work in my model, you need to make sure that each combination of pk = TimeId and value = true is unique.
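To make the shape of these calls concrete, here is a rough sketch of two of the queries with the low-level DynamoDB client (the table name, the LSI name, and the attribute names pk/sk/value are assumptions based on the design above):
const AWS = require("aws-sdk");
const ddb = new AWS.DynamoDB();

// Retrieve all teamStats per timeId.
const statsParams = {
  TableName: "my-table",
  KeyConditionExpression: "pk = :timeId AND begins_with(sk, :prefix)",
  ExpressionAttributeValues: {
    ":timeId": {S: "2020-9"},
    ":prefix": {S: "STATS"}
  }
};
const stats = await ddb.query(statsParams).promise();

// Retrieve all games per timeId where value = true, via the LSI.
const lsiParams = {
  TableName: "my-table",
  IndexName: "value-index",
  KeyConditionExpression: "pk = :timeId AND #v = :val",
  ExpressionAttributeNames: {"#v": "value"},   // alias "value" to avoid reserved-word issues
  ExpressionAttributeValues: {
    ":timeId": {S: "2020-9"},
    ":val": {S: "true"}                        // key attributes cannot be booleans
  }
};
const games = await ddb.query(lsiParams).promise();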
To learn more about single-table design on DynamoDB, please read Alex DeBrie's excellent article The What, Why, and When of Single-Table Design with DynamoDB.

Handling more many to many relationships in dynamodb single table

I am designing a single DynamoDB table for an institute. It has entities like institute, students, subjects and teachers. The relationships between them are like this:
institute - students many to many
institute - teachers many to many
institute - subjects many to many
I selected institute id as PK and location as SK. But there are more many-to-many relationships in this scenario. So how do I handle this kind of situation in AWS DynamoDB?
Thanks in advance.
DynamoDB is a NoSQL database where you should denormalize and duplicate your data as much as possible. You are thinking in SQL and trying to normalize your data; you need to switch your mindset.
Create an item for each institute, student, teacher and subject.
Create a students map or list (I prefer a map with an ID as the key) inside institute, and an institutes map or list inside students. Those maps or lists are made of copies of your original items:
inst-0 | USA  | Institute name | { stud-0: { name: John }, stud-1: { name: Matt } }
----------------------------------------------------------------------------------
stud-0 | John | { inst-0: { name: Institute name } } |
----------------------------------------------------------------------------------
stud-1 | Matt | { inst-0: { name: Institute name } } |
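As a rough sketch (the table name and attribute names here are assumptions), writing those denormalized items with the DocumentClient could look like this:
const AWS = require("aws-sdk");
const docClient = new AWS.DynamoDB.DocumentClient();

async function createInstituteWithStudents() {
  // Institute item with a denormalized map of its students.
  await docClient.put({
    TableName: "institute-table",
    Item: {
      PK: "inst-0",
      SK: "USA",
      Name: "Institute name",
      Students: {
        "stud-0": { name: "John" },
        "stud-1": { name: "Matt" }
      }
    }
  }).promise();

  // Student item with a denormalized map of its institutes.
  await docClient.put({
    TableName: "institute-table",
    Item: {
      PK: "stud-0",
      SK: "John",
      Institutes: { "inst-0": { name: "Institute name" } }
    }
  }).promise();
}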
The downside here is that you need to update each copy when the original changes. But it is generally not a problem unless your data changes very frequently.
If your copied data changes frequently, you can create an intermediate relational item, but then maybe a relational database is more appropriate for your project.
You could also create an item for each student and just store a reference to those students in the institute items. Here again you are not denormalizing your data, but it could be a viable solution if your data changes frequently or if you have a large number of students in each institute.

Efficient way of joining two query sets without foreign key

I know Django doesn't allow joining without a foreign key relation, and I can't specify a foreign key because there are entries in one table that are not in the other (populated using PySpark). I need an efficient way to query the following:
Let's say I have the following tables:
Company | Product | Total # Users | Total # Unique Users
and
Company | Product | # Licenses | # Estimated Users
I would like to join such that I can display a table like this on the frontend
Company View
Product | Total # Users | Total # Unique Users | # Licenses | # Estimated Users |
P1      | Num           | Num                  | Num        | Num               |
P2      | Num           | Num                  | Num        | Num               |
Currently I loop through each product and perform a query to populate a dictionary of lists. This is way too slow and inefficient.
I'm not quite getting why you can't use a ForeignKey in this situation, but if you can implement your query in a SQL statement I would look at Q objects. See "Complex Lookups with Q Objects" in the documentation.
https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects

Retrieving arrays of nested information in AppSync schema

I have worked out a fairly complex chain of DynamoDB resolvers on a GraphQL AppSync query. What I am curious to know is if I could have possibly designed this in a way to require fewer DynamoDB queries.
Here is my GraphQL Schema:
type Tag {
  PartitionKey: ID!
  SortKey: ID!
  TagName: String!
  TagType: String
}

type Model {
  PartitionKey: ID!
  Name: String
  Version: Int
  FBX: String
  # ms since epoch
  CreatedAt: AWSTimestamp
  Description: String
  Tags: [Tag]
}

type Query {
  GetAllModels(count: Int, nextToken: String): PaginatedModels!
}
This is the query that I am doing:
query GetAllModels {
  GetAllModels {
    Models {
      PartitionKey
      Name
      Version
      CreatedAt
      Description
      Tags {
        TagName
        TagType
      }
    }
  }
}
My DynamoDB table is set up like so:
PartitionKey | SortKey   | TagName | TagType | ModelName | Description
Model-0      | Model-0   |         |         | ModelZero | Blah Blah
Model-0      | Tag-Pine  |         |         |           |
Model-0      | Tag-Apple |         |         |           |
Tag-Pine     | Tag-Pine  | Pine    | Tree    |           |
Tag-Apple    | Tag-Apple | Apple   | Fruit   |           |
So in my resolvers I am doing the following:
GetAllModels will scan with two filters: one filter for PartitionKey beginning with 'Model-' and another filter for SortKey beginning with 'Model-'. This is to get all Models.
Next there is a resolver attached to 'Tags' in the Model object. This will query with two expressions: one for PartitionKey = source.PartitionKey and a second for SortKey begins_with 'Tag-'. This gets me all of the tags on a model.
Next there are two resolvers on the Tag object, one on TagName and another on TagType. These do a direct GetItem to get their appropriate value, with PartitionKey = source.SortKey and SortKey = source.SortKey set as the keys.
So each scanned Model ends up firing off 3 more queries to DynamoDB. This just seems a bit excessive to me. But I cannot see any other way to do this. Is there some way to be able to get both TagName and TagType in one query?
Is there a better way to approach this?
I see a few things that I would personally change. The first is that I would avoid the nested DynamoDB scan operations. At least one of these can be replaced with a much faster query operation. The second is that I would consider rethinking how you are storing the data. Currently, there is no good way to list model objects.
Why is there no good way to list model objects?
Assuming each model object will have multiple tags, you are going to have a table that is sparsely populated by model objects, i.e. out of 100 rows you may have 20-50 models depending on how many tags the average model has. In DynamoDB, a table is split up based on the partition key, causing rows that share the same partition key to be stored near each other to speed up query operations. With your setup, where the Partition Key is essentially the unique id of a single model object, this means that we can easily get a single model object. You can also quickly get the tags for a single object since those records are nearby as well.
The issue.
The DynamoDB scan operation looks at each partition one at a time, reads as many records as the requests limit allows or all of them if the limit is sufficiently large, and then, only after reading the records from the individual partitions, applies the filter expression before returning the final result. This means you may ask for the first 10 models but since the limit is applied before the scan filter, you may very well only get back 1 model (if that one model had 9 or more tags which would exhaust the limit while DynamoDB was reading the first partition). This may seem strange when coming from many different database systems and is an important consideration of its design.
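A small sketch of why this matters (the table name and attribute names are assumptions): with a Limit, DynamoDB stops reading after that many items and only then applies the filter, which you can see in the difference between ScannedCount and Count.
const AWS = require("aws-sdk");
const ddb = new AWS.DynamoDB();

const scanParams = {
  TableName: "model-table",
  Limit: 10,   // items read per page, counted before the filter is applied
  FilterExpression: "begins_with(PartitionKey, :m) AND begins_with(SortKey, :m)",
  ExpressionAttributeValues: { ":m": {S: "Model-"} }
};
const page = await ddb.scan(scanParams).promise();
console.log(page.ScannedCount); // items read from the table (up to 10)
console.log(page.Count);        // items that survived the filter, possibly far fewer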
Here are two solutions to address this concern:
1. Store Models in one table and Tags in another.
NoSQL databases like DynamoDB allow you to store many types of data in the same table but there is nothing wrong with splitting them out. Traditionally it can be a pain to work with multiple tables in a NoSQL database that lacks a join operation or something similar, but fortunately for us we can use GraphQL to "join" data for us. With this approach, the Model table has a single partition key named "id" and your GetAllModels resolver is still a scan but this time on the model table. This way the table is not sparse and you will get 10 models when you ask for 10 models. The Tag table should have a partition key of modelId and a sort key of tagId. You would then have a resolver on the Model.tags field that does a query against the Tag table and looks for rows with the modelId == $ctx.source.id.
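The DynamoDB query behind that Model.tags resolver would look roughly like this, using the same low-level client (ddb) as in the sketch above (the table name "TagTable" is an assumption, and the literal id stands in for $ctx.source.id):
// Query the separate Tag table for all tags belonging to one model.
const tagParams = {
  TableName: "TagTable",
  KeyConditionExpression: "modelId = :modelId",
  ExpressionAttributeValues: {
    ":modelId": {S: "model-0"}   // would be $ctx.source.id in the actual resolver
  }
};
const tags = await ddb.query(tagParams).promise();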
This is essentially how @model and @connection work in the new graphql transform tooling launched as part of the amplify cli. You can see more here, although the docs are as of writing still being improved. https://aws-amplify.github.io/amplify-js/media/api_guide
2. Store Models and Tags in the same table but change the key structure.
This approach works if you can reliably say that you will have less than 10GB of data per data type (e.g. Model & Tag). For this approach you have a single table with a Partition Key of Type and Sort Key of id. When you create objects you create them with a Type, e.g. "Tag" or "Model", and a unique id (like a uuid). To list objects of the same type you do a DynamoDB query operation on the partition key of the type to list, e.g. "Tag" or "Model". You can then use GSIs to efficiently look up related objects. In your case you would store a "modelId" in every Tag object. You would then make a GSI using the "modelId" as the Partition Key. To list all the tags for a given model you could then do a DynamoDB query operation against that GSI.
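The underlying GSI query could look roughly like this, again with the low-level client (a sketch; the table name, the GSI name "modelId-index", and the example id are assumptions):
// List all tags for one model via a GSI whose partition key is modelId.
const tagsByModelParams = {
  TableName: "app-table",
  IndexName: "modelId-index",
  KeyConditionExpression: "modelId = :modelId",
  ExpressionAttributeValues: { ":modelId": {S: "some-model-uuid"} }
};
const tagsForModel = await ddb.query(tagsByModelParams).promise();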
I'm sure there are many more ways to do this but hopefully this helps point in the right direction.

Ordering entries via comment count with django

I need to get entries from the database with counts of comments. Can I do it with Django's comment framework? I am also using a voting application which is not using GenericForeignKeys. I get entries with scores like this:
from django.db import models
from django.db.models import Sum

class EntryManager(models.Manager):
    def get_queryset(self):
        return super(EntryManager, self).get_queryset().annotate(
            score=Sum("linkvote__value"))
But when there are foreign keys involved I get stuck. Do you have any ideas about that?
Extra explanation: I need to fetch entries like this:
id | body | vote_score | comment_score |
1  | foo  | 13         | 4             |
2  | bar  | 4          | 1             |
After doing that, I can order them via comment_score. :)
Thanks for all the replies.
Apparently, annotating with reverse generic relations (or extra filters, in general) is still an open ticket (see also the corresponding documentation). Until this is resolved, I would suggest using raw SQL in an extra query, like this:
return super(EntryManager, self).get_queryset().annotate(
    vote_score=Sum("linkvote__value")).extra(select={
        'comment_score': """SELECT COUNT(*) FROM comments_comment
                            WHERE comments_comment.object_pk = yourapp_entry.id
                            AND comments_comment.content_type_id = %s"""
    }, select_params=(entry_type.id,))
Of course, you have to fill in the correct table names. Furthermore, entry_type is a "constant" that can be set outside your lookup function (see ContentTypeManager):
from django.contrib.contenttypes.models import ContentType
entry_type = ContentType.objects.get_for_model(Entry)
This is assuming you have a single model Entry that you want to calculate your scores on. Otherwise, things would get slightly more complicated: you would need a sub-query to fetch the content type id for the type of each annotated object.