GSI vs redundancy dynamoDB - amazon-web-services

I have this scenario:
I have to save a lot of shops in a DynamoDB table. Every shop has an ID string, which is its PK.
Every shop has a "category" field, a string that indicates its category (food, tattoo, ...).
So far everything is ok.
I have this use case: "given a category, get all the stores in that category".
To accomplish this, two options came to my mind:
1. Create a GSI that has the category ID as its PK and the shop ID as an attribute. This way, with the ID of the category I get the IDs of all the stores in that category, and then for each store ID I query the main table to get the full info for that store (name, address, etc.).
2. Create, in the main table, items whose PK has the form "category_$id" (where $id is the category ID) and that hold the shop ID as an attribute. As with the GSI, given a category ID I get the set of shop IDs, and then for each ID I query the same table to get all the info for that shop.
I wanted to know what the difference between these two options is in terms of cost / benefit and which is the best.
They seem to me substantially the same thing (the only difference is that the first uses another table, i.e. the index, while the second uses the same table), but I await the opinion of someone more experienced than me.

One benefit of a GSI is that it requires less management. Let's say you add a record to, or delete a record from, the main table. The change will automatically be reflected in your GSI.
In contrast, if you have two independent tables, you have to manage the synchronization between them yourself.
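To make the GSI option concrete, here is a minimal sketch of the request a lookup by category would use. The table name `shops`, the index name `category-index`, and the attribute name `category` are assumptions for illustration, not names from the question. The object has the shape accepted by the DynamoDB Query operation (for example `DocumentClient.query` in the AWS SDK for JavaScript); the function only builds the parameters and does not call AWS.

```javascript
// Build Query parameters that read all shops of one category from a GSI.
// Table, index, and attribute names are hypothetical.
function buildShopsByCategoryQuery(category) {
  return {
    TableName: 'shops',
    IndexName: 'category-index',
    // Placeholders avoid clashes with DynamoDB reserved words.
    KeyConditionExpression: '#cat = :cat',
    ExpressionAttributeNames: { '#cat': 'category' },
    ExpressionAttributeValues: { ':cat': category },
  };
}

const params = buildShopsByCategoryQuery('food');
console.log(JSON.stringify(params, null, 2));
```

If the index projects the attributes the caller needs (name, address, and so on), the query returns them directly and the per-shop lookup on the main table may not be needed.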

Related

How can I support many sortable fields in dynamodb

I have a requirement to query data but sort by different fields (probably more than 30).
I know I can build secondary indexes and use a different field as the sort key in each GSI. However, that would exceed the maximum number of GSIs one table can have.
Is there a pattern to restructure the data to make it sortable via a single GSI, or even without a GSI?
The data I need to support looks like:
Table: OrderProductUser
# Order Items:
type
createdDate
updatedDate
amount (number)
fee (number)
tax (number)
# Product Items:
type
name
price
...
# User Items:
type
firstName
lastName
dob
gender
...
...
Since DynamoDB recommends using one table, I put all the different records into one. The type field in each row indicates what the row is.
But I'd like to support sorting on all the different fields, including strings, dates and numbers. If I sort them in the application, it won't support pagination very well. Is there a pattern to support that?

You only need one GSI per table, because you can overload it.
Simply concatenate the attribute name into the GSI partition key or sort key.
For example (the first two rows put the attribute name in the partition key; the last two prefix it to a zero-padded sort key):
Partition   Sort
AMOUNT      99.99
FEE         1.50
xxx         AMOUNT:00099.99
xxx         FEE:001.50
But you'll only be able to sort by one column at a time, and you have to write multiple records out to DDB.
Given the limitations of sorting and filtering in DDB, a standard relational database (RDS) is likely a better choice for a feature-rich UI.
The usual recommendation is to front DDB with ElasticSearch... and if you truly need the kind of scaling DDB+ElasticSearch can provide, then go for it.
But for most users, RDS Aurora for instance is much more cost effective.
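The overloading idea above can be sketched as a small helper that expands one item into one index record per sortable attribute. The attribute names `gsiPk` and `gsiSk`, the `id` format, and the padding width are assumptions for illustration. The answer's example uses different widths for its two values; this sketch pads every number to a single width so that string order matches numeric order, and it assumes non-negative numbers.

```javascript
// Zero-pad a non-negative number so that string order matches numeric order.
function padNumber(value, width = 8, decimals = 2) {
  return value.toFixed(decimals).padStart(width, '0');
}

// Build an overloaded sort key such as "AMOUNT:00099.99".
// Strings (including ISO 8601 dates) are used as-is.
function overloadedSortKey(attribute, value) {
  const encoded = typeof value === 'number' ? padNumber(value) : String(value);
  return `${attribute.toUpperCase()}:${encoded}`;
}

// Expand one logical item into one record per sortable attribute.
// gsiPk / gsiSk are hypothetical names for the GSI key attributes.
function toIndexRecords(item, sortableAttributes) {
  return sortableAttributes.map((attribute) => ({
    id: `${item.id}#${attribute}`,
    sourceId: item.id,
    gsiPk: item.type,
    gsiSk: overloadedSortKey(attribute, item[attribute]),
  }));
}

const records = toIndexRecords(
  { id: 'order-1', type: 'ORDER', amount: 99.99, fee: 1.5 },
  ['amount', 'fee']
);
console.log(records.map((record) => record.gsiSk));
```

Sorting by amount then becomes a Query on the GSI with the partition key set to the item type and a `begins_with` condition on the `AMOUNT:` prefix of the sort key.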

DynamoDB record size increasing with time

I have a customer table in DynamoDB with basic attributes like name, dob, zipcode, email, etc. I want to add another attribute to it which will keep increasing with time. For example, each time the user clicks on a product (item), I want to add that to the record so that I have the full snapshot of the customer's profile in a single value indexed by the customerId. So, my new attribute would be called viewedItems and would be a list of itemIds viewed (along with the timestamp).
However, given the 400 KB size limit for a DynamoDB item, it is going to be surpassed with time as I keep adding the clicked products to the customer profile.
How can I best define my objects so as to perform the following?
Access the full profile of the customer by customerId, including the views.
Access time filtered profile of the customer (like all interactions since last N days), in which case the viewed items should be filtered by the given time range.
Scan the entire table with a time filter on viewedItems.
The query needs to be performant as the profile could be pulled at request time.
Ability to update individual customer record (via a batch job, for example, that updates each customer's record if need be).
One way to do this would be to create a different table (say customer_viewed_items) with hash key customerId and a range key timestamp with value being the itemId that the customer viewed. But this looks like an increasingly complicated schema - not to mention twice the cost involved in accessing the item. If I have to create another attribute based on (say) "bought" items, then I'll need to create another table. So, the solution I have in mind does not seem good to me.
Would really appreciate if you could help suggest a better schema/approach.

Since you don't know how many items a user will view (edge case: a user opens every item sequentially, multiple times), you cannot store this information in a single DynamoDB record.
The only solution is to normalize your database and create a separate table, as you've described.
Next question: how do you minimize retrieval cost in such a scheme? Usually you don't need to fetch all viewed items. You probably want to display only some of them, so you need to fetch only the last X.
You can cache those items in the main customer table, i.e. create a field "lastXviewedItems" and update it so that it contains only a limited number of items and stays within the size limit. Of course, for BI analysis you will have to store them in the second table too.
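As a rough sketch of the two pieces described above: a time-filtered Query against the separate views table, and a bounded "last X" list for the cached field. The table name `customer_viewed_items` and the keys `customerId` and `timestamp` come from the question; storing the timestamp as an ISO 8601 string is an assumption, as are the function names. The code only builds parameters and trims a list; it does not call AWS.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Query parameters for "views by this customer in the last N days".
// Assumes hash key customerId and range key timestamp (ISO 8601 string).
function buildRecentViewsQuery(customerId, days, now = new Date()) {
  const since = new Date(now.getTime() - days * DAY_MS).toISOString();
  return {
    TableName: 'customer_viewed_items',
    KeyConditionExpression: '#cid = :cid AND #ts >= :since',
    ExpressionAttributeNames: { '#cid': 'customerId', '#ts': 'timestamp' },
    ExpressionAttributeValues: { ':cid': customerId, ':since': since },
    ScanIndexForward: false, // newest views first
  };
}

// Return a new "last X viewed items" list with the newest view in front,
// trimmed so the cached attribute on the customer record stays bounded.
function pushRecentView(recentViews, view, maxItems) {
  return [view, ...recentViews].slice(0, maxItems);
}

const query = buildRecentViewsQuery('customer-42', 7);
console.log(query.ExpressionAttributeValues);
console.log(pushRecentView(['item-2', 'item-1'], 'item-3', 2));
```

The trimmed list would be written back to the customer record on each view, while the full history goes to the second table.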

Search dynamoDB using more than one attribute

I've created a skill with the help of a few people on this site.
I have a database, and I want to ask Alexa to recall data from it, e.g. by asking for films from a certain date.
The issue I'm having at the moment is that I have defined my partition key and it works correctly for one of the items in my table, reading the message for that specific key, but for anything else I search it gives me the same response as the one item that works. Any ideas on how to overcome this?
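Here is how I have defined my table:
let handleCinemaIntent = (context, callback) => {
  // Parameters for a key lookup; note that the date is a fixed literal here
  let params = {
    TableName: "cinema",
    Key: {
      date: "2018-01-04"
    }
  };
  // (rest of the handler not shown in the original post)
};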
Just as a side note, I will have the same date repeating in my partition key, and from what I understand the partition key needs to be unique, so I'd need to overcome this.

You have a few options for structuring your DynamoDB table but I think the most straightforward is the following:
You can set up your table with a partition key of "date" (like you have now), but also with a sort key which would be the film name, or some other identifier. This way, you can have all films for a particular date under one partition key and query them using a Query operation (as opposed to the GetItem that you've been using). You won't be able to modify the existing table to add a sort key though, so you will have to delete the existing table and recreate it with the different schema.
Since there is generally a rather limited number of films for each day, this partition scheme should work really well, assuming you always just query by day. Where this breaks down is if you need to search by just film name (i.e. "give me the dates when this film will run"). If you need the latter, you could create a GSI where the partition key is the film name and the range key is the date.
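The two access patterns described above could look roughly like this as Query parameters. The table name `cinema` and the `date` key come from the question; the sort key name `filmName` and the index name `filmName-date-index` are assumptions for illustration. The functions only build the request objects (the shape accepted by `DocumentClient.query`) and do not call AWS.

```javascript
// Query all films for one date on the base table.
// Assumes partition key "date" and a sort key "filmName" (hypothetical name).
function buildFilmsByDateQuery(date) {
  return {
    TableName: 'cinema',
    KeyConditionExpression: '#d = :d',
    // "date" is a DynamoDB reserved word, so it is referenced via a placeholder.
    ExpressionAttributeNames: { '#d': 'date' },
    ExpressionAttributeValues: { ':d': date },
  };
}

// Query all dates for one film through a GSI (hypothetical index name).
function buildDatesByFilmQuery(filmName) {
  return {
    TableName: 'cinema',
    IndexName: 'filmName-date-index',
    KeyConditionExpression: '#f = :f',
    ExpressionAttributeNames: { '#f': 'filmName' },
    ExpressionAttributeValues: { ':f': filmName },
  };
}

console.log(buildFilmsByDateQuery('2018-01-04'));
console.log(buildDatesByFilmQuery('Some Film'));
```

In the Alexa handler, the date passed to `buildFilmsByDateQuery` would come from the intent's slot value rather than a fixed string.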
However, you should pause a moment and consider whether DynamoDB is the right database for your needs. I say this because Dynamo is really good at access patterns where you know exactly what you are searching for and you need to be able to scale horizontally, whereas your use case is more of a fuzzy search.
As an alternative to Dynamo you might consider setting up an ElasticSearch cluster and throwing your film data in it. Then you can very trivially run queries like "what films will run on this day", "what days will this film run", "what films will run this week", "what action movies are coming this spring", "what animation films are playing today", or "what movies are playing near me".

DynamoDb database design

I'm new to DynamoDb and noSql in general.
I have a users table and a notes table. A user can create notes and I want to be able to retrieve all notes associated with a user.
One solution I've thought of is that every time a note is saved, the note ID is stored inside a 'notes' attribute in the users table. This will allow me to query the users table for all note IDs and then query the notes table using those IDs:
UserTable:
  UserId: 123456789
  notes: ['note-id-1', 'note-id-2']
NotesTable:
  id: note-id-1
  text: "Some note"
Is this the correct approach? The only other way I can think of is to give the notes table a userId attribute so I can query the notes table by that userId. Obviously this sort of approach is more relational.

I would take the approach at the end of your question: each note should have a userId attribute. Then create a global secondary index with userId as the partition key and noteId as the sort key. This way you can fetch a user's notes with a single query on that index.
If you do it the way you suggested, you always need at least two steps to get the notes of a user (first get the note IDs from the user table, then query the notes table). Also, when someone has N notes you would need N queries, which is going to be expensive if N is large.
If you do it the way in this answer, you need one query to get all notes of a user (I'm assuming no pagination) and one to get the user information. It will never be more than 2.
General rule of thumb:
SQL: storage = expensive, computation = cheap
NoSQL: storage = cheap, computation = expensive
So always try to need as few queries as possible.
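A minimal sketch of the single-query approach from this answer, including the pagination case it sets aside. The table name `notes`, the index name `userId-noteId-index`, and the `runQuery` callback are assumptions for illustration: `runQuery` stands in for whatever executes a Query and resolves to a response with `Items` and, when more pages exist, `LastEvaluatedKey`.

```javascript
// Build Query parameters for "all notes of one user" on the GSI.
// Table and index names are hypothetical.
function buildNotesByUserQuery(userId, startKey) {
  const params = {
    TableName: 'notes',
    IndexName: 'userId-noteId-index',
    KeyConditionExpression: '#uid = :uid',
    ExpressionAttributeNames: { '#uid': 'userId' },
    ExpressionAttributeValues: { ':uid': userId },
  };
  // Continue from the previous page when there is one.
  if (startKey) params.ExclusiveStartKey = startKey;
  return params;
}

// Follow LastEvaluatedKey until every page has been read.
async function fetchAllNotes(runQuery, userId) {
  const notes = [];
  let startKey;
  do {
    const page = await runQuery(buildNotesByUserQuery(userId, startKey));
    notes.push(...page.Items);
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return notes;
}

// Usage with a fake two-page response instead of a real DynamoDB client.
const fakePages = [
  { Items: [{ id: 'note-id-1' }, { id: 'note-id-2' }], LastEvaluatedKey: { id: 'note-id-2' } },
  { Items: [{ id: 'note-id-3' }] },
];
const fakeRunQuery = async (params) => (params.ExclusiveStartKey ? fakePages[1] : fakePages[0]);
fetchAllNotes(fakeRunQuery, '123456789').then((notes) => console.log(notes.length));
```

Without pagination this is one request for the notes plus one for the user record, which matches the "never more than 2" count above.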

Efficient implementation of this simple relation in DynamoDB?

User has an email address and a display name.
Both of these must be unique.
Both must be updatable, as long as the new value is not already in use.
A User table will exist with additional non-key attributes and a guid ID.
How should I model this to efficiently check whether an email address or display name is already in use?
Should I create a table with the guid as the key, no range key, and 2 separate GSIs, one for email and one for display name (each being the index key)? Both would also carry a second field with the guid ID of the user. Or should these be completely separate tables, or something else?
Thoughts, is there a better way?
Thanks.

There are 3 ways you can design this that I can think of:
1. As you mentioned, a table keyed by guid with 2 separate GSIs, one for email and one for display name.
2. Since both fields have to be unique, you could make one of them the hash key and create a GSI for the other. (This runs into a problem because you also need to update email and display name: to change the hash key you have to delete the old record and add a new record with the same attributes and the updated key.) The advantage is that you pay less, as there is only one GSI compared to #1.
3. Another option is to use CloudSearch, which can be integrated with your DynamoDB table. With this option you simply create a table keyed by guid, with no GSIs, and run your searches against CloudSearch. One more advantage of CloudSearch is that you can query on any attribute of the table and apply different filters.
Check the price difference between #2 and #3, and go with whichever suits you better in terms of price and functionality.
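For option #1, the "already in use" check could be sketched as a count-only Query against each GSI. The table name `users`, the index names, and the attribute names are assumptions for illustration; the functions build and interpret plain objects and do not call AWS.

```javascript
// Build a Query that only checks whether a value already exists in a GSI.
// Select 'COUNT' asks for a count instead of the matching items.
function buildInUseCheck(indexName, attributeName, value) {
  return {
    TableName: 'users',
    IndexName: indexName,
    KeyConditionExpression: '#attr = :value',
    ExpressionAttributeNames: { '#attr': attributeName },
    ExpressionAttributeValues: { ':value': value },
    Select: 'COUNT',
    Limit: 1,
  };
}

// Interpret a Query response: a Count above zero means the value is taken.
function isInUse(queryResponse) {
  return queryResponse.Count > 0;
}

console.log(buildInUseCheck('email-index', 'email', 'a@example.com'));
console.log(buildInUseCheck('displayName-index', 'displayName', 'alice'));
console.log(isInUse({ Count: 0 }));
```

GSI reads are eventually consistent, so a check like this can miss a very recent write; it is better treated as a pre-check than as a uniqueness guarantee.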
If you implement this with other ways feel free to share it.
Hope that helps
