Scan a DynamoDB based on a list - amazon-web-services

I have a String Set (SS) attribute in a DynamoDB table. I need to scan the table to check whether a given value is present in that set for any of the items.
Which comparison operator should I use for this scan?
For example, the items in the table look like this:
name
[email1, email2]
phone
I need to search for items that contain a particular email, say email1, by supplying that single value rather than the entire set.

It seems like you are looking for the CONTAINS comparison operator of the Scan operation. It is basically the equivalent of in in Python.
That said, if you need to perform this often, you should probably de-normalize your data to make it faster.
For example, you could build a second table like this:
hash_key: name
range_key: email
Of course, you would have to maintain this index yourself and query it manually.
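For the scan itself, here is a minimal boto3 sketch, assuming the string-set attribute is called emails (the table and attribute names are placeholders):

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")  # hypothetical table name

# Scan for items whose string set contains the given email.
# contains() corresponds to the CONTAINS comparison operator.
response = table.scan(
    FilterExpression=Attr("emails").contains("email1")
)
items = response["Items"]

Note that a Scan with a filter still reads the whole table; the filter only reduces what gets returned.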

Related

DynamoDB Query distinct attribute values

I'm trying to query DynamoDB and get a result similar to select distinct(address) from ... in SQL.
I know DynamoDB is a document-oriented DB and maybe I need to change the data structure.
I'm trying to avoid getting all the data first and filtering later.
My data looks like this:
Attribute   Datatype
ID          String
Var1        Map
VarN        Map
Address     String
So I want to get the distinct addresses in the entire table.
So I want to get the distinct addresses in the entire table. What is the best way to do it?
Unfortunately, no. You'll need to Scan the entire table (you can use the ProjectionExpression or AttributesToGet options to ask just for the "Address" attribute, but you will still pay for scanning the entire contents of the table).
If you need to do this scan often, you can add a secondary index which projects only the keys and the "Address" attribute, to make the scan cheaper. But unfortunately, using a GSI whose partition key is the "Address" does not give you the ability to eliminate duplicates: each partition will still contain a list of duplicate items, and there is no way to list only the distinct partition keys of an index. Scanning the index will give you the same partition key multiple times, as many times as there are items in that partition.
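As an illustration, here is a minimal boto3 sketch of such a scan with client-side de-duplication (the table name is a placeholder; an expression attribute name is used in case "Address" collides with a reserved word):

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")  # hypothetical table name

addresses = set()
scan_kwargs = {
    "ProjectionExpression": "#addr",
    "ExpressionAttributeNames": {"#addr": "Address"},
}

# Paginate through the whole table, keeping only distinct addresses.
while True:
    response = table.scan(**scan_kwargs)
    for item in response.get("Items", []):
        if "Address" in item:
            addresses.add(item["Address"])
    if "LastEvaluatedKey" not in response:
        break
    scan_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]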

DynamoDB query by 3 fields

Hi, I am struggling to construct my schema with three search fields.
The two main queries I will use are:
Get all files from a user within a specific folder ordered by date.
Get all files from a user ordered by date.
Maybe there will be an additional query where I want:
All files from a user within a folder ordered by date and itemType == X
All files from a user ordered by date and itemType == X
Because of that, the userID has to be the partition key.
But what should I use as my sort key? I tried a composite sort key like FOLDER${folderID}#FILE${itemID}#TIME${timestamp}, but as I don't know the itemID I can't use the begins_with expression, right?
What I could do is filter with begins_with on the folderID, but then the descending sort by date would not work.
Or should I move away from DynamoDB to a relational DB with those query requirements in mind?
DynamoDB data modeling can be tough at first, but it sounds like you're off to a good start!
When you find yourself requiring an ID and sorting by time, you should know about KSUIDs. KSUIDs are unique IDs that can be lexicographically sorted by time. That means that if you sort KSUIDs, they will be ordered by creation time. This is super useful in DynamoDB. Let's check out an example.
When modeling the one-to-many relationship between Users and Folders, you might do something like this:
In this example, User with ID 1 has three folders with IDs 1, 2, and 3. But how do we sort by time? Let's see what this same table looks like with KSUIDs for the Folder ID.
In this example, I replaced the plain ol' ID with a KSUID. Not only does this give me a unique identifier, but it also ensures my Folder items are sorted by creation date. Pretty neat!
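As a rough sketch (the PK/SK attribute names and the KSUID values below are illustrative, not real data), the folder items might look like this:

# Illustrative items; sorting the SK strings orders folders by creation time.
folders = [
    {"PK": "USER#1", "SK": "FOLDER#1srOrx2ZWZBpBUvZwXKQmoEYga2"},  # oldest
    {"PK": "USER#1", "SK": "FOLDER#1srOsQkPK1cuB5cVVL6SVxTFpzC"},
    {"PK": "USER#1", "SK": "FOLDER#1srOtFJp2p7hpqmHY9PxQTCFVEy"},  # newest
]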
There are several solutions to filtering by itemType, but I'd probably start with a global secondary index with a partition key of USER#user_id#itemType and FOLDER#folder_id as the sort key. Your base table would then look like this
and your index would look like this
This index allows you to fetch all items or a specific folder for a given user and itemType.
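If the index keys were stored in attributes named GSI1PK and GSI1SK (names assumed here, not prescribed), the query might look roughly like this with boto3:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Files")  # hypothetical table name

# All items of itemType "image" for user 123; narrow the begins_with
# prefix to a full FOLDER#<id> value to target a specific folder.
response = table.query(
    IndexName="GSI1",  # hypothetical index name
    KeyConditionExpression=(
        Key("GSI1PK").eq("USER#123#image")
        & Key("GSI1SK").begins_with("FOLDER#")
    ),
)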
These examples might not perfectly match your access patterns, but I hope they can get your data modeling process un-stuck! I don't see any reason why your access patterns can't be implemented in DynamoDB.
If you are sure about using DynamoDB, you should analyze the access patterns for this table in advance and choose the partition key and sort key based on the most frequent pattern. For the other patterns, you should add a GSI each. See https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
Usually, if the access patterns are unknown, an RDBMS looks better; for high-load systems, use NoSQL for the high-load workloads and periodically upload the data to something like AWS Redshift.

Is using json as sort key/partition key value good practice in DynamoDB?

I'm trying to define a schema for a DynamoDB table in which more than two values identify a row.
A potential solution for storing these key values is to have the sort key contain more than one value, as specified here.
Inspired by this approach, I'm thinking that instead of using a simple delimiter to concatenate values together, using JSON or any other string representation of objects (e.g. a String produced by Jackson) as the value of the sort key should achieve a similar goal and be easy to convert.
However, my concern is that by doing so (increasing the length of the sort key) I might decrease the performance of DynamoDB. Is it fine to use a complicated string as the sort key?
TL;DR: For your Sort Key, you can use any string (within the byte limits) that distinguishes your records within the primary key. But if you are clever about it, you can make better use of it for sorting and filtering.
There are limits to the key lengths:
Partition Key: 1 to 2048 bytes
Sort Key: 1 to 1024 bytes
I am not aware of any significant performance differences based on the length of your primary and sort keys. I'm sure that ensuring performance is part of the reason for AWS to set these particular limits.
Technically, you should be fine to use any string as your key, including JSON. However, depending on how you intend to query your table, you may want to consider a more clever arrangement for your Sort Key.
For example, if your sort key contains First and Last names, you might end up with JSON like these:
{"LastName":"Doe","FirstName":"John"}
{"FirstName":"Jane","LastName":"Doe"}
JSON alone doesn't care about the ordering of the name fields, so if you don't put additional constraints on your JSON, you might make it difficult to query all records with LastName "Doe".
The documentation you linked hints at an example of a pattern you might follow for your sort key:
LASTNAME#Doe#FIRSTNAME#John
LASTNAME#Doe#FIRSTNAME#Jane
Now you can easily query for all records with last name Doe using a begins_with condition on "LASTNAME#Doe#FIRSTNAME#". Your records will also naturally be sorted by Last Name, First Name.
Rather than having to parse out that string when you want to find a record's first and last names, you could just duplicate the content in the record by adding separate fields for "FirstName" and "LastName" for convenience.
So your full record might look something like this:
{
  "PK": "some-primary-key",
  "SK": "LASTNAME#Doe#FIRSTNAME#John",
  "FirstName": "John",
  "LastName": "Doe"
}
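A rough boto3 sketch of the "all records with last name Doe" query, assuming the key attributes are named PK and SK as above:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("People")  # hypothetical table name

# Everything under this partition whose last name is Doe, returned
# sorted by last name, then first name.
response = table.query(
    KeyConditionExpression=(
        Key("PK").eq("some-primary-key")
        & Key("SK").begins_with("LASTNAME#Doe#FIRSTNAME#")
    ),
)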

Querying DynamoDB with a partition key and list of specific sort keys

I have a DynamoDB table that, for the sake of this question, looks like this:
id (String partition key)
origin (String sort key)
I want to query the table for a subset of origins under a specific id.
From my understanding, the only operators DynamoDB allows on sort keys in a Query are 'between', 'begins_with', '=', '<=' and '>='.
The problem is that my query needs a form of 'CONTAINS', because the list of origins I want is not a contiguous range (so a between operator won't work).
If this was SQL it would be something like:
SELECT * from Table where id={id} AND origin IN {origin_list}
My exact question is: what do I need to do to achieve this functionality in the most efficient way? Should I change my table structure? Maybe add a GSI? I'm open to suggestions.
I am aware that this can be achieved with a Scan operation but I want to have an efficient query. Same goes for BatchGetItem, I would rather avoid that functionality unless absolutely necessary.
Thanks
This is a case for using Filter Expressions for Query. They support the IN operator:
Comparison Operator
a IN (b, c, d) — true if a is equal to any value in the list — for
example, any of b, c or d. The list can contain up to 100 values,
separated by commas.
However, you cannot use filter expressions on key attributes.
Filter Expressions for Query
A filter expression cannot contain partition key or sort key
attributes. You need to specify those attributes in the key condition
expression, not the filter expression.
So, what you could do is not use origin as the sort key (or duplicate it into another attribute) so that you can filter on it after the key condition. Of course, this reads all the items with that 'id' first and filters them afterwards, which consumes read capacity and is less efficient, but there is no other way to query it otherwise. Depending on your item sizes, query frequency and the estimated number of returned items, BatchGetItem could be a better choice.
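For illustration, a minimal boto3 sketch of the duplicate-attribute approach (the table name and the originCopy attribute are assumptions):

import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")  # hypothetical table name

origin_list = ["origin-a", "origin-b"]

# originCopy duplicates the sort key value as a plain attribute so
# that it can legally appear in the filter expression.
response = table.query(
    KeyConditionExpression=Key("id").eq("some-id"),
    FilterExpression=Attr("originCopy").is_in(origin_list),
)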

DynamoDB fast search on complex data types

I need to create a new table on AWS DynamoDB that will have a structure like the following:
{
  "email": String (key),
  ...: ...,
  "someStuff": SomeType,
  ...: ...,
  "listOfIDs": Array<String>
}
This table contains users' data and a list of strings that I'll often query (see listOfIDs).
Since I don't want to scan the table every time in order to get the user linked to that specific ID due to its slowness, and I cannot create an index since it's an Array and not a "simple" type, how could I improve the structure of my table? Should I use a different table where I have all my IDs and the users linked to them in a "flat" structure? Is there any other way?
Thank you all!
Perhaps another table that looks like:
ID string / hash key,
Email string / range key,
Any other attributes you may want to access
The unique combination of ID and email will allow you to search on the "List of IDs". You may want to include other attributes within this table to save you from needing to perform another query.
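For example, a rough boto3 sketch of looking up users by ID in such a table (the table and attribute names are assumptions):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
lookup = dynamodb.Table("IdToUser")  # hypothetical lookup table

# All users linked to a given ID; each item carries the Email range
# key plus any other attributes you chose to duplicate into it.
response = lookup.query(
    KeyConditionExpression=Key("ID").eq("some-id")
)
emails = [item["Email"] for item in response["Items"]]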
Should I use a different table where I have all my IDs and the users linked to them in a "flat" structure?
I think this is going to be your best bet if you want to leverage DynamoDB's parallelism for query performance.
Another option might be using a CONTAINS filter expression if your listOfIDs is stored as a set, but I can't imagine that will scale performance-wise as your table grows.