How to filter on NULL? - amazon-web-services

"order (S)","method (NULL)","time (L)"
"/1553695740/Bar","true","[ { ""N"" : ""1556593200"" }, { ""N"" : ""1556859600"" }]"
"/1556439461/adasd","true","[ { ""N"" : ""1556593200"" }, { ""N"" : ""1556679600"" }]"
"/1556516482/Foobar","cheque","[ { ""N"" : ""1556766000"" }]"
How do I scan (or, for that matter, query) for items whose "method" attribute is empty (NULL)? Screen recording: https://s.natalian.org/2019-04-29/null.mp4

Unfortunately the DynamoDB console offers a simple GUI and assumes the operations you want to perform all have the same type. When you select filters on columns of type "NULL", it only allows you to do exists or not exists. This makes sense since a column containing only NULL datatypes can either exist or not exist.
What you have here is a column that contains multiple data types (NULL is a different data type than String). There are many ways to filter for what you want, but I don't believe they are available in the console. Here is an example of how you could filter the dataset with the AWS CLI (note: since your attribute is named method, which is a reserved word, you need to alias it with an expression attribute name):
Using Filter expressions
$ aws dynamodb scan --table-name plocal --filter-expression '#M = :null' --expression-attribute-values '{":null":{"NULL":true}}' --expression-attribute-names '{"#M":"method"}'
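For reference, here is a sketch of the same scan from Python. It only builds the keyword arguments you would pass to the low-level boto3 client's `scan` method, so it runs without credentials. Pagination through `LastEvaluatedKey` is left out on purpose.

```python
def null_method_scan_params(table_name: str) -> dict:
    """Keyword arguments for a low-level DynamoDB Scan that keeps only items
    whose "method" attribute holds the NULL type (same as the CLI call above)."""
    return {
        "TableName": table_name,
        # "method" is reserved, so it is referenced through the #M placeholder.
        "FilterExpression": "#M = :null",
        "ExpressionAttributeNames": {"#M": "method"},
        "ExpressionAttributeValues": {":null": {"NULL": True}},
    }


if __name__ == "__main__":
    params = null_method_scan_params("plocal")
    print(params)
    # With boto3 installed and credentials configured, you would send it as:
    #   boto3.client("dynamodb").scan(**params)
```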
One option to avoid this would be to update your logic to write a filler string value (e.g. "None" or "N/A") instead of a null or empty string when writing your data to the database. Then you could operate solely on Strings and search on that value instead.
When this was written, DynamoDB did not allow empty String values and returned errors if you tried to put such items directly. To make this "easier", many of the SDKs provide mappers/converters from objects to DynamoDB items, and these usually convert empty strings to the NULL type to work around the no-empty-strings rule. (Since 2020, DynamoDB accepts empty String and Binary values for non-key attributes; key attributes still cannot be empty.)
If you need to differentiate between null and "", you will need to write custom logic to marshal/unmarshal empty strings to a unique string value (e.g. "__EMPTY_STRING") when they are stored in DynamoDB.
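A minimal sketch of that custom marshalling, assuming the hypothetical sentinel "__EMPTY_STRING" from the paragraph above. It converts between Python `str`/`None` and DynamoDB's low-level AttributeValue dictionaries so that `None` and `""` survive a round trip as different values.

```python
# Hypothetical sentinel; any value that can never occur in real data works.
EMPTY_SENTINEL = "__EMPTY_STRING"


def marshal_str(value):
    """Python str/None -> DynamoDB AttributeValue, keeping "" distinct from None."""
    if value is None:
        return {"NULL": True}
    if value == "":
        return {"S": EMPTY_SENTINEL}
    return {"S": value}


def unmarshal_str(attr):
    """DynamoDB AttributeValue -> Python str/None (inverse of marshal_str)."""
    if attr.get("NULL"):
        return None
    text = attr["S"]
    return "" if text == EMPTY_SENTINEL else text


if __name__ == "__main__":
    for original in (None, "", "cheque"):
        stored = marshal_str(original)
        print(repr(original), "->", stored, "->", repr(unmarshal_str(stored)))
```

The trade-off: a genuine value equal to the sentinel would be read back as "", so pick a sentinel your data can never contain.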

I'm pretty sure that there is no way to filter using the console. But I'm guessing that what you really want is to use such a filter in code.
DynamoDB has a very peculiar way of storing NULLs. There is a "NULL" data type that represents the concept of a null value, but it behaves rather like a boolean whose only valid value is true ({"NULL": true}).
If you have the opportunity to change the data type of that attribute to be a string, or numeric, I strongly recommend doing so. Then you'll be able to create much more powerful queries with filter conditions to match what you want.
If the data already exists and you don't have a significant number of items that need to be updated, I recommend creating a new attribute to represent your data and backfilling.
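A backfill along those lines could look roughly like this sketch. It makes three assumptions, all hypothetical:

- items arrive in the low-level typed format, as a paginated Scan returns them;
- `order` is the table's partition key, which is only a guess from the export above;
- the new String attribute is called `methodStr` and uses the filler "NONE".

Each returned dict holds the keyword arguments for one `update_item` call.

```python
FILLER = "NONE"  # hypothetical filler for NULL or missing methods


def backfill_params(table_name, item, key_name="order", new_attr="methodStr"):
    """Return UpdateItem kwargs that copy "method" into a String attribute,
    or None if the item already has the new attribute.
    Assumes "method" is either missing, NULL, or a String (S)."""
    if new_attr in item:
        return None  # already backfilled, so the job is safe to re-run
    method = item.get("method")
    if method is None or "NULL" in method:
        value = FILLER
    else:
        value = method["S"]
    return {
        "TableName": table_name,
        "Key": {key_name: item[key_name]},
        "UpdateExpression": "SET #n = :v",
        "ExpressionAttributeNames": {"#n": new_attr},
        "ExpressionAttributeValues": {":v": {"S": value}},
    }


if __name__ == "__main__":
    rows = [
        {"order": {"S": "/1553695740/Bar"}, "method": {"NULL": True}},
        {"order": {"S": "/1556516482/Foobar"}, "method": {"S": "cheque"}},
    ]
    for row in rows:
        print(backfill_params("plocal", row))
```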
Just following up on the comments: if you prefer using the mapper, you can customize how it marshals attributes that may be null or empty. Have a look at the Go SDK encoder implementation for some examples: https://git.codingcafe.org/Mirrors/aws/aws-sdk-go/blob/9b5aaeba7a51edcf3f87bda525a08b04b90d2ef8/service/dynamodb/dynamodbattribute/encode.go

I was able to do this inside a FilterExpression:
attribute_type(MyProperty, :nullType) - Where :nullType is a string with value NULL. This one finds null entries.
attribute_type(MyProperty, :stringType) - Where :stringType is a string with value S. This one finds non-null entries.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.OperatorsAndFunctions.html#Expressions.OperatorsAndFunctions.Syntax
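A sketch of the `attribute_type` approach as Scan keyword arguments. The attribute is named `method` here instead of `MyProperty`, and the type descriptor ("NULL" or "S") is passed as an ordinary String value. The `local_attribute_type` helper is a hypothetical local stand-in for the function's semantics, run against the three rows from the question.

```python
def attribute_type_scan_params(table_name, attr_name, type_code):
    """Scan kwargs keeping items whose attr_name exists with the given type
    descriptor, e.g. "NULL" or "S"."""
    return {
        "TableName": table_name,
        "FilterExpression": "attribute_type(#a, :t)",
        "ExpressionAttributeNames": {"#a": attr_name},
        # The type descriptor itself is a plain String value.
        "ExpressionAttributeValues": {":t": {"S": type_code}},
    }


def local_attribute_type(item, attr_name, type_code):
    """Local approximation of attribute_type() for typed items (testing only)."""
    return attr_name in item and type_code in item[attr_name]


if __name__ == "__main__":
    rows = [{"method": {"NULL": True}}, {"method": {"NULL": True}},
            {"method": {"S": "cheque"}}]
    print([local_attribute_type(r, "method", "NULL") for r in rows])
    print(attribute_type_scan_params("plocal", "method", "NULL"))
```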

Related

Compare values in dynamodb during query without knowing values in ExpressionAttributeValues

Is it possible to apply a filter based on values inside a dynamodb database?
Let's say the database contains an object info within a table:
info: {
toDo: x,
done: y,
}
Using the ExpressionAttributeValues, is it possible to check whether the info.toDo = info.done and apply a filter on it without knowing the current values of info.toDo and info.done ?
At the moment I tried using ExpressionAttributeNames so it contains:
'#toDo': 'info.toDo', '#done': 'info.done'
and the filter FilterExpression is
#toDo = #done
but I'm retrieving no items doing a query with this filter.
Thanks a lot!
DynamoDB is not designed to perform arbitrary queries as you might be used to in a relational database. It is designed for fast lookups based on keys.
Therefore, if you can add an index that gives you access to the records you are looking for, you can use it for this new access pattern. For example, you could add an index that uses toDo as the partition key and done as the sort key. Index key attributes must be top-level attributes, so you would first have to copy info.toDo and info.done into top-level attributes. You could then query the index with the key condition PK = x AND SK = x for each possible value x, assuming that the list of possible values is limited and known.
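A sketch of that per-value lookup, building one Query request per known value. The index name `ToDoDoneIndex` and the top-level copies `toDo` and `done` are hypothetical. Values are assumed to be Strings, and pagination is omitted.

```python
def equal_status_queries(table_name, known_values, index_name="ToDoDoneIndex"):
    """One Query kwargs dict per known value x, matching items whose
    index partition key (toDo) and sort key (done) both equal x."""
    queries = []
    for x in known_values:
        queries.append({
            "TableName": table_name,
            "IndexName": index_name,
            # The same value placeholder is used for both key attributes.
            "KeyConditionExpression": "#pk = :x AND #sk = :x",
            "ExpressionAttributeNames": {"#pk": "toDo", "#sk": "done"},
            "ExpressionAttributeValues": {":x": {"S": x}},
        })
    return queries


if __name__ == "__main__":
    for q in equal_status_queries("Tasks", ["open", "closed"]):
        print(q["ExpressionAttributeValues"])
```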

Is using json as sort key/partition key value good practice in DynamoDB?

Trying to define a schema for a DynamoDB table. More than two values uniquely identify a row.
A potential solution for storing these key values is to have the sort key contain more than one value, as specified here.
Inspired by this approach, I'm thinking that instead of using a simple delimiter to concatenate the values, I could use JSON or another string representation of objects (e.g. a String produced by Jackson) as the value of the sort key. That should achieve a similar goal and be easy to convert.
However, my concern is that doing so increases the length of the sort key. Will that decrease the performance of DynamoDB? Is it fine to use a complicated string as the sort key?
TL;DR: For your Sort Key, you can use any string (within the byte limits) that distinguishes records sharing the same partition key. But if you are clever about it, you can make better use of it for sorting and filtering.
There are limits to the key lengths:
Partition Key: 1 to 2048 bytes
Sort Key: 1 to 1024 bytes
I am not aware of any significant performance differences based on the length of your partition and sort keys. I'm sure that ensuring performance is part of the reason AWS set these particular limits.
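The limits are measured in bytes, not characters, and DynamoDB encodes Strings as UTF-8. A quick check you could run before writing a key, with the JSON and delimited forms compared side by side:

```python
import json

PARTITION_KEY_MAX_BYTES = 2048
SORT_KEY_MAX_BYTES = 1024


def key_size_ok(value: str, max_bytes: int) -> bool:
    """True if the UTF-8 encoding of value is between 1 and max_bytes bytes."""
    size = len(value.encode("utf-8"))
    return 1 <= size <= max_bytes


if __name__ == "__main__":
    as_json = json.dumps({"LastName": "Doe", "FirstName": "John"})
    delimited = "LASTNAME#Doe#FIRSTNAME#John"
    for key in (as_json, delimited):
        print(len(key.encode("utf-8")), key_size_ok(key, SORT_KEY_MAX_BYTES))
```

Non-ASCII characters take more than one byte each. For example, 600 copies of "é" are 1200 bytes, which is too long for a sort key but still fine for a partition key.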
Technically, you should be fine to use any string as your key, including JSON. However, depending on how you intend to query your table, you may want to consider a more clever arrangement for your Sort Key.
For example, if your sort key contains First and Last names, you might end up with JSON like these:
{"LastName":"Doe","FirstName":"John"}
{"FirstName":"Jane","LastName":"Doe"}
JSON alone doesn't care about the ordering of the name fields, so if you don't put additional constraints on your JSON, you might make it difficult to query all records with LastName "Doe".
The documentation you linked hints at an example of a pattern you might follow for your sort key:
LASTNAME#Doe#FIRSTNAME#John
LASTNAME#Doe#FIRSTNAME#Jane
Now you can easily query for all records with last name Doe using the begins_with key condition with the prefix "LASTNAME#Doe#FIRSTNAME#". Your records will also naturally be sorted by last name, then first name.
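A sketch of building those composite keys and the matching Query request. The table name `People` is hypothetical, and names are assumed never to contain the "#" delimiter. DynamoDB orders String sort keys by their UTF-8 bytes, which gives the same order as Python's default code-point comparison. That is why `sorted` and `startswith` can stand in for the server-side behaviour in the local demo.

```python
def make_sort_key(last_name: str, first_name: str) -> str:
    """Composite sort key; assumes names never contain the '#' delimiter."""
    return f"LASTNAME#{last_name}#FIRSTNAME#{first_name}"


def last_name_query_params(table_name, pk_value, last_name):
    """Query kwargs for every item in one partition with the given last name."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":pk": {"S": pk_value},
            # Ending the prefix with the delimiter keeps "Doe" from matching "Doerr".
            ":prefix": {"S": f"LASTNAME#{last_name}#FIRSTNAME#"},
        },
    }


if __name__ == "__main__":
    keys = sorted([
        make_sort_key("Doe", "John"),
        make_sort_key("Doerr", "Ann"),
        make_sort_key("Doe", "Jane"),
    ])
    params = last_name_query_params("People", "some-primary-key", "Doe")
    prefix = params["ExpressionAttributeValues"][":prefix"]["S"]
    print([k for k in keys if k.startswith(prefix)])
```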
Rather than having to parse out that string when you want to find a record's first and last names, you could just duplicate the content in the record by adding separate fields for "FirstName" and "LastName" for convenience.
So your full record might look something like this:
{
"PK":"some-primary-key",
"SK":"LASTNAME#Doe#FIRSTNAME#John",
"FirstName":"John",
"LastName":"Doe"
}

Random Character in map in DynamoDB

When I update some record in DynamoDB as such
UpdateExpression: "set #audioField = :payload",
ExpressionAttributeValues: {
":payload": something,
},
var something = {"test.com1": {}}
DynamoDB puts a random character in the record like this
{ "test.com1" : { "M" : { } }}
What's up with this? And how do I prevent this?
This is not a random character, this is how DynamoDB stores and represents types.
DynamoDB embeds type information in each value that it stores. See the following for the list of types: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_AttributeValue.html
Based on the link above, the "M" you are seeing describes the contents of the "test.com1" attribute, which is a map (M for map).
The reason you are not seeing these in your other attributes is probably because the SDK is automatically translating this DynamoDB structure into native types for the top-level attributes but not for nested attributes.
What language/SDK are you using? Many SDKs have helpers that you can pass your results through to parse these embedded types and convert them into native types that are easier to work with.
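Examples of such helpers are `TypeDeserializer` in `boto3.dynamodb.types` for Python and `unmarshall` in `@aws-sdk/util-dynamodb` for the JavaScript SDK v3. To show what these helpers do, here is a minimal stdlib-only sketch that recursively converts a low-level AttributeValue into native Python values. Binary types are deliberately not handled, so prefer the SDK helper in real code.

```python
from decimal import Decimal


def unmarshal(av):
    """Convert one low-level DynamoDB AttributeValue into a native value.
    Binary types (B, BS) are not handled in this sketch."""
    (type_code, value), = av.items()  # exactly one type key per value
    if type_code in ("S", "BOOL"):
        return value
    if type_code == "N":
        return Decimal(value)  # numbers travel as strings
    if type_code == "NULL":
        return None
    if type_code == "M":
        return {k: unmarshal(v) for k, v in value.items()}
    if type_code == "L":
        return [unmarshal(v) for v in value]
    if type_code == "SS":
        return set(value)
    if type_code == "NS":
        return {Decimal(n) for n in value}
    raise ValueError(f"unsupported type: {type_code}")


if __name__ == "__main__":
    stored = {"test.com1": {"M": {}}}
    print({name: unmarshal(av) for name, av in stored.items()})
```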

DynamoDB fast search on complex data types

I need to create a new table on AWS DynamoDB that will have a structure like the following:
{
"email" : String (key),
... : ...,
"someStuff" : SomeType,
... : ...,
"listOfIDs" : Array<String>
}
This table contains users' data and a list of strings that I'll often query (see listOfIDs).
Since I don't want to scan the table every time in order to get the user linked to that specific ID due to its slowness, and I cannot create an index since it's an Array and not a "simple" type, how could I improve the structure of my table? Should I use a different table where I have all my IDs and the users linked to them in a "flat" structure? Is there any other way?
Thank you all!
Perhaps another table that looks like:
ID string / hash key,
Email string / range key,
Any other attributes you may want to access
The unique combination of ID and email will allow you to search on the "List of IDs". You may want to include other attributes within this table to save you from needing to perform another query.
Should I use a different table where I have all my IDs and the users linked to them in a "flat" structure?
I think this is going to be your best bet if you want to leverage DynamoDB's parallelism for query performance.
Another option might be a contains() filter expression on a Scan if your listOfIDs is stored as a set, but I can't imagine that will scale performance-wise as your table grows.
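A sketch of the flattened table from the first suggestion, with ID as the partition key and Email as the sort key. It turns one user record into items for that table and builds the Query used to look up users by ID. The table name `UserIds` is hypothetical. Keeping these items in sync when a user's listOfIDs changes is left to your application.

```python
def id_rows_for_user(email, list_of_ids):
    """Flatten one user's listOfIDs into items for a separate table keyed by
    ID (partition key) and Email (sort key). Duplicate IDs are dropped."""
    return [
        {"ID": {"S": id_}, "Email": {"S": email}}
        for id_ in dict.fromkeys(list_of_ids)  # de-duplicate, keep order
    ]


def users_for_id_query(table_name, id_):
    """Query kwargs returning every (ID, Email) item for one ID."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#id = :id",
        "ExpressionAttributeNames": {"#id": "ID"},
        "ExpressionAttributeValues": {":id": {"S": id_}},
    }


if __name__ == "__main__":
    for row in id_rows_for_user("user@example.com", ["id1", "id2", "id1"]):
        print(row)
    print(users_for_id_query("UserIds", "id2"))
```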

Scan a dynamodb based on a list

I have a String Set (SS) attribute in a DynamoDB table. I need to scan the table to check whether a value is present in the set of any of the items.
Which comparison operator should I use for this scan?
For example, the table has items like this:
name | [email1, email2] | phone
I need to search for items containing a particular email, say email1 alone, without giving the entire set.
It seems like you are looking for the CONTAINS comparison operator of the Scan operation (in a filter expression, the equivalent is the contains() function). It is basically the equivalent of the in operator in Python.
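Both forms of that scan are sketched below as request keyword arguments. The attribute name `emails` is a hypothetical stand-in for the String Set in the question, and `local_contains` only mirrors the condition locally for testing. Either way, the Scan still reads every item and applies the filter afterwards.

```python
def contains_scan_params(table_name, email, attr_name="emails"):
    """Scan kwargs using the contains() function in a filter expression."""
    return {
        "TableName": table_name,
        "FilterExpression": "contains(#e, :email)",
        "ExpressionAttributeNames": {"#e": attr_name},
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


def legacy_scan_filter(email, attr_name="emails"):
    """The same condition in the older ScanFilter form, using CONTAINS."""
    return {
        attr_name: {
            "AttributeValueList": [{"S": email}],
            "ComparisonOperator": "CONTAINS",
        }
    }


def local_contains(item, email, attr_name="emails"):
    """Local check mirroring the filter: is email an element of the SS set?"""
    return email in item.get(attr_name, {}).get("SS", [])


if __name__ == "__main__":
    item = {"name": {"S": "name"}, "emails": {"SS": ["email1", "email2"]},
            "phone": {"S": "phone"}}
    print(local_contains(item, "email1"), contains_scan_params("T", "email1"))
```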
This said, if you need to perform this often, you probably should de-normalize your data to make it faster.
For example, you could build a second table like this:
hash_key: name
range_key: email
Of course, you would have to maintain this index yourself and query it manually.