Optional secondary indexes in DynamoDB - amazon-web-services

I am migrating my persistence tier from Riak to DynamoDB. My data model contains an optional business identifier field, which is desired to be able to be queried as an alternative to the key.
It appears that DynamoDB secondary indexes can't be null and require a range key, so despite the similar name to Riak's secondary indexes, make this appear quite a different beast.
Is there an elegant way to efficiently query my optional field, short of throwing the data in an external search index?

When you asked this question, DynamoDB did not have Global Secondary Indexes: http://aws.amazon.com/about-aws/whats-new/2013/12/12/announcing-amazon-dynamodb-global-secondary-indexes/
Now, it does.
A local secondary index is best thought of, and functions as, a secondary range key. #andreimarinescu is right: you still must query by the item's hash key, only with a secondary index you can use a limited subset of a DynamoDB query's comparison operators on that range key (e.g. greater than, equal to, less than, etc.) So, you still need to know which "hash bucket" you're performing the comparison within.
Global secondary indexes are a bit of a different beast. They are more like a secondary version of your table (and Amazon charges you similarly in terms of provisioned throughput). You can use non-primary key attributes of your table as primary key attributes of your index in a global secondary index, and query them accordingly.
For example, if your table looks like:
|**Hash key**: Item ID | **Range Key**: Serial No | **Attribute**: Business ID |
--------------------------------------------------------------------------------
| 1 | 12345 | 1A |
--------------------------------------------------------------------------------
| 2 | 45678 | 2B |
--------------------------------------------------------------------------------
| 3 | 34567 | (empty) |
--------------------------------------------------------------------------------
| 3 | 12345 | 2B |
--------------------------------------------------------------------------------
Then, with a local secondary index on Business ID you could perform queries like, "find all the items with a hash key of 3 and a business ID equal to 2B", but you could not do "find all items with a business ID equal to 2B" because the secondary index requires a hash key.
If you were to add a global secondary index using business ID, then you could perform such queries. You would essentially be providing an alternate primary key for the table. You could perform a query like "find all items with a business ID equal to 2B and get items 2-45678 and 3-12345 as a response.
Sparse indexes work fine with DynamoDB; it's perfectly allowable that not all the items have a business ID and can allow you to keep the provisioned throughput on your index lower than that of the table depending on how many items you anticipate having a business ID.

The same is also possible using LSI.
Just make sure that you don't write any data to that Attribute.
In my scenario, for a LSI, I was writing empty string (""), which is not allowed. I skipped initialization of the sort key and it worked fine.
Basically DynamoDB won't even create the that attribute for that row.
Details of behavior is explained below
How can I make a sparse index if the key is always required?

Related

DynamoDB Unique list of elements across all records

I have a simple table that stores list of names for a particular record. I want to ensure that a name can never be used for any other record more than once. The names column should also not be empty; there should always be at least 1 given name.
| ID | Names |
|-----------------------|------------------------------------------------|
| 111 | [john, bob] |
| 222 | [tim] |
| 333 | [bob] (invalid bob already used) |
Easiest solution I believe for this case is to simply use a 2nd table for the values you are interested in being the primary key. Then in application code, simply check that new table if they exist or not to determine if you should create a new record in the primary table. For a List[L] this simply avoids having to traverse every single list of every record to determine if a particular scalar value already exists or not. Credit to Tamas.
https://advancedweb.hu/how-to-properly-implement-unique-constraints-in-dynamodb/
Here’s a blog post describing how to best enforce uniqueness constraints: https://aws.amazon.com/blogs/database/simulating-amazon-dynamodb-unique-constraints-using-transactions/

DynamoDB GSI using boolean like a hash key

It's just a doubt that i cant find on the internet.
I have a table like this:
|  id | infos | ignored |
|  1  | abc  | true       |
|  2  | def   | false      |
|  3  | ghi   | false      |
I see i cant create a DynamoDB GSI on booleans columns. It's right?
I want to create a GSI on this ignored column.
The DynamoDB console only allows GSIs using types of string, binary, or number.
So you could use strings ("t" or "f"), numbers (1 or 0) or binary (also 1 or 0) to represent a boolean value if you'd like.
It sounds like you're trying to build a sparse index (e.g. only certain items are in the index). Keep in mind that you can do this by the mere existence of the attribute that makes up the GSI.
For example, you could include the ignored attribute on items you want to project into the index and remove the ignored attribute from items you do not want in the index.

Dynamodb - how to pick appropriate GSI and LSI

There are 5 columns in my Table "Banners".
id(string) | createdAt(Date) | caption(string) | isActive(binary) | order(Int)
For now, id is the partition key and primary key.
In the future, I might want to do something like getting all banners with isActive =1 and sorted by order.
As far as I understand, GSI is the another option for partition key, LSI is like the second sort key with unchanged partition key in the table.
Should isActive be GSI and order be LSI?
Here is my rule of thumb when it comes to the LSI: only use it when you
need strong read consistency in the secondary index
want to save provisioned RCU and WCU of the secondary index
Otherwise, use the GSI without any hesitation.

Amazon DynamoDB multiple scan conditions with multiple BeginsWith

I have table in Amazon DynamoDB with partition key and range key.
Table structure
Subscriber ID (partition key) | Item Id (Range Key) | Date |...
123 | P_345 | some date 1 | ...
123 | I_456 | some date 2 |
123 | A_678 | some date 3 | ...
Now I want to retrieve the data from the table using QueryAsync C# library with multiple scan conditions.
HashKey = 123
condition 1; Date is between 'some date 1' and 'some date 2'
condition 2. Range Key begins_with I_ and P_
Is there any way which I can achieve this using c# dynamoDB APIs?
Please help
You'll need to do the following (I'm not a C# expert, but you can use the following instructions to find the right C# syntax to do it):
Because you are looking for a specific hashkey, this will be a Query request, not a Scan.
You have a begins_with() condition on the range key. You specify that using the KeyConditionExpression parameter to the Query. The KeyConditionExpression will ask for HashKey=123 AND begins_with(RangeKey,"P_").
However, KeyConditionExpression does not allow an "OR" (rangekey begins with either "P_" or "I_"). You'll just need to run two separate queries - one with "I_" and one with "P_" (you can even do the two queries in parallel, if you wish).
The date is not one of the key columns, so you will need to filter it with a FilterExpression parameter to the query. Note that filtering only happens in the last step, after DynamoDB already read all the items matching the KeyConditionExpression above (this may increase your costs if filtering removes a lot of items and you will still pay for them).

What is the difference between scan and query in dynamodb? When use scan / query?

A query operation as specified in DynamoDB documentation:
A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process.
and the scan operation:
A scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan.
Which is best based on performance and cost?
When creating a Dynamodb table select Primary Keys and Local Secondary Indexes (LSIs) so that a Query operation returns the items you want.
Query operations only support an equal operator evaluation of the Primary Key, but conditional (=, <, <=, >, >=, Between, Begin) on the Sort Key.
Scan operations are generally slower and more expensive as the operation has to iterate through each item in your table to get the items you are requesting.
Example:
Table: CustomerId, AccountType, Country, LastPurchase
Primary Key: CustomerId + AccountType
In this example, you can use a Query operation to get:
A CustomerId with a conditional filter on AccountType
A Scan operation would need to be used to return:
All Customers with a specific AccountType
Items based on conditional filters by Country, ie All Customers from USA
Items based on conditional filters by LastPurchase, ie All Customers that made a purchase in the last month
To avoid scan operations on frequently used operations create a Local Secondary Index (LSI) or Global Secondary Index (GSI).
Example:
Table: CustomerId, AccountType, Country, LastPurchase
Primary Key: CustomerId + AccountType
GSI: AccountType + CustomerId
LSI: CustomerId + LastPurchase
In this example a Query operation can allow you to get:
A CustomerId with a conditional filter on AccountType
[GSI] A conditional filter on CustomerIds for a specific AccountType
[LSI] A CustomerId with a conditional filter on LastPurchase
You are having dynamodb table partition key/primary key as customer_country. If you use query, customer_country is the mandatory field to make query operation. All the filters can be made only items that belongs to customer_country.
If you perform table scan the filter will be performed on all partition key/primary key. First it fetched all data and apply filter after fetching from table.
eg:
here customer_country is the partition key/primary key
and id is the sort_key
-----------------------------------
customer_country | name | id
-----------------------------------
VV | Tom | 1
VV | Jack | 2
VV | Mary | 4
BB | Nancy | 5
BB | Lom | 6
BB | XX | 7
CC | YY | 8
CC | ZZ | 9
------------------------------------
If you perform query operation it applies only on customer_country value.
The value should only be equal operator (=).
So only items equal to that partition key/primary key value are fetched.
If you perform scan operation it fetches all items in that table and filter out data after it takes that data.
Note: Don't perform scan operation it exceeds your RCU.
Its similar as in the relational database.
Get query you are using a primary key in where condition, The computation complexity is log(n) as the most of key structure is binary tree.
while scan query you have to scan whole table then apply filter on every single row to find the right result. The performance is O(n). Its much slower if your table is big.
In short, Try to use query if you know primary key. only scan for only the worst case.
Also, think about the global secondary index to support a different kind of queries on different keys to gain performance objective
In terms of performance, I think it's good practice to design your table for applications to use Query instead of Scan. Because a scan operation always scan the entire table before it filters out the desired values, which means it takes more time and space to process data operations such as read, write and delete. For more information, please refer to the official document
Query is much better than Scan - performence wise. scan, as it's name imply, will scan the whole table. But you must be well aware of the table key, sort key, indexes and and related sort indexes in order to know that you can use the Query.
if you filter your query using:
key
key & key sort
index
index and it's related sort key
use Query! otherwise use scan which is more flexible about which columns you can filter.
you can NOT Query if:
more that 2 fields in the filter (e.g. key, sort and index)
sort key only (of primary key or index)
regular fields (not key, index or sort)
mixed index and sort (index1 with sort of index2)\
...
a good explaination:
https://medium.com/#amos.shahar/dynamodb-query-vs-scan-sql-syntax-and-join-tables-part-1-371288a7cb8f