DynamoDB tables without RANGE value - amazon-web-services

Would there be any problem if I create my table without a RANGE value (ie, HASH value only)? I've heard that DynamoDB tables store entries in partitions, and these partitions are determined using the HASH key. If I use only HASH key on my table, would it cause too many partitions? Also, would it increase the time to seek data in my table using queries?

There's no drawback. The hash key decides the partition the data will live on, and each partition is designed to handle up to 3,000 RCU and 1,000 WCU, or up to 10 GB of data.
Performance "guarantees" are based on DynamoDB delivering that, so it shouldn't matter.
Having unique partition keys may actually help you with scaling later on, as requests are likely to be spread out more evenly across partitions; and since there is no range key, there is no Query operation over multiple items in an item collection to give up.

would it cause too many partitions?
There's no such thing as "too many partitions" in DynamoDB.
would it increase the time to seek data in my table using queries
You can't Query() a DDB table unless it has a range key. With only a hash key you can only use GetItem(). Scan() is also allowed, but you really shouldn't be using that regularly.
You'd have to add a Global Secondary Index (GSI) with a hash & range key in order to Query() your data.
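As a sketch of the difference (the table name "Users" and GSI "byType" are hypothetical), these are the DocumentClient parameter shapes for each call:

```javascript
// With only a hash key, a single-item read is GetItem with the full key:
const getParams = {
  TableName: "Users",
  Key: { userId: "user-123" }
};

// After adding a GSI (hash key "type", range key "created"), Query becomes
// possible by targeting the index via IndexName:
const queryParams = {
  TableName: "Users",
  IndexName: "byType",
  KeyConditionExpression: "#t = :t AND created > :since",
  ExpressionAttributeNames: { "#t": "type" }, // "type" is a reserved word
  ExpressionAttributeValues: { ":t": "admin", ":since": 1600000000 }
};
```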

Related

Assigning the same partition key to all items in a DynamoDB table

I need to be able to run some range-based queries on my DynamoDB table, such as int_attribute > 5, or starts_with(string_attribute, "foo"). These can all be answered by creating a global or local secondary index and then submitting a Query to these indexes. However, running a Query requires that you also provide a single value of the partition key to restrict the query set. Neither of these queries has a strict equality condition, so I am therefore considering giving all the items in my Dynamo table the same partition key, and distinguishing them only with the sort key. My dataset is well within the 10 GB partition size limit.
Are there any catastrophic issues that might occur if I do this?
Yes, you can create a GSI where every item goes under the same partition key. The thing to be aware of is you'll generally be putting all those writes into the same physical partition, each of which has a max update rate of 1,000 WCU.
If your update rate is below that, proceed. If your update rate is above that, you'll want to follow a pattern of sharding the GSI partition key value so it spreads across more partitions.
Say you require 10,000 WCU for the GSI. You can assign each item's GSI PK a value of value-{x}, where x is 0 to 9. Then, yes, at query time you do 10 queries and put the results back together yourself. This approach can scale as large as you need.
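The write-sharding pattern above can be sketched like this (the suffix scheme and shard count are illustrative, not a fixed API):

```javascript
const SHARDS = 10; // sized so 10 partitions x 1,000 WCU covers 10,000 WCU

// On write: append a random shard suffix to the GSI partition key value,
// so writes spread across multiple physical partitions.
function shardedGsiPk(baseValue) {
  const x = Math.floor(Math.random() * SHARDS); // 0..9
  return `${baseValue}-${x}`;
}

// On read: enumerate every shard value, Query each one, and merge the
// results client-side.
function allShardPks(baseValue) {
  return Array.from({ length: SHARDS }, (_, x) => `${baseValue}-${x}`);
}
```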

Indexed Range Query with DynamoDB

With DynamoDB, there is simply no straightforward way to perform an indexed range query over a column. Primary key, local secondary index, and global secondary index all require a partition key to range query.
For example, suppose I have a high-scores table with a numerical score attribute. There is no way to get the top 10 scores, or scores 25 to 50, with an indexed range query.
So, what is the idiomatic or preferred way to perform this incredibly common task?
1) Settle for a table scan.
2) Use a static partition key and take advantage of partition queries.
3) Use a fixed number of static partition keys and use multiple partition queries.
It's either 2) or 3) but it depends on the amount and structure of data as well as the read/write activity.
There's no generic answer here as it's use-case specific.
As long as you can get away with it, you probably want to use 2) as it only requires a single Query API call. If you have lots of data or heavy read/write activity, you'd use some bucketing-strategy (very close to your third option) to write to multiple partitions, then do multiple queries and aggregate the results.
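The aggregation step of that bucketing strategy is just a client-side merge. A minimal sketch, assuming each bucket's Query has already returned items with a numeric score attribute:

```javascript
// Merge the result pages from one Query per bucket into a single top-N.
// bucketResults is an array of arrays, one per static partition key.
function topN(bucketResults, n) {
  return bucketResults
    .flat()
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, n);
}
```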
DDB isn't suited for analytics. As Maurice said you can facilitate what you need via secondary index, but there are also other options to consider:
If you are providing this top N to your customers consistently/frequently and N is fixed, then you can have dedicated item(s) that hold this information, and you would update that item (or those items) upon writing an item to the table. You can have one item for the whole top N, or you can apply some bucketing strategy.
If your system needs this information infrequently (on some singular occasions), then a scan might also be fine.
If this is for analytics/research, consider exporting the table to S3 and using Athena.
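For the first option, the dedicated item is a materialized list that every write keeps current. A minimal sketch (the item shape is hypothetical; in practice the result would be written back with PutItem/UpdateItem alongside the new score):

```javascript
// Recompute the stored top-N list whenever a new score is written.
function updateTopN(currentTop, newEntry, n) {
  return [...currentTop, newEntry]
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, n);                      // keep the list bounded at N
}
```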

Can DynamoDB reads on GSI with badly chosen partition key affect the read/write for the table

I have a DynamoDB table with a good partition key (PK=uuid) but there is a GSI (PK=type, SK=created) where type has only 6 unique values and created is epoch time.
My question is: if I start to do a lot of reads with this GSI, will that affect the performance of the whole table? I see that the read capacity for the table and the GSI is not shared, according to the AWS docs, but what will happen behind the scenes if we start to use this GSI a lot? Will DynamoDB writes be impacted?
A global secondary index has separate capacity units, to avoid impacting performance of the table itself.
A global secondary index is stored in its own partition space away from the base table and scales separately from the base table.
Performance on your table can only be impacted if its own capacity is depleted. The global secondary index sits in its own partition space, which can be treated as if it has its own boundaries.
In addition, as DynamoDB uses separate capacity units for reads (RCU) and writes (WCU), these two actions would never have a performance impact on each other.

DynamoDB Query and Scan Behavior Question

I thought of this scenario when querying/scanning a DynamoDB table.
What if I want to get a single item from a table of 20k items, and the item I'm looking for is around the 19,000th row? Say I'm using Scan with a Limit of 1000. Does each Scan page consume throughput even when it returns no matching items? For instance,
I have a User table:
type UserTable {
  userId: ID!
  username: String
  password: String
}
then my query
var params = {
  TableName: "UserTable",
  FilterExpression: "username = :username",
  ExpressionAttributeValues: {
    ":username": username
  },
  Limit: 1000
};
How to effectively handle this?
According to the doc
A Scan operation always scans the entire table or secondary index. It
then filters out values to provide the result you want, essentially
adding the extra step of removing data from the result set.
Performance
If possible, you should avoid using a Scan operation on a large table
or index with a filter that removes many results. Also, as a table or
index grows, the Scan operation slows
Read units
The Scan operation examines every item for the requested values and can
use up the provisioned throughput for a large table or index in a
single operation. For faster response times, design your tables and
indexes so that your applications can use Query instead of Scan
For better performance and lower read-unit consumption, I advise you to create a GSI and use it with Query.
A Scan operation looks at the entire table and visits all records to find out which of them match your filter criteria, so it consumes enough throughput to read every visited record, not just the ones returned. Scan is also very slow, especially if the table is large.
To your second question: you can create a secondary index on the table with username as the hash key. Then you can convert the Scan into a Query. That way it will only consume enough throughput to fetch one record.
Read about Secondary Indices Here
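Assuming a GSI named "username-index" with username as its hash key (the index name is hypothetical), the Scan parameters above turn into a Query:

```javascript
var username = "alice"; // example value

// Query the GSI instead of scanning: only the matching item is read,
// so only its size is charged against throughput.
var params = {
  TableName: "UserTable",
  IndexName: "username-index",
  KeyConditionExpression: "username = :username",
  ExpressionAttributeValues: { ":username": username }
};
```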

Scan operation for getting a list of hash keys in DynamoDB table?

I want to know whether I have to use a DynamoDB "Scan" operation to get a list of all hash key values in a DynamoDB table, or whether there is another, less expensive approach. I tried a "Query" operation, but it was unsuccessful in my case, since I would have to specify a hash key value to use it. I just want to get a list of all hash key values in the table.
Yes, you need to use the Scan method to access every item in the table. You can reduce the size of the data returned to you by setting the attributes_to_get parameter to only what you need(*), e.g. just the hash key value. Also, note that Scan operations are eventually consistent, so if this database is actively growing, your result set may not include the most recent items added to the table.
(*) This will reduce the amount of bandwidth consumed and make the result less resource-intensive to process on the application side, but it will not reduce the amount of throughput that you are charged. Scan operation charges based on size of the entire item, not just attributes that get returned.
Unfortunately to get a list of hash key values you have to perform a Scan operation. What is your use case? Typically, the application should keep track of hash key values since there needs to be an evenly distributed workload. As a result, a Scan operation for this purpose should not happen frequently.
Edit: note that filtering the result using attributes_to_get or a projection expression will make the results cleaner, but it will not reduce the amount of throughput you are charged, since Scan charges are based on the size of each entire item examined, not just the attributes returned.
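As a sketch of what that Scan looks like with pagination and a projection (ProjectionExpression is the current form of attributes_to_get; the table and key names are hypothetical, and scanPage stands in for the SDK's scan call so the loop can be shown without a live table):

```javascript
// Page through the whole table, keeping only the hash key attribute.
async function collectHashKeys(scanPage) {
  const keys = [];
  let ExclusiveStartKey; // undefined on the first page
  do {
    const page = await scanPage({
      TableName: "Items",
      ProjectionExpression: "hashKey",
      ExclusiveStartKey
    });
    for (const item of page.Items) keys.push(item.hashKey);
    ExclusiveStartKey = page.LastEvaluatedKey; // undefined on the last page
  } while (ExclusiveStartKey);
  return keys;
}
```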