Dynamodb query operations - amazon-web-services

Is it possible to use "OR" and "AND" query operations in DynamoDB?
I need to know whether DynamoDB has something like SQL's "where fname = xxxx OR lname = xxxx". I'm using Rails.
Thanks.

In general, no.
DynamoDB only allows efficient lookup by primary (hash) key, plus optionally a range query on the "range key". Other attributes are not indexed unless you define a secondary index on them.
You can use a Scan request to read an entire table and filter by a set of attributes, but this is a relatively expensive and slow option for large tables.
You can simulate AND by creating a primary key that includes both values to be queried, and OR by creating duplicate tables that each use one attribute as their primary key, and querying both tables in parallel with BatchGetItem.
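As a rough illustration of the OR simulation described above: once the two lookups have returned, the results still have to be merged on the client and de-duplicated, because an item that matches both conditions comes back twice. The sketch below is plain Python and assumes each item carries a unique `id` attribute (a hypothetical name); the two input lists stand in for the responses of the two lookups.

```python
def union_by_key(first, second, key_name):
    """Merge two result lists, keeping one item per distinct key value."""
    merged = {}
    for item in list(first) + list(second):
        # setdefault keeps the first copy seen and ignores later duplicates
        merged.setdefault(item[key_name], item)
    return list(merged.values())


# Stand-ins for the results of "fname = xxxx" and "lname = xxxx" lookups.
by_fname = [{"id": 1, "fname": "xxxx", "lname": "xxxx"}]
by_lname = [
    {"id": 1, "fname": "xxxx", "lname": "xxxx"},
    {"id": 2, "fname": "yyyy", "lname": "xxxx"},
]
either = union_by_key(by_fname, by_lname, "id")
```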

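As BCoates mentioned, the answer is no.
Also note that BatchGetItem reads are eventually consistent by default. If you want consistent reads, you have to request them explicitly by setting ConsistentRead for each table in the request.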

No, it is not possible to use the 'OR' operator in a key condition.
For example, in
KeyConditionExpression: '#hashkey = :hk_val AND #rangekey > :rk_val',
the AND operator is used to match both the HASH and RANGE keys.
Therefore we can't use OR in a DynamoDB key condition.

Although key conditions have no "OR", you can still simulate such SQL queries using a scan with a FilterExpression.
But remember that a scan reads the whole table and only then applies the filter, so you pay for every item read. Also, a single Scan call returns at most 1 MB of data, so you have to keep paging with LastEvaluatedKey or you may miss some items. This method is generally avoided because it is very costly, but here is a piece of Python 3 code using the boto3 API that simulates the SQL query you want:
from boto3.dynamodb.conditions import Attr
# `table` is a boto3 DynamoDB Table resource
response = table.scan(
    FilterExpression=Attr('fname').eq('xxxxx') | Attr('lname').eq('xxxxx'))
FilterExpression conditions also support other operators, such as & (AND) and ~ (NOT).
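Because one Scan call may return only part of the table, the call usually sits inside a loop that follows LastEvaluatedKey. The following is a minimal sketch of that loop, not a definitive implementation. `scan_all` and `FakeTable` are hypothetical names; `FakeTable` only imitates the paging behaviour of a boto3 Table so the loop can run without AWS access.

```python
def scan_all(table, **scan_kwargs):
    """Collect every item from a Scan by following LastEvaluatedKey."""
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No key returned means the last page has been read.
            return items
        scan_kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for a boto3 Table; serves fixed pages."""

    def __init__(self, pages):
        self.pages = pages

    def scan(self, **kwargs):
        index = kwargs.get("ExclusiveStartKey", {}).get("page", 0)
        response = {"Items": self.pages[index]}
        if index + 1 < len(self.pages):
            response["LastEvaluatedKey"] = {"page": index + 1}
        return response


fake = FakeTable([[{"fname": "a"}], [{"fname": "b"}], []])
all_items = scan_all(fake)
```

With a real table you would pass the filter through unchanged, for example scan_all(table, FilterExpression=...). The filter reduces what is returned, not what is read.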

Related

Compare values in dynamodb during query without knowing values in ExpressionAttributeValues

Is it possible to apply a filter based on values inside a dynamodb database?
Let's say the database contains an object info within a table:
info: {
toDo: x,
done: y,
}
Using the ExpressionAttributeValues, is it possible to check whether the info.toDo = info.done and apply a filter on it without knowing the current values of info.toDo and info.done ?
At the moment I tried using ExpressionAttributeNames so it contains:
'#toDo': info.toDo, '#done': info.done'
and the filter FilterExpression is
#toDo = #done
but I'm retrieving no items doing a query with this filter.
Thanks a lot!
DynamoDB is not designed to perform arbitrary queries as you might be used to in a relational database. It is designed for fast lookups based on keys.
Therefore, if you can add an index that lets you reach the records you are looking for, you can use it for this new access pattern. For example, you could add an index that uses info.toDo as the partition key and info.done as the sort key (the values would have to be stored as top-level attributes, since index keys cannot be nested). You can then query the index with the key condition PK = x and SK = x, assuming that the list of possible values is limited and known.
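Separately from the index approach, a filter expression can compare two attributes with each other, so the values do not need to be known up front. The likely problem in the question is that ExpressionAttributeNames must map placeholders to attribute names (not values), and each element of a nested path needs its own placeholder. The sketch below only builds the request fragment; `same_value_filter` is a hypothetical helper, and the result would be merged into the arguments of a Query or Scan call. The filter is still applied after the items are read.

```python
def same_value_filter(parent, first, second):
    """Build a filter comparing two nested attributes of the same map."""
    return {
        # Every path element gets its own placeholder: info.toDo -> #parent.#first
        "FilterExpression": "#parent.#first = #parent.#second",
        "ExpressionAttributeNames": {
            "#parent": parent,
            "#first": first,
            "#second": second,
        },
    }


filter_params = same_value_filter("info", "toDo", "done")
```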

DynamoDB query with both GT and begins_with for sort key?

I have a single table design where I have chat rooms (PK) with timestamped messages (SK). Since it's a single table design the SK has a MSG# prefix, followed by the message creation timestamp, to keep message entities separate from other entities.
I'd like to retrieve all messages after a certain timestamp. It seems like the key condition should be PK = "<ChatRoomId>" AND begins_with(SK, "MSG#") AND SK GT "MSG#<LastRead>". The first part of the SK condition is to only fetch message entities and the second is to only fetch new messages. Is it possible to have a double condition on the sort key like this? It seems like it should be possible as it denotes a contiguous range of sort keys.
You can easily achieve that by using between:
PK = "<ChatRoomId>" AND SK BETWEEN "MSG#<YourDate>" AND "MSG#9999-99-99"
This way you will get all messages starting at <YourDate> and no records with other prefixes. This will work unless you're planning very far ahead.
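The BETWEEN condition above might be expressed as request parameters roughly like this. It is a sketch that only builds the dictionary; the table name `ChatTable` and the helper `messages_since` are assumptions, while PK, SK and the MSG# prefix come from the question. The values use the low-level typed format, so the result could be passed to a boto3 client as client.query(**params).

```python
def messages_since(room_id, since, table_name="ChatTable"):
    """Build Query parameters for messages from `since` onwards (inclusive)."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk AND SK BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":pk": {"S": room_id},
            ":start": {"S": "MSG#" + since},
            # Upper bound sorts after any realistic timestamp with this prefix.
            ":end": {"S": "MSG#9999-99-99"},
        },
    }


between_params = messages_since("room-1", "2021-05-01T10:00:00Z")
```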
I have exactly the same use case and found this answer. Thanks for the suggestion, it works, but we decided to research further: "between" is inclusive, so we'd either have to read the already-seen message again (wasting read capacity) or make up a fake value as a workaround.
Turns out, the DynamoDB API provides this feature, it's the exclusive start key: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#DDB-Query-request-ExclusiveStartKey
Admittedly, the documentation is not very encouraging and seems to suggest that the parameter is some opaque data that you can only obtain by having a previous query:
The primary key of the first item that this operation will evaluate. Use the value that was returned for LastEvaluatedKey in the previous operation.
But the actual content of that key is very simple and transparent: it's a map like {"PK": {"S": "your_pk"}, "SK": {"S": "exclusive_start_sk"}} (replace PK/SK with your actual key names; if you're doing single table design you're probably using those generic names). If you're querying a GSI instead of the main table, the map also needs the index key attributes (e.g. GSIPK/GSISK) in addition to the table's own key. You can run a manual query and observe the returned LastEvaluatedKey to verify what it's expecting.
From there you can combine greater_than and begins_with, with greater_than being expressed as a pagination parameter.
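Put together, the request could look roughly like the sketch below, which again only builds the parameter dictionary. The table name `ChatTable` and the helper `messages_after` are assumptions; the key names follow the question. The begins_with condition restricts the query to message entities, and ExclusiveStartKey makes the read start strictly after the last-read message.

```python
def messages_after(room_id, last_read, table_name="ChatTable"):
    """Build Query parameters for messages strictly after `last_read`."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": room_id},
            ":prefix": {"S": "MSG#"},
        },
        # Hand-built start key: the query resumes after this sort key.
        "ExclusiveStartKey": {
            "PK": {"S": room_id},
            "SK": {"S": "MSG#" + last_read},
        },
    }


after_params = messages_after("room-1", "2021-05-01T10:00:00Z")
```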

How to limit dynamodb scan to a given partition key and NOT read the entire table

Theoretical table with billions of entries.
Partition key is a unique uuid representing a given deviceId. There will be around 10k unique uuids.
Sort Key is a dateString for when the data was collected.
Each item has some data fields. There are dozens of fields such that making a GSI for each wouldn't be reasonable. For our example, let's say we are looking for the "dataOfInterest" field.
I'd like to search the DB for "all items where the dataOfInterest = 'foobar'" - and ideally do it within a date range. As far as I know, a scan operation is the only option. With billions of entries... that's not going to be a fast process (though I understand I could split it out to run multiple operations at a time - it's still going to eat RCUs like crazy)
Of note, I only care about a given uuid for each search, however. In other words, what I REALLY care about is "all items within a given partition where the dataOfInterest = 'foobar'". And further, it'd be great to use the sort key to give "all items within a given partition where the dataOfInterest = 'foobar' that are between Jan 1 and Feb 28"
The scan operation allows you to limit the results with a filter expression such that I could get the results of just a single partition ... but it still reads the entire table and the filtering is done before returning the data to you. https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html
Is there an AWS API that does a scan-like operation that reads only a given partition? Are there other ways to achieve this (perhaps re-architecting the DB?)
As @jarmod says, you can use a Query and specify the PK of the UUID. You can then either put the timestamp into the SK and filter for the dataOfInterest value (unindexed), or for more efficiency and to make everything indexed you can construct a composite SK which is dataOfInterest#timestamp and then do a range query on the SK of foobar#time1 to foobar#time2. That makes this query perfectly index optimized.
Of course, this makes purely timestamp-based queries less simple. So you either do multiple queries for those or, if you want both queries efficient, set up this composite SK in a GSI and use that to resolve this query.
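The composite sort key range query could be sketched as follows. This only builds the request parameters; the table name `DeviceData`, the generic key names PK/SK and the helper `data_range_query` are assumptions made for illustration, and the sort key is assumed to be stored as dataOfInterest#timestamp.

```python
def data_range_query(device_id, value, start, end, table_name="DeviceData"):
    """Build Query parameters for one device, one value, and a date range."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk AND SK BETWEEN :low AND :high",
        "ExpressionAttributeValues": {
            ":pk": {"S": device_id},
            # Composite sort key: <dataOfInterest>#<timestamp>
            ":low": {"S": value + "#" + start},
            ":high": {"S": value + "#" + end},
        },
    }


range_params = data_range_query("device-uuid", "foobar", "2023-01-01", "2023-02-28")
```

If this key lives in a GSI rather than the base table, an IndexName entry would be added to the same dictionary.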

How to run a 'greater than' query in Amazon DynamoDB?

I have a primary key in the table as 'OrderID', and it's numerical which increments for every new item. An example table would look like -
Let's assume that I want to get all orders above the OrderID '1002'. How would I do that?
Is there any possibility of doing this with DynamoDB Query?
Any help is appreciated :)
Thanks!
Unfortunately with this base table you cannot perform a query with a greater than for the partition key.
You have 3 choices:
Migrate to using Scan with a filter; this will use up your read capacity significantly, because the whole table is read.
Create a secondary index; you'd want a global secondary index with the sort key becoming your order id. Take a look here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.OnlineOps.html#GSI.OnlineOps.Creating.
Loop in the application, performing a Query or GetItem request for each id from the initial value until there are no results left (very inefficient).
The best practice would be to use the GSI if you can as this will be the most performant.
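A query against such an index might look like the sketch below. Several names here are assumptions rather than part of the question: the table `Orders`, the index `AllOrdersIndex`, and an `EntityType` attribute that holds the same value ("ORDER") on every item so that it can serve as the index partition key while OrderID is the index sort key. The code only builds the request parameters.

```python
def orders_above(order_id, table_name="Orders", index_name="AllOrdersIndex"):
    """Build Query parameters for orders with OrderID greater than `order_id`."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # Equality on the index partition key, range condition on its sort key.
        "KeyConditionExpression": "EntityType = :type AND OrderID > :min_id",
        "ExpressionAttributeValues": {
            ":type": {"S": "ORDER"},
            # Numbers are sent as strings in the low-level format.
            ":min_id": {"N": str(order_id)},
        },
    }


gsi_params = orders_above(1002)
```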

DynamoDB querying in 2017

There are few similar questions out there, but looks like they are outdated.
Does DynamoDB still have problems with querying or not?
Use case: table contains users with parameters: name, phone, email, groupId, created, etc...
I want to get all users with groupId = 1, name iLike 'jo' and created > a_year_ago_timestamp.
Looks like this is possible already, according to this.
Or this is another highly expensive scanning operation?
As long as you are using the Query API of DynamoDB, it is not an expensive scanning operation. Using Query API implies that you know the hash key of the table.
In the above case, I assume groupId is the hash key of the table. Please note that you can't use CONTAINS or GE (i.e. greater than or equal) on the hash key attribute in a KeyConditionExpression; the hash key only supports equality.
So, groupId must be hash key in order to use Query API. Otherwise, you may need to look at GSI (Global Secondary Index) in order to use Query API.
Obviously, if you use Scan API with FilterExpression, it would be a costly operation.
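For the use case in the question, a Query with groupId as the hash key plus a filter on the other attributes could be sketched like this. The table name `Users` and the helper `users_in_group` are assumptions, and groupId and created are assumed to be stored as numbers. Note that contains is case-sensitive, so it is only an approximation of iLike, and the filter is applied after the items in the group have been read.

```python
def users_in_group(group_id, name_part, created_after, table_name="Users"):
    """Build Query parameters: key condition on groupId, filter on the rest."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "groupId = :gid",
        "FilterExpression": "contains(#name, :part) AND #created > :since",
        # Placeholders sidestep any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#name": "name", "#created": "created"},
        "ExpressionAttributeValues": {
            ":gid": {"N": str(group_id)},
            ":part": {"S": name_part},
            ":since": {"N": str(created_after)},
        },
    }


user_params = users_in_group(1, "jo", 1483228800)
```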