How to query data based on datetime from DynamoDB - amazon-web-services

Table: Customer
Hashkey: email
Other Attributes: name, address, purchasedamount, datecreated
Sample Data:
"xxx1.xxx.com", "XXXXX1", "no1.street",2500,"10-01-2017 01:02:03"
"xxx2.xxx.com", "XXXXX2", "no2.street",2000,"11-01-2017 04:05:06"
"xxx3.xxx.com", "XXXXX3", "no3.street",4050,"10-02-2017 07:08:09"
"xxx4.xxx.com", "XXXXX4", "no4.street",2800,"11-02-2017 10:11:12"
How can I fetch the customers whose purchase date falls between "11-01-2017 00:00:00" and "10-02-2017 00:00:00"?

Looking at your sample data, I don't see an easy way to do it unfortunately. I would say you need to do it in code (Scan all items and filter at the application level).
If changing the data model is an option:
The easiest and recommended approach for dates and times in DynamoDB is to store them in ISO 8601 format, using the String data type.
ISO8601: Date and time values are ordered from the largest to smallest unit of time: year, month (or week), day, hour, minute, second, and fraction of second. The lexicographical order of the representation thus corresponds to chronological order, except for date representations involving negative years. This allows dates to be naturally sorted by, for example, DynamoDB.
If you use your Date attribute as a Sort Key / LSI, it enables you to ask DynamoDB to do the heavy lifting for querying between two dates (within a Partition Key), by using the BETWEEN comparison operator.
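As a minimal sketch of why the ISO 8601 layout matters, the Python below converts the sample datecreated values and applies an inclusive range check using plain string comparison, which for these ASCII strings is the same ordering a String sort key would use. It assumes the sample timestamps are day-month-year (the range in the question only makes sense that way); SOURCE_FORMAT and to_iso8601 are illustrative names, not part of any AWS SDK.

```python
from datetime import datetime

# Assumption: sample values are day-month-year, so "10-01-2017" is 10 January 2017.
SOURCE_FORMAT = "%d-%m-%Y %H:%M:%S"


def to_iso8601(value: str) -> str:
    """Convert 'dd-mm-yyyy HH:MM:SS' into an ISO 8601 string like '2017-01-10T01:02:03'."""
    return datetime.strptime(value, SOURCE_FORMAT).isoformat()


# Sample rows from the question: email -> datecreated.
customers = {
    "xxx1.xxx.com": "10-01-2017 01:02:03",
    "xxx2.xxx.com": "11-01-2017 04:05:06",
    "xxx3.xxx.com": "10-02-2017 07:08:09",
    "xxx4.xxx.com": "11-02-2017 10:11:12",
}

iso_dates = {email: to_iso8601(created) for email, created in customers.items()}

start = to_iso8601("11-01-2017 00:00:00")
end = to_iso8601("10-02-2017 00:00:00")

# Plain string comparison, inclusive on both ends like BETWEEN.
matches = sorted(
    email for email, created in iso_dates.items() if start <= created <= end
)
print(matches)
```

With the original dd-mm-yyyy strings the same comparison would order values by day of month first, which is why a range condition cannot work on them directly.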

Related

DynamoDB Query by Prefix of Partition Key

I have a dynamodb table with following GSI:
partition key: scheduled_date which is a date string yyyy-mm-dd HH:MM:SS
range key: task_id which is an uuid
I would like to query for all items whose scheduled_date falls in a date, i.e. its prefix matches a string yyyy-mm-dd.
Is it possible without performing scan?
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LegacyConditionalParameters.KeyConditions.html
You must provide the index partition key name and value as an EQ condition.
In your case, you could consider using yyyy-mm-dd (or yyyymmdd) as the partition key to get all of the items that have that scheduled date.
You could keep task_id as the range key, or you could use a prefix like HH:MM:SS:task_id. That way the tasks for a particular day would come back sorted by time, and if you really needed to, you could query them by time range.
There is also the alternative of using Global Secondary Indexes that can be utilized in a similar manner.
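The key layout suggested above could be derived roughly as follows. This is only a sketch: split_schedule_keys is a hypothetical helper, and the task IDs and dates are made-up sample values rather than data from the question.

```python
from typing import Tuple


def split_schedule_keys(scheduled_date: str, task_id: str) -> Tuple[str, str]:
    """Split 'yyyy-mm-dd HH:MM:SS' into (partition key, sort key).

    The partition key is the day; the sort key is 'HH:MM:SS:task_id'.
    """
    day, time_of_day = scheduled_date.split(" ", 1)
    return day, f"{time_of_day}:{task_id}"


# Hypothetical sample tasks: (scheduled_date, task_id).
tasks = [
    ("2021-03-05 14:30:00", "task-b"),
    ("2021-03-05 09:15:00", "task-a"),
    ("2021-03-06 08:00:00", "task-c"),
]

keys = [split_schedule_keys(date, task_id) for date, task_id in tasks]

# All sort keys for one day, in the order a sorted range key would return them.
same_day = sorted(sk for pk, sk in keys if pk == "2021-03-05")
print(same_day)
```

Because the time comes first in the sort key, sorting the keys as strings also sorts the tasks by time within the day.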

How to query DynamoDB by string between + other keys

I'm trying to design a DynamoDB query that meets the following criteria:
get items by type, category, and date between(date_1, date_2)
I have these attributes already stored in a Global Secondary Index:
type (string)
category (string)
date (string)
I know I could use the between operator to query by a given date string:
gsi_1_pk = 'products' and gsi_1_sk between '2019-01-01T00:00:00.000Z' and '2019-01-01T00:00:00.000Z'
But there are situations where I want to query by the 3 attributes, not only the date.
So, I want a solution that allows me to query by all the possible filtering combinations: type, category, date between, type + category, type + date between, category + date between, and type + category + date between.
How can I combine this between operation with the other attributes from the GSI?
I ended up creating a new Global Secondary Index, where I store the date alone in the sort key, which allows me to use DynamoDB's between operation with no problem.
The downside is that I had to create a new GSI for such a simple query. But as many said here, DynamoDB seems not to be the "right/best" tool for this job.
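To illustrate what a date-only sort key buys you, here is a small in-memory emulation of a key condition of the form "partition key equals X and sort key between A and B". It is not DynamoDB code: the attribute names gsi_2_pk and gsi_2_sk, the items, and query_between are all hypothetical, chosen only to mirror the naming in the question.

```python
from typing import Dict, List

# Hypothetical items; gsi_2_pk / gsi_2_sk are illustrative attribute names.
items: List[Dict[str, str]] = [
    {"gsi_2_pk": "products", "gsi_2_sk": "2018-12-31T23:59:59.000Z", "name": "a"},
    {"gsi_2_pk": "products", "gsi_2_sk": "2019-01-01T00:00:00.000Z", "name": "b"},
    {"gsi_2_pk": "products", "gsi_2_sk": "2019-01-15T12:00:00.000Z", "name": "c"},
    {"gsi_2_pk": "orders", "gsi_2_sk": "2019-01-10T00:00:00.000Z", "name": "d"},
]


def query_between(rows: List[Dict[str, str]], pk: str, start: str, end: str) -> List[Dict[str, str]]:
    """Emulate 'pk = :pk AND sk BETWEEN :start AND :end' (inclusive on both ends)."""
    hits = [
        row for row in rows
        if row["gsi_2_pk"] == pk and start <= row["gsi_2_sk"] <= end
    ]
    # Results come back ordered by the sort key.
    return sorted(hits, key=lambda row: row["gsi_2_sk"])


result = query_between(
    items, "products", "2019-01-01T00:00:00.000Z", "2019-01-31T23:59:59.999Z"
)
names = [row["name"] for row in result]
print(names)
```

The range check only works because the sort key holds nothing but the ISO 8601 date; once other values are concatenated in front of it, a plain between on the date no longer lines up.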

Get latest 3 entries from DynamoDb

I have a dynamo-db table with following schema
{
"id": String [hash key]
"type": String [range key]
}
I have a use case where I need to fetch the last 3 rows for a given id when the type is unknown.
Your items need a timestamp attribute. Without that they can’t be sorted or filtered by time. Once you have that, you can define a local secondary index with the id as partition key and the timestamp as the sort key. You can then get the top three items from the index.
Find more information about DynamoDb’s Local Secondary Index here.
1. Add a field to the schema to store the timestamp.
2. Use Query to fetch all the records for the given key.
3. Query always returns records sorted by the range key and cannot sort by any other attribute (without changing the table's schema), so sort the records by timestamp in your code.
4. Take the top 3 records.
If you have a lot of records, use filter expressions to drop extra results. E.g. if you know that the latest records will never have a timestamp older than an hour (or a day, a week, and so on), you could filter out older records.
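The client-side variant of these steps can be sketched as below. The records and the latest helper are hypothetical; the sketch assumes each item carries an ISO 8601 timestamp string, so that string order matches chronological order.

```python
from typing import Dict, List


def latest(records: List[Dict[str, str]], count: int = 3) -> List[Dict[str, str]]:
    """Return the `count` most recent records, newest first."""
    return sorted(records, key=lambda r: r["timestamp"], reverse=True)[:count]


# Hypothetical items already fetched for a single id.
records = [
    {"id": "user-1", "type": "a", "timestamp": "2020-01-01T10:00:00Z"},
    {"id": "user-1", "type": "b", "timestamp": "2020-01-03T09:30:00Z"},
    {"id": "user-1", "type": "c", "timestamp": "2020-01-02T18:45:00Z"},
    {"id": "user-1", "type": "d", "timestamp": "2020-01-05T07:00:00Z"},
    {"id": "user-1", "type": "e", "timestamp": "2020-01-04T12:15:00Z"},
]

top_three = latest(records)
print([r["type"] for r in top_three])
```

If the timestamp is instead the sort key of a local secondary index, as the first answer suggests, a Query with ScanIndexForward set to false and Limit set to 3 should return the same rows without any sorting in application code.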

Is it possible to sort a Cassandra Column Family by a specific column of a list of a user-defined datatype?

I'm having a hard time understanding Cassandra. I couldn't write this question without making it look confusing, but it may become clearer as I detail it below.
Suppose I have this datatype that I've created:
CREATE TYPE transaction (
transaction_id UUID,
value float,
transaction_date timestamp,
PRIMARY KEY (transaction_id, transaction_date)
);
PS: I'm using it as if it was a 'class', but that might be a logical mistake of mine, please correct me if it can't be used as such.
Anyway, also I have this Column Family, in which I've created a list of this 'transaction' datatype:
CREATE TABLE transactions_history_by_date (
wallet_address UUID,
user_id UUID,
transactions list <transaction>,
PRIMARY KEY (wallet_address, transaction_date))
WITH CLUSTERING ORDER BY (transaction_date DESC);
What I'd like to know is whether the Column Family above is correct. I'd like to get all the transactions of a wallet, sorted by the transaction date (but the date is a column of the 'transaction' datatype, and to complicate it even more, this Column Family holds a list of transactions rather than a single one).
No, in Cassandra you can sort only on the value of a clustering column. In this case you need to move transaction_date into the table itself...
To expand on Alex's answer, in your situation I think the best approach would probably be to denormalise your table. Rather than using a UDT, you could create something like this:
CREATE TABLE transactions_history_by_date (
wallet_address UUID,
user_id UUID,
transaction_id UUID,
value float,
transaction_date timestamp,
PRIMARY KEY ((wallet_address), transaction_date, transaction_id))
WITH CLUSTERING ORDER BY (transaction_date DESC);
Now you can make the following query and the results will be sorted by date:
SELECT * FROM transactions_history_by_date WHERE wallet_address = ...;
Note that I added transaction_id as a second clustering key. If this was omitted the table would not have been able to hold two transactions that had the same wallet_address and the same transaction_date. This is because unique rows are identified by the primary key.
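The uniqueness point can be illustrated with a small in-memory analogy, where a Python dict keyed by the primary key stands in for the table (a write to an existing key overwrites the row, much like a Cassandra insert with an existing primary key). The wallet, dates, and transaction IDs are made-up values.

```python
from typing import Dict, Tuple

# Two hypothetical transactions on the same wallet at the same instant.
tx1 = {"wallet": "w1", "date": "2021-01-01T00:00:00", "transaction_id": "t1", "value": 10.0}
tx2 = {"wallet": "w1", "date": "2021-01-01T00:00:00", "transaction_id": "t2", "value": 20.0}

# Primary key (wallet_address, transaction_date): the second write replaces the first.
without_id: Dict[Tuple[str, str], dict] = {}
# Primary key (wallet_address, transaction_date, transaction_id): both rows survive.
with_id: Dict[Tuple[str, str, str], dict] = {}

for tx in (tx1, tx2):
    without_id[(tx["wallet"], tx["date"])] = tx
    with_id[(tx["wallet"], tx["date"], tx["transaction_id"])] = tx

print(len(without_id), len(with_id))
```

Only the table keyed with transaction_id keeps both rows; the shorter key silently retains just the last write.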

Power BI Dashboard where the core filter condition is a disjunction on numeric fields

We are trying to implement a dashboard that displays various tables, metrics and a map where the dataset is a list of customers. The primary filter condition is the disjunction of two numeric fields. We want the user to be able to select a threshold for [field 1] and a separate threshold for [field 2] and then impose the condition [field 1] >= <threshold> OR [field 2] >= <threshold>.
After that, we want to also allow various other interactive slicers so the user can restrict the data further, e.g. by country or account manager.
Power BI naturally imposes AND between all filters and doesn't have a neat way to specify OR. Can you suggest a way to define a calculation using the two numeric fields that is then applied as a filter within the same interactive dashboard screen? Alternatively, is there a way to first prompt the user for the two threshold values before the dashboard is displayed -- so when they click Submit on that parameter-setting screen they are then taken to the main dashboard screen with the disjunction already applied?
Added in response to a comment:
The data can be quite simple: no complexity there. The complexity is in getting the user interface to enable a disjunction.
Suppose the data was a list of customers with customer id, country, gender, total value of transactions in the last 12 months, and number of purchases in last 12 months. I want the end-user (with no technical skills) to specify a minimum threshold for total value (e.g. $1,000) and number of purchases (e.g. 10) and then restrict the data set to those where total value of transactions in the last 12 months > $1,000 OR number of purchases in last 12 months > 10.
After doing that, I want to allow the user to see the data set on a dashboard (e.g. with a table and a graph) and from there select other filters (e.g. gender=male, country=Australia).
The key here is to create separate parameter tables and combine conditions using a measure.
Suppose we have the following Sales table:
Customer  Value  Number
-----------------------
A           568       2
B          2451      12
C          1352       9
D           876       6
E           993      11
F          2208      20
G          1612       4
Then we'll create two new tables to use as parameters. You could do a calculated table like
Number = VALUES(Sales[Number])
Or something more complex like
Value = GENERATESERIES(0, ROUNDUP(MAX(Sales[Value]),-2), ROUNDUP(MAX(Sales[Value]),-2)/10)
Or define the table manually using Enter Data or some other way.
In any case, once you have these tables, name their columns what you want (I used MinCount and MinValue) and write your filtering measure
Filter = IF(MAX(Sales[Number]) > MIN(Number[MinCount]) ||
MAX(Sales[Value]) > MIN('Value'[MinValue]),
1, 0)
Then put your Filter measure as a visual level filter where Filter is not 0 and use the MinCount and MinValue columns as slicers.
If you select 10 for MinCount and 1000 for MinValue then your table should look like this:
Notice that C, E and G exceed only one of the thresholds and that A and D are excluded.
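As a quick cross-check of the logic (not of Power BI itself), the same disjunction can be evaluated over the sample Sales table in plain Python. The passes_filter helper is a hypothetical stand-in for the measure and uses the same strict greater-than comparisons.

```python
from typing import List, Tuple

# Sample Sales table from above: (Customer, Value, Number).
sales: List[Tuple[str, int, int]] = [
    ("A", 568, 2),
    ("B", 2451, 12),
    ("C", 1352, 9),
    ("D", 876, 6),
    ("E", 993, 11),
    ("F", 2208, 20),
    ("G", 1612, 4),
]


def passes_filter(value: int, number: int, min_value: int, min_count: int) -> bool:
    """Keep a row when either threshold is exceeded (OR, not AND)."""
    return number > min_count or value > min_value


# Slicer selections from the example: MinCount = 10, MinValue = 1000.
kept = [
    customer
    for customer, value, number in sales
    if passes_filter(value, number, min_value=1000, min_count=10)
]
print(kept)
```

Under those selections the rows kept are B, C, E, F and G, which matches the exclusion of A and D described above.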
To my knowledge, there is no such built-in slicer feature in Power BI at the moment. There is, however, a suggestion in the Power BI forum requesting functionality like this. If you're willing to use the Power Query Editor, it's easy to obtain the values you're looking for, but only with hard-coded values for your limits or thresholds.
Let me show you how for a synthetic dataset that should fit the structure of your description:
Dataset:
CustomerID,Country,Gender,TransactionValue12,NPurchases12
51,USA,M,3516,1
58,USA,M,3308,12
57,USA,M,7360,19
54,USA,M,2052,6
51,USA,M,4889,5
57,USA,M,4746,6
50,USA,M,3803,3
58,USA,M,4113,24
57,USA,M,7421,17
58,USA,M,1774,24
50,USA,F,8984,5
52,USA,F,1436,22
52,USA,F,2137,9
58,USA,F,9933,25
50,Canada,F,7050,16
56,Canada,F,7202,5
54,Canada,F,2096,19
59,Canada,F,4639,9
58,Canada,F,5724,25
56,Canada,F,4885,5
57,Canada,F,6212,4
54,Canada,F,5016,16
55,Canada,F,7340,21
60,Canada,F,7883,6
55,Canada,M,5884,12
60,UK,M,2328,12
52,UK,M,7826,1
58,UK,M,2542,11
56,UK,M,9304,3
54,UK,M,3685,16
58,UK,M,6440,16
50,UK,M,2469,13
57,UK,M,7827,6
Desktop table:
Here you see an Input table and a subset table using two Slicers. If the forum suggestion gets implemented, it should hopefully be easy to change a subset like below to an "OR" scenario:
Transaction Value > 1000 OR Number of purchases > 10 using Power Query:
If you use Edit Queries > Advanced filter you can set it up like this:
The last step under Applied Steps will then contain this formula:
= Table.SelectRows(#"Changed Type2", each [NPurchases12] > 10 or [TransactionValue12] > 1000)
Now your original Input table will look like this:
Now, if only we were able to replace the hardcoded 10 and 1000 with a dynamic value, for example from a slicer, we would be fine! But no...
I know this is not what you were looking for, but it was the best 'negative answer' I could find. I guess I'm hoping for a better solution just as much as you are!