DynamoDB sort order of Date RangeKey - amazon-web-services

I have a DynamoDB table with the following keys: a simple string id as the HashKey and a string representing a date as the RangeKey. The date string is in YYYY-MM-DD format.
I am now wondering how DynamoDB orders its entries. When I query for multiple RangeKey values on the same HashKey, the results come back ordered by date, ascending.
However, according to the DynamoDB documentation, string RangeKeys are ordered by their UTF-8 byte values.
When I now save the following RangeKey entries:
2019-01-01
2018-12-04
2018-12-05
A simple DynamoDBMapper.query(...) returns them in the correct order:
2018-12-04
2018-12-05
2019-01-01
Is DynamoDB ordering the RangeKeys by date, or does the UTF-8 byte ordering simply happen to match the date order?

It's sorting by UTF-8 bytes. DynamoDB has no idea you are storing dates; to it, the RangeKey is just a string. The order only matches the chronological order because YYYY-MM-DD is zero-padded and runs from the most significant part (year) to the least significant (day), so lexicographic byte order and date order coincide.
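A quick way to convince yourself of this (a minimal sketch in plain Python, nothing DynamoDB-specific): sorting the strings by their raw UTF-8 bytes gives the same order as sorting the parsed dates.

from datetime import date

keys = ["2019-01-01", "2018-12-04", "2018-12-05"]

# Byte order, which is how DynamoDB compares string range keys
by_bytes = sorted(keys, key=lambda s: s.encode("utf-8"))

# Chronological order, parsing the strings as real dates
by_date = sorted(keys, key=date.fromisoformat)

print(by_bytes)             # ['2018-12-04', '2018-12-05', '2019-01-01']
print(by_bytes == by_date)  # True: zero-padded ISO dates sort the same both ways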

Related

How to avoid error "Cannot insert rows out of order" in QuestDB?

I'm trying to migrate data to QuestDB by inserting historical records. I create the table as
create table records(
  type INT,
  interval INT,
  timestamp TIMESTAMP,
  name STRING
) timestamp(timestamp);
and insert the data from a CSV file by uploading it with curl.
I get back the error "Cannot insert rows out of order". I read that out-of-order inserts are supported in QuestDB, but somehow I cannot make it work.
You can only insert rows out of order into partitioned tables. Create a new partitioned table and copy the data into it:
create table records2(
  type INT,
  interval INT,
  timestamp TIMESTAMP,
  name STRING
) timestamp(timestamp) partition by DAY;

insert into records2
select * from records;

drop table records;

rename table records2 to records;
After this you'll be able to insert rows out of order into the records table.
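If you were importing the CSV over HTTP, a rough sketch of re-running the upload against the now-partitioned table could look like this (host, port, file name and the target-table parameter are assumptions based on QuestDB's default /imp import endpoint; adjust to your setup):

import requests

# Re-upload the CSV into the (now partitioned) records table via QuestDB's HTTP import endpoint.
# localhost:9000, records.csv and the table name are assumptions.
with open("records.csv", "rb") as f:
    resp = requests.post(
        "http://localhost:9000/imp",
        params={"name": "records"},  # target table for the import
        files={"data": f},
    )
print(resp.status_code, resp.text)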

DynamoDB date GSI

I have a DynamoDB table that stores executions of some programs. This is what it looks like:
Partition Key | Sort Key            | StartDate           | ...
program-name  | execution-id (uuid) | YYYY-MM-DD HH:mm:ss | ...
I have two query scenarios for this table:
Query by program name and execution id (easy)
Query by start date range, for example: all executions from 2021-05-15 00:00:00 to 2021-07-15 23:59:59
What is the correct way to perform the second query?
I understand I need to create a GSI to do that, but what should this GSI look like?
I was thinking about splitting the StartDate attribute into two, like this:
Partition Key | Sort Key            | StartMonthYear | StartDayTime | ...
program-name  | execution-id (uuid) | YYYY-MM        | DD HH:mm:ss  | ...
So I can define a GSI using the StartMonthYear as the partition key and the StartDayTime as the sort key.
The only problem with this approach is that I would have to write some extra logic in my application to identify all the partitions I would need to query in the requested range. For example:
If the range is: 2021-05-15 00:00:00 to 2021-07-15 23:59:59
I would need to query the 2021-05, 2021-06 and 2021-07 partitions with the respective day/time restrictions (only the first and last partitions need them in this example).
Is this the correct way of doing this or am I totally wrong?
If you quickly want to fetch all executions in a certain time-frame no matter the program, there are a few ways to approach this.
The easiest solution would be a setup like this:
PK          | SK          | GSI1PK         | GSI1SK                             | StartDate
PROG#<name> | EXEC#<uuid> | ALL_EXECUTIONS | S#<yyyy-mm-ddThh:mm:ss>#EXEC<uuid> | yyyy-mm-ddThh:mm:ss
PK is the partition key for the base table
SK is the sort key for the base table
GSI1PK is the partition key for the global secondary index GSI1
GSI1SK is the sort key for the global secondary index GSI1
Query by program name and execution id (easy)
Still easy: do a GetItem with PK = PROG#<name> and SK = EXEC#<uuid>, using the program name for <name> and the execution uuid for <uuid>.
Query by start date range, for example: all executions from 2021-05-15 00:00:00 to 2021-07-15 23:59:59
Do a Query on GSI1 with the KeyConditionExpression: GSI1PK = 'ALL_EXECUTIONS' AND GSI1SK BETWEEN 'S#2021-05-15 00:00:00' AND 'S#2021-07-15 23:59:59' (a two-sided range on the sort key in a KeyConditionExpression is expressed with BETWEEN rather than two separate comparisons). This would return all the executions in the given time range.
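A minimal sketch of that query with boto3 (the table name is an assumption, the index and attribute names follow the layout above, and pagination via LastEvaluatedKey is omitted):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("executions")  # table name is an assumption

resp = table.query(
    IndexName="GSI1",
    KeyConditionExpression=(
        Key("GSI1PK").eq("ALL_EXECUTIONS")
        & Key("GSI1SK").between("S#2021-05-15 00:00:00", "S#2021-07-15 23:59:59")
    ),
)
items = resp["Items"]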
But: You'll also build a hot partition, since you effectively write all your data in a single partition in GSI1.
To avoid that, we can partition the data a bit and the partitioning depends on the number of executions you're dealing with. You can choose years, months, days, hours, minutes or seconds.
Instead of GSI1PK just being ALL_EXECUTIONS, we can set it to a subset of the StartDate.
PK          | SK          | GSI1PK          | GSI1SK                             | StartDate
PROG#<name> | EXEC#<uuid> | EXCTS#<yyyy-mm> | S#<yyyy-mm-ddThh:mm:ss>#EXEC<uuid> | yyyy-mm-ddThh:mm:ss
In this case you'd have monthly partitions, i.e. all executions within a month are grouped together. Now you would have to make multiple queries to DynamoDB and join the results afterwards.
For the query range from 2021-05-15 00:00:00 to 2021-07-15 23:59:59 you'd have to do these queries on GSI1:
#GSI1: GSI1PK=EXCTS#2021-05 AND GSI1SK >= S#2021-05-15 00:00:00
#GSI1: GSI1PK=EXCTS#2021-06
#GSI1: GSI1PK=EXCTS#2021-07 AND GSI1SK <= S#2021-07-15 23:59:59
You can even parallelize these and later join the results together.
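A sketch of that fan-out, again with boto3 under the same naming assumptions (the month list is hard-coded here; in practice you'd derive it from the requested range, and you'd still handle pagination):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("executions")  # table name is an assumption

start, end = "2021-05-15 00:00:00", "2021-07-15 23:59:59"
months = ["2021-05", "2021-06", "2021-07"]  # derived from the requested range

items = []
for month in months:
    cond = Key("GSI1PK").eq(f"EXCTS#{month}")
    if month == months[0]:          # first partition: apply the lower bound
        cond = cond & Key("GSI1SK").gte(f"S#{start}")
    elif month == months[-1]:       # last partition: apply the upper bound
        cond = cond & Key("GSI1SK").lte(f"S#{end}")
    resp = table.query(IndexName="GSI1", KeyConditionExpression=cond)
    items.extend(resp["Items"])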
Again: Your partitioning scheme depends on the number of executions you have in a day and also which maximum query ranges you want to support.
This is a long-winded way of saying that your approach is correct in principle, but you can choose to tune it based on your use case.

How to connect date table (date type column) to fact table date (datetime type column) type?

My date dimension table has a date column of type date.
The sales fact table has a date column of type datetime.
In the model, how do I connect the date table (date-typed column) to the fact table's date (datetime-typed) column?
Basically your data model is broken. If you want to join to a Date dimension then you need to add a column with a Date datatype to your fact table; whether you just add the date column or also drop the datetime column is up to you and your specific requirements.
If you also need to join to a Time dimension then you'll need to add a column to your fact table that has the same datatype as the PK on your Time dimension.

DAX retrieve rows with max value within a group based on date filter

I have the following table with some sample data similar to what we have in our model:
There will be a date filter called "As Of Date" that is selected by the user. I need to create a new measure that filters the table above using these rules each time:
rows where reserve date <= the As Of Date selected by the user
rows that are in "Approved" status
for each combination of Claim Id, Damage Id, and Location Id, get the row with the latest sequence number
sum the reserve value column
So based on the sample data above, if the user selects an As Of Date of 8-1-2020, the measure should sum the Reserve Value from the rows that meet the criteria.
What I am trying to achieve is a measure that returns 900, as that is the sum of the reserve values for the rows that meet the criteria, as listed in the second table above.
Thanks
scott

Tabular Model - Sort by column: Property value is not valid multiple distinct values

I have the following date table:
When I use the MonthYear Past column and want to sort it, the sort order is alphabetical instead of chronological. When I try to change the Sort By Column in the SSAS tabular model, I choose DateKey or Date but get the following error:
Property value is not valid
Cannot sort MonthYear Past by Date because at least one value in
MonthYear Past has multiple distinct values in Date. For example,
you can sort [City] by [Region] because there is only one region for each city,
but you cannot sort [Region] by [City] because there are multiple cities for each region.
How can I sort this column chronologically?
To solve this: because "Past" is linked with multiple date values, I created a new calculated column whose value is 1 when the row is in the past and YearMonth otherwise, and then set the Sort By Column to this new calculated column.