Is there any unique key constraint on timestamps in QuestDB? - questdb

I'm sending some data to QuestDB and I'm curious whether two incoming records with the same timestamp will cause an issue or conflict. Not even duplicate values, but values from different devices with the same timestamp. Is there any unique key constraint on new records of this type?

As of writing, there is no unique key constraint on timestamps. For a table with a designated timestamp, the only requirement is that each new timestamp must be equal to or greater than the most recent timestamp value in the table.
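To illustrate, here is a minimal sketch that writes two rows with the same timestamp from different devices over the InfluxDB Line Protocol, assuming a local QuestDB instance listening on the default ILP TCP port 9009 (the table, tag, and field names are made up):

```python
import socket

# Two ILP rows with identical designated timestamps (nanoseconds),
# differing only in the "device" tag.
lines = (
    "readings,device=sensor-a temperature=21.5 1672531200000000000\n"
    "readings,device=sensor-b temperature=19.3 1672531200000000000\n"
)

with socket.create_connection(("localhost", 9009)) as sock:
    sock.sendall(lines.encode())
# Both rows are accepted: the designated timestamp is not a unique key.
```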

Related

Last writer wins uniqueness in DynamoDB secondary indexes

Can I overwrite an entry in a DynamoDB index be it global or local?
I do not want duplicate entries, but I want DynamoDB to overwrite if an entry with the same pk+sk exists in the index.
That's not how it works. Local and Global Secondary Indexes explicitly make no guarantees about uniqueness.
From the docs for Local Secondary Indexes:
In a DynamoDB table, the combined partition key value and sort key value for each item must be unique. However, in a local secondary index, the sort key value does not need to be unique for a given partition key value.
— docs
From the docs for Global Secondary Indexes:
In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.
— docs
Only the base table enforces uniqueness for the primary key; uniqueness cannot be enforced by the system for LSIs or GSIs. If you need uniqueness there, you have to design your app and data model to ensure collisions can't happen (see the conditional-write sketch below).
Given that you don't (and can't) write to a GSI/LSI explicitly, you'll need to ensure that you update the appropriate table record whose LSI/GSI attributes are changing.
In other words, for there to be only a single record in the GSI/LSI, there can be only a single record in the base table with the attribute values used in the GSI/LSI.
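If you need uniqueness on the base table's key, a common pattern is a conditional put that fails when an item with the same key already exists. A minimal boto3 sketch, with hypothetical table and key names:

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table

try:
    table.put_item(
        Item={"pk": "user#123", "sk": "profile", "email": "a@example.com"},
        # Reject the write if an item with this partition key already exists.
        ConditionExpression="attribute_not_exists(pk)",
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("an item with that key already exists")
    else:
        raise
```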

How to sort DynamoDB table by a single column?

I'd like to list records from my DDB table ordered by creation date.
My table has an attribute DateCreated.
All examples I can find describe ordering within some partition.
But I want global ordering.
Am I supposed to create an artificial attribute which will have the same value across all records, just to use it as a partition key? E.g. add new attribute GlobalPartition with value 1 to every record in the table, and create a GSI with partition key GlobalPartition and sort key DateCreated. Isn't there a better way?
Thx!
As you noticed, DynamoDB indeed does not have an option to sort items "globally". In other words, there is no way to Scan the database in sorted partition-key order. You can only sort items inside one partition, sorted by the "sort key".
When you have a small amount of data, you can indeed do what you said: have a single partition with everything in it. However, it's not clear how practical this approach remains as your single partition grows to gigabytes or terabytes, or how well DynamoDB can load-balance when you have just a single partition (I have never seen any DynamoDB documentation that answers this question).
So another option is not to have a single partition but rather a number of them. For example, suppose you want to sort items by date. Instead of having a single partition, have a partition per month, i.e., the partition key is the month number. Now, if you want to sort everything within a month, you can do it directly, but if you want a sorted list for a full year, you need to Query twelve partitions in order, getting a sorted list from each one and combining them into a sorted list for the full year (see the sketch below). So-called time-series databases are often modeled this way.
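A minimal sketch of that per-month approach with boto3, assuming a hypothetical table named "events" with partition key "Month" (a "YYYY-MM" string) and sort key "DateCreated"; pagination is omitted for brevity:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("events")  # hypothetical table

def events_for_year(year):
    """Return all events for a year, sorted by DateCreated."""
    items = []
    for month in range(1, 13):
        resp = table.query(
            KeyConditionExpression=Key("Month").eq(f"{year}-{month:02d}"),
            ScanIndexForward=True,  # each partition comes back sorted by the sort key
        )
        items.extend(resp["Items"])
    # Months are queried in order and each result is already sorted,
    # so the concatenation is sorted for the whole year.
    return items
```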
If you want to sort data in DynamoDB, you need a sort key on that attribute. If the value is not in the attribute that maps to the table's sort key, or the table does not have a sort key, then you need to create a GSI whose sort key is that attribute. You can use an LSI too: any attribute that maps to the sort key of any index (table, LSI, or GSI) will do.
Check for more details "ScanIndexForward" param of the query request.
If ScanIndexForward is true, DynamoDB returns the results in the order in which they are stored (by sort key value). This is the default behavior. If ScanIndexForward is false, DynamoDB reads the results in reverse order by sort key value, and then returns the results to the client.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#API_Query_RequestSyntax
The console UI has a checkbox for this too.
"Global sort" is not possible, while "global" would mean scan operation and it just runs through all rows in database and filters by filters, yet it does not have sorting option. On query on attribute mapped to sort key has ScanIndexForward option to change sort direction.

Is it efficient to use multiple global indexes on a single DynamoDB table?

There exists a data-set as described in the table below. Sr.no is used in the table only for reference.
| sr.no | id         | tis   | data-type | b.id       | idType_2 | var_2    |
|-------|------------|-------|-----------|------------|----------|----------|
| 1     | abc-def-gi | 12345 | a-type    | 1234567890 | 843023   | NULL     |
| 2     | 1234567890 | 12346 | b-type    | NULL       | NULL     | 40030230 |
| 3     | abc-def-gj | 12347 | a-type    | 1234567890 | 843023   | NULL     |
Query types
1. Input id and, if data-type is a-type, return fields tis, b.id, idType_2 (reference sr.no=1)
2. Input id and, if data-type is b-type, return field var_2 (reference sr.no=2)
3. Input idType_2 and return fields id, tis, b.id of sr.no=1,3
4. Input data-type and return id based on tis between 12345 and 12347
Note
a-type data (sr.no=1,3) is inserted ~100k times a day with a unique id
b-type data (sr.no=2) is a fixed set of data
Is the below key approach efficient for a dataset like this? Is there any other approach that can be followed to store and retrieve data from DynamoDB?
Partition Key = id, to take care of Queries 1 and 2
GSI1 = idType_2 and GSI1SK = id, to take care of Query 3
GSI2 = data-type and GSI2SK = tis, to take care of Query 4
Here are my thoughts:
1) If you have data with different access patterns, you should consider splitting it into different tables.
2) If data is accessed together, store it together. What this means is that if, whenever you read a-type data for some modeled entity, you also need to read one or more b-type records for the same entity, it is advantageous to place all these records in the same table, under the same partition key.
3) Data that is not accessed together does not benefit at all from being placed in the same table, and in fact has the potential to become an issue in more extreme circumstances.
To bring this all home: in your example, the id for a-type and b-type data is different. This means you get zero benefit from storing both in the same table. Use two different tables.
The main difference between relational and non-relational databases is that non-relational stores have no cross-table joins; so whereas one of the tenets of relational databases is data normalization, the opposite tends to be the case for non-relational ones.
This was solved by doing the following inside DynamoDB, without creating any GSI.
When a GSI is created, whatever data is written to the main table is copied into the GSI, so the write cost is multiplied by the number of GSIs: with 1 GSI you pay PrimaryWrite + GSIWrite; with 2 GSIs it's Primary + GSI1 + GSI2. The writes into a GSI mirror the primary table, so if you're writing into the primary at 1000 WCU, the same applies to each GSI, for a total of 2000 WCU with 1 GSI and 3000 WCU with 2 GSIs.
What we did
application_unique_id as hash key
timestamp as sort key
The rest of the keys were stored as attributes (DynamoDB supports dynamic JSON provided there is a valid hash key and a sort key).
We used a Lambda function attached to the table's DynamoDB Stream to write data into an ElasticSearch cluster.
We made a daily index of the latest snapshot data, as DynamoDB holds all the trace points and is the best place to keep and query those.
This way we knew what data was sent on which day (DynamoDB doesn't let the user export a list of hash keys), and we could do all the remaining projection and comparison queries inside ElasticSearch.
DynamoDB solved querying time-series data at sub-millisecond latency.
ElasticSearch solved all the comparison and filter operations on top of the data.
We set the DynamoDB TTL to 30 days; ElasticSearch doesn't support TTL, but we drop each daily index once its creation day is more than 30 days in the past.
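A rough sketch of that write path in boto3, with placeholder names; the "ttl" attribute must match whatever attribute the table's TTL is configured on:

```python
import time
import boto3

table = boto3.resource("dynamodb").Table("traces")  # hypothetical table

now = int(time.time())
table.put_item(
    Item={
        "application_unique_id": "app-123",         # hash key
        "timestamp": now,                           # sort key
        "payload": {"lat": "52.5", "lon": "13.4"},  # arbitrary attributes
        "ttl": now + 30 * 24 * 60 * 60,             # expire after ~30 days
    }
)
```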

What should be DynamoDB key schema for time sliding window data get scenario?

What I never understood about DynamoDB is how to design a table to efficiently get all data with one particular field lying in some range. For example, a time range: we would like to get all data created from timestamp1 up to timestamp2. According to key design, only the sort key can be used for such a purpose. However, that automatically means the partition key would have to be the same for all data, which the documentation describes as an anti-pattern of DynamoDB usage. How do you deal with this situation? Would creating an evenly distributed primary key, plus a secondary index whose partition key is the same for all items but whose sort key differs per item, be a better solution?
You can use Global Secondary Index which in essence is
A global secondary index contains a selection of attributes from the base table, but they are organized by a primary key that is different from that of the table.
So you can query on other attributes. In case it is not clear what I meant: you can choose something that can be unique as the table's primary key, and use a repetitive attribute as the partition key of a GSI on which you base your query.
NOTE: One of the widest applications of NoSQL databases is storing time series, where you cannot expect to have a unique identifier as the partition key unless you include the timestamp.
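A sketch of such a table definition in boto3, with a unique "id" as the table's key and a repetitive "series" attribute as the GSI partition key; all names are illustrative:

```python
import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="measurements",  # hypothetical table
    AttributeDefinitions=[
        {"AttributeName": "id", "AttributeType": "S"},         # unique primary key
        {"AttributeName": "series", "AttributeType": "S"},     # repetitive attribute
        {"AttributeName": "timestamp", "AttributeType": "N"},
    ],
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
    GlobalSecondaryIndexes=[
        {
            # Range queries by time go through this index.
            "IndexName": "series-timestamp-index",
            "KeySchema": [
                {"AttributeName": "series", "KeyType": "HASH"},
                {"AttributeName": "timestamp", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)
```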

Most efficient setup for sorted table with unique non-composite keys

Let's say I have a table where each item should have a unique non-composite key "id", some data and a "date" property. When reading this table I always want to return items ordered by "date". What is the best approach for this simple scenario?
1. Use ("id", "date") as the item primary key.
* Pro: don't need any secondary index.
* Con: "id" is not guaranteed to be unique
2. Use "id" alone as the primary key and create a secondary index on ("id", "date")
* Pro: "id" is guaranteed to be unique
* Con: An additional index is needed.
Also, if using approach 2 and never reading directly from the table, could I provision zero read capacity units for the table?
There are some confusing statements in your question.
You claim that "id" is unique but the combination ("id", "date") is not. How can that be?
Then, in the first part you say that you want to "read the table" (do you mean return all the items in the table?) in sorted order by date, yet at the end you ask whether you could provision 0 read units for the table if you won't be reading from it. You can't provision 0, but you can provision 1, which is negligible $-wise.
If you really do need to be able to order all items in the table by date, then you have a few options:
You can create a GSI with a partition key that encompasses all items and use "date" as the range key, but in this case you really need to question the use of DynamoDB
You can read the whole table in memory and order there
You can set up a hierarchical structure to have some coarse date component as the partition key and the full "date" as the sort key and compose results by potentially executing multiple queries to retrieve data
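For small tables, the second option above (read everything and sort in memory) can be sketched like this; the table and attribute names are placeholders:

```python
import boto3

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table

# Scan the whole table, following pagination.
items, start_key = [], None
while True:
    kwargs = {"ExclusiveStartKey": start_key} if start_key else {}
    resp = table.scan(**kwargs)
    items.extend(resp["Items"])
    start_key = resp.get("LastEvaluatedKey")
    if start_key is None:
        break

items.sort(key=lambda item: item["date"])  # client-side ordering
```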