In Google Spanner, is it possible that the exact same commit timestamp can appear again after already observed - google-cloud-platform

In Google Spanner, commit timestamps are generated by the server and based on "TrueTime" as discussed in https://cloud.google.com/spanner/docs/commit-timestamp. This page also states that timestamps are not guarnateed to be unique, so multiple independent writers can generate timestamps that are exactly the same.
On the documentation of consistency guarantees, it is stated that In addition if one transaction completes before another transaction starts to commit, the system guarantees that clients can never see a state that includes the effect of the second transaction but not the first.
What I'm trying to understand is the combination of
Multiple concurrent transactions committing "at the same time" resulting in the same commit timestamp (where the commit timestamp forms part of a key for the table)
A reader observing new rows being entered into above table
Under these circumstances, is it possible that a reader can observe some but not all of the rows that will (eventually) be stored with the exact same timestamp? Or put differently, if searching for all rows up to a known exact timestamp, and with rows are being inserted with that timestamp, is it possible that the query first returns some of the results, but when executed again returns more?
The context of this is an attempt to model a stream of events ordered by time in an append only manner - I need to be able to keep what is effectively a cursor to a particular point in time (point in the stream of events) and need to know whether or not having observed events at time T means you can never get more events again at exactly time T.

Spanner is externally consistent, meaning that any reader will only be able to read the results of completed transactions...
Along with all externally consistent DB's, it is not possible for a reader outside of a transaction to be able to read the 'pending state' of another transaction. So a reader at time T will only be able to see transactions that have been committed before time T.
Multiple simultaneous insert/update transactions at commit time T (which would affect different rows, otherwise they could not be simultaneous) would not be seen by the reader at time T, but both would be seen by a reader at T+1
I ... need to know whether or not having observed events at time T means you can never get more events again at exactly time T.
Yes - ish. Rephrasing slightly as this is nuanced:
Having read events up to and including time T means you will never get any more events occurring with time equal to or before time T
But remember that the commit timestamp column is a simple TIMESTAMP column where any value can be stored -- it is the application that requests that the value stored is the commit timestamp, and there is nothing at the DB level to stop the application storing any value it likes...
As always with Spanner, it is the application which has to enforce/maintain the data integrity.

Related

DynamoDB use case handling

I am using dynamoDB for a project. I have a use case where I maintain timeline for objects i.e. start and end time for an object and start time for next object. New objects can be added in between two existing objects(o1 & o2) in which I will have to update start time for next object in o1 and start time for next object in new object as start time of o2. This can cause problem in case two new objects are being added in between two objects and would probably require transactions. Can someone suggest how this can be handled?
Update: My data model looks like this:
objectId(Hash Key), startTime(Sort Key), endTime, nextStartTime
1, 1, 5, 4
1, 4, 6, 8
1, 8, 10, 9
So, it's possible a new entry comes in whose start time is 5. So, in transaction I will have to update nextStartTime for second entry to 5 and insert a new entry after the second entry which contains nextStartTime as start time of third entry. During this another entry might come in which also has start time between second and third entry(say 7 for eg.). Now I want the two transactions to be isolated of each other. In traditional SQL DBs it would be possible as second entry would be locked for the duration of transaction but Dynamo doesn't lock the items. So, I am wondering if I use transaction would the two transactions protect the data integrity.
DynamoDB supports optimistic locking. This is achieved via conditional writes.
You can do it manually by introducing a version attribute or you can use the one provided (hopefully) by your SDK. Here is a link to AWS docs.
TLDR
two objects have to update the same timeline at the same time
one will succeed the other will fail with a specific error
you will have to retry the failing one
Dynamo also has transactions. However, they are limited to 25 elements and consume 2x capacity units. If you can get away with an optimistic lock go for it.
Hope this was helpful
Update with more info on transactions
From this doc
Error Handling for Writing Write transactions don't succeed under the
following circumstances:
When a condition in one of the condition expressions is not met.
When a transaction validation error occurs because more than one
action in the same TransactWriteItems operation targets the same item.
When a TransactWriteItems request conflicts with an ongoing
TransactWriteItems operation on one or more items in the
TransactWriteItems request. In this case, the request fails with a
TransactionCanceledException.
When there is an insufficient provisioned capacity for the transaction to
be completed.
When an item size becomes too large (larger than 400 KB), or a local
secondary index (LSI) becomes too large, or a similar validation error
occurs because of changes made by the transaction.
When there is a user error, such as an invalid data format.
They claim that if there are two ongoing transactions on the same item, one will fail.
Why store the nextStartTime in the item? The nextStartTime is simply the start time of the next item, right? Seems like it'd be much easier to just pull the item as well as the next item to get the full picture at read-time. With a Query you can do this in one call, and so long as items are less than 2 KB in size it wouldn't even consume more RCUs than a get item would.
Simpler design, no cost for transactional writes, no need to do extensive testing on thread safety.

AWS Dynamodb scan ordering?

We have a setup where various worker nodes perform computations and update their relative states in a DynamoDB table. The table acts as a kind of history of activity of the worker nodes. A watchdog node needs to periodically scan through the table, and build an object representing the current state of the worker nodes and their jobs. As such, it's important for our application to be able to scan the table and retrieve data in chronological order (i.e. sorted by timestamp). The table will eventually be too large to scan into local memory for later ordering, so we cannot sort it after scanning.
Reading from the AWS documentation about the primary key:
DynamoDB uses the partition key value as input to an internal hash
function. The output from the hash function determines the partition
(physical storage internal to DynamoDB) in which the item will be
stored. All items with the same partition key are stored together, in
sorted order by sort key value.
Documentation on the scan function doesn't seem to mention anything about the order of the returned results. But can that last part in the quote above (the part I emphasized in bold) be interpreted to mean that the results of scans are ordered by the sort key? If I set all partition keys to be the same value, say "0", then use my timestamp as the sort key, can I be guaranteed that the scan operation will return data in chronological order?
Some note:
All code is written in Python, and thus I'm using the boto3 module to perform scan operations.
Our system architect is steadfast against the idea of updating any entries in the table to reflect their current state, or deleting items when the job is complete. We can only ever add to the table, and thus we need to scan through the whole thing each time to determine the worker states.
I am using strong read consistency for scan operations.
Technically SCAN never guarantees order (although as an observation the lack of order guarantee seems to mean that the partition is randomly ordered, but the sort remains, well, sorted.)
What you've proposed will work however, but instead of scanning, you'll be doing a query on partition-key == 0, which will then return all the items with the partition key of 0, (up to limit and optional sorted forward/backwards) sorted by the sort key.
That said, this is really not the way that dynamo wants you to use it. For example, it guarantees your partition will run hot (because you've explicitly put everything on the same partition), and this operation will cost you the capacity of reading every item on the table.
I would recommend investigating patterns such as using a dynamodb stream processed by a lambda to build and maintain a materialised view of this "current state", rather than "polling" the table with this expensive scan and resulting poor key design.
You’re better off using yyyy-mm-dd as the partition key, rather than all 0. There’s a limit of 10 GB of data per partition, which also means you can’t have more than 10 GB of data per partition key value.
If you want to be able to retrieve data sorted by date, take the ISO 8601 time stamp format (roughly yyyy-mm-ddThh-mm-ss.sss), split it somewhere reasonable for your data, and use the first part as the partition key and the second part as the sort key. (Another advantage of this approach is that you can use eventually consistent reads for most of the queries since it’s pretty safe to assume that after a day (or an hour o something) that the data is completely replicated.)
If you can manage it, it would be even better to use Worker ID or Job ID as a partition key, and then you could use the full time stamp as the sort key.
As #thomasmichaelwallace mentioned, it would be best to use DynamoDB streams with Lambda to create a materialized view.
Now, that being said, if you’re dealing with jobs being run on workers, then you should also consider whether you can achieve your goal by using a workflow service rather than a database. Workflows will maintain a job history and/or current state for you. AWS offers Step Functions and Simple Workflow.

Datomic Clojure API: Semantics of t->tx?

I do not understand the meaning of the datomic.api/t->tx, which accepts a t value and returns a transaction id.
Is not the t value a property of the database incremented at each transaction ? If so, how could a transaction number be derived from it independently of any database?
Transaction Data (tx) is an Entity with an id. t is not the id, but there is an efficient mechanism to discover the t of a tx and for a tx the t. Effectively this means that you can look up transactions in a time range efficiently.
t is indeed a value indicating the order in which transactions occur, though I would not call it a property of the database or speculate on how it is calculated as it is not defined and could change. There is no defined term "transaction number", but I think you are asking how could one create a strictly increasing number without a database property. One answer is to use a clock, and prevent more than one transaction per tick. In any case, it isn't an implementation detail that we need to know in order to use the system.
The glossary provides these definitions which I think are helpful:
t
A point in time in a database. Every transaction is assigned a numeric t value greater than any previous t in the database, and all processes see a consistent succession of ts. A transaction value t can be converted to a tx id with Peer.toTx.
tx
An entity representing a transaction. Every datom in a Datomic database includes the tx that created it, allowing recovery of the entire history of the database. Transactions are automatically associated with wall-clock time, but are otherwise ordinary entities. In particular, application code can make additional assertions about transactions. A tx entity id can be converted to a transaction t with Peer.toT.

Database polling, prevent duplicate fetches

I have a system whereby a central MSSQL database keeps in a table a queue of jobs that need to be done.
For the reasons that processing requirements would not be that high, and that there would not be a particularly high frequency of requests (probably once every few seconds at most) we made the decision to have the applications that utilise the queue simply query the database whenever one is needed; there is no message queue service at this time.
A single fetch is performed by having the client application run a stored procedure, which performs the query(ies) involved and returns a job ID. The client application then fetches the job information by querying by ID and sets the job as handled.
Performance is fine; the only snag we have felt is that, because the client application has to query for the details and perform a check before the job is marked as handled, on very rare occasions (once every few thousand jobs), two clients pick up the same job.
As a way of solving this problem, I was suggesting having the initial stored procedure that runs "tag" the record it pulls with the time and date. The stored procedure, when querying for records, will only pull records where this "tag" is a certain amount of time, say 5 seconds, in the past. That way, if the stored procedure runs twice within 5 seconds, the second instance will not pick up the same job.
Can anyone foresee any problems with fixing the problem this way or offer an alternative solution?
Use a UNIQUEIDENTIFIER field as your marker. When the stored procedure runs, lock the row you're reading and update the field with a NEWID(). You can mark your polling statement using something like WITH(READPAST) if you're worried about deadlocking issues.
The reason to use a GUID here is to have a unique identifier that will serve to mark a batch. Your NEWID() call is guaranteed to give you a unique value, which will be used to prevent you from accidentally picking up the same data twice. GETDATE() wouldn't work here because you could end up having two calls that resolve to the same time; BIT wouldn't work because it wouldn't uniquely mark off batches for picking up or reporting.
For example,
declare #ReadID uniqueidentifier
declare #BatchSize int = 20; -- make a parameter to your procedure
set #ReadID = NEWID();
UPDATE tbl WITH (ROWLOCK)
SET HasBeenRead = #ReadID -- your UNIQUEIDENTIFIER field
FROM (
SELECT TOP (#BatchSize) Id
FROM tbl WITH(UPDLOCK ROWLOCK READPAST )
WHERE HasBeenRead IS null ORDER BY [Id])
AS t1
WHERE ( tbl.Id = t1.Id)
SELECT Id, OtherCol, OtherCol2
FROM tbl WITH(UPDLOCK ROWLOCK READPAST )
WHERE HasBeenRead = #ReadID
And then you can use a polling statement like
SELECT COUNT(*) FROM tbl WITH(READPAST) WHERE HasBeenRead IS NULL
Adapted from here: https://msdn.microsoft.com/en-us/library/cc507804%28v=bts.10%29.aspx

Auto-increment on Azure Table Storage

I am currently developing an application for Azure Table Storage. In that application I have table which will have relatively few inserts (a couple of thousand/day) and the primary key of these entities will be used in another table, which will have billions of rows.
Therefore I am looking for a way to use an auto-incremented integer, instead of GUID, as primary key in the small table (since it will save lots of storage and scalability of the inserts is not really an issue).
There've been some discussions on the topic, e.g. on http://social.msdn.microsoft.com/Forums/en/windowsazure/thread/6b7d1ece-301b-44f1-85ab-eeb274349797.
However, since concurrency problems can be really hard to debug and spot, I am a bit uncomfortable with implementing this on own. My question is therefore if there is a well tested impelemntation of this?
For everyone who will find it in search, there is a better solution. Minimal time for table lock is 15 seconds - that's awful. Do not use it if you want to create a truly scalable solution. Use Etag!
Create one entity in table for ID (you can even name it as ID or whatever).
1) Read it.
2) Increment.
3) InsertOrUpdate WITH ETag specified (from the read query).
if last operation (InsertOrUpdate) succeeds, then you have a new, unique, auto-incremented ID. If it fails (exception with HttpStatusCode == 412), it means that some other client changed it. So, repeat again 1,2 and 3.
The usual time for Read+InsertOrUpdate is less than 200ms. My test utility with source on github.
See UniqueIdGenerator class by Josh Twist.
I haven't implemented this yet but am working on it ...
You could seed a queue with your next ids to use, then just pick them off the queue when you need them.
You need to keep a table to contain the value of the biggest number added to the queue. If you know you won't be using a ton of the integers, you could have a worker every so often wake up and make sure the queue still has integers in it. You could also have a used int queue the worker could check to keep an eye on usage.
You could also hook that worker up so if the queue was empty when your code needed an id (by chance) it could interupt the worker's nap to create more keys asap.
If that call failed you would need a way to (tell the worker you are going to do the work for them (lock), then do the workers work of getting the next id and unlock)
lock
get the last key created from the table
increment and save
unlock
then use the new value.
The solution I found that prevents duplicate ids and lets you autoincrement it is to
lock (lease) a blob and let that act as a logical gate.
Then read the value.
Write the incremented value
Release the lease
Use the value in your app/table
Then if your worker role were to crash during that process, then you would only have a missing ID in your store. IMHO that is better than duplicates.
Here is a code sample and more information on this approach from Steve Marx
If you really need to avoid guids, have you considered using something based on date/time and then leveraging partition keys to minimize the concurrency risk.
Your partition key could be by user, year, month, day, hour, etc and the row key could be the rest of the datetime at a small enough timespan to control concurrency.
Of course you have to ask yourself, at the price of date in Azure, if avoiding a Guid is really worth all of this extra effort (assuming a Guid will just work).