How to get the history of a state in Corda?

In my Corda project a state may evolve over time. I have made the state a LinearState. Now I want to retrieve the history of a Corda state, that is, how it evolved over time. How can I see the evolution history of a particular state in Corda?
In particular, I want to access the complete transaction chain of a state.

Of course, without access to your code this answer will vary, but there are two pieces of documentation to be aware of here.
What you want to perform is essentially a vault query (depending on what information you're looking to get).
From the docs on LinearState:
Whenever a node records a new transaction, it also decides whether it should store each of the transaction’s output states in its vault. The default vault implementation makes the decision based on the following rules.
source: https://docs.corda.net/docs/corda-os/4.6/api-states.html#the-vault
That being said, to perform your vault query you would do it just like you would other states. Here's the docs on the vault query API : https://docs.corda.net/docs/corda-os/4.6/api-vault-query.html
If you have the linearId you can do it from the node shell, or by connecting with H2 and looking in places like the VAULT_LINEAR_STATES table.
If you want an example of querying in code, take a look at the obligation CorDapp, which takes the linearId as a parameter to the flow.
// 1. Retrieve the IOU state from the vault using LinearStateQueryCriteria
List<UUID> listOfLinearIds = Arrays.asList(stateLinearId.getId());
QueryCriteria queryCriteria = new QueryCriteria.LinearStateQueryCriteria(null, listOfLinearIds);
Vault.Page<IOUState> results = getServiceHub().getVaultService().queryBy(IOUState.class, queryCriteria);
StateAndRef<IOUState> inputStateAndRefToSettle = results.getStates().get(0);
IOUState inputStateToSettle = inputStateAndRefToSettle.getState().getData();
Source Code example here: https://github.com/corda/samples-java/blob/master/Advanced/obligation-cordapp/workflows/src/main/java/net/corda/samples/flows/IOUSettleFlow.java#L56-L61
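That snippet returns the current unconsumed version of the state. To see how the state evolved, you can widen the same query to include consumed (historical) versions as well. Here is a minimal sketch, not taken from the sample repo, assuming it runs inside a flow and that linearId holds the state's UniqueIdentifier:

import net.corda.core.contracts.StateAndRef;
import net.corda.core.node.services.Vault;
import net.corda.core.node.services.vault.QueryCriteria;
import java.util.Arrays;

QueryCriteria linearCriteria = new QueryCriteria.LinearStateQueryCriteria(
        null, Arrays.asList(linearId.getId()));
QueryCriteria allVersions = linearCriteria.and(
        new QueryCriteria.VaultQueryCriteria(Vault.StateStatus.ALL));   // include CONSUMED states too

Vault.Page<IOUState> history = getServiceHub().getVaultService()
        .queryBy(IOUState.class, allVersions);

for (StateAndRef<IOUState> version : history.getStates()) {
    // Each StateAndRef points back at the transaction that produced that version of the state.
    System.out.println(version.getRef().getTxhash() + " -> " + version.getState().getData());
}

Each returned StateAndRef carries the hash of the transaction that created that version, which is one way to walk the evolution of the state from your own node's vault.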

Related

Firestore in Datastore mode does not seem to be strongly consistent

I am using cloud endpoints with objectify and Firestore in Datastore mode. Although it says in the documentation that all queries are strongly consistent, I have found that they are not in the following examples:
Example 1
I made an endpoint that queries for an entity by a property, adds +1 to a count property on it, and saves it back to the datastore. I then have 50 different clients all execute that method at the same time. I would expect the count property to be 50; however, it usually ends up somewhere between 25 and 30.
Example 2
I have an endpoint that queries for an entity by a property. If the entity does not exist, I create the entity and save it to the datastore. If it exists, I just return it. Again, I hit this endpoint with 50 different clients at the same time. I would expect there to only be one entity in the Datastore. However, I will have maybe 5-10 of the same entity.
It seems to me this is not strongly consistent. If I take my code in the above endpoints and put it in a transaction with retries, all works as intended. I looked around in Objectify to see if there is a ReadOptions set somewhere, but from what I can see there is not, so it should be using the default of read_consistency=STRONG.
For example 1, you need to use transactions to ensure that writes do not stomp on each other.
For example 2, again you need to use a transaction to get consistency across clients.
Strong consistency means that if a client writes a value, it can read or query it back after the write succeeds. It does not mean that if one client reads a value, another client reads the same value, they each apply a transformation, and both try to write the result, the blind writes from each client will somehow be merged together.
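For example 1, a minimal Objectify sketch of the read-modify-write wrapped in a transaction (CountEntity and entityId are made-up names for illustration):

import com.googlecode.objectify.ObjectifyService;

// CountEntity is a hypothetical @Entity with a Long id and a long "count" field.
ObjectifyService.ofy().transact(() -> {
    CountEntity e = ObjectifyService.ofy().load().type(CountEntity.class).id(entityId).now();
    e.count += 1;                                   // the read-modify-write now happens atomically
    ObjectifyService.ofy().save().entity(e).now();
});
// Objectify's transact() retries on contention, so 50 concurrent callers
// should end with count == 50.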

What's the cheapest way to store an auto increment indexed list of values in AWS?

I have a DynamoDB-based web application that uses DynamoDB to store my large JSON objects and perform simple CRUD operations on them via a web API. I would like to add a new table that acts like a categorization of these values. The user should be able to select from a selection box which category the object belongs to. If a desirable category does not exist, the user should be able to create a new category specifying a name which will be available to other objects in the future.
It is critical to the application that every one of these categories be given an integer ID that auto-increments, starting at 1 for the first category. These auto-generated numbers will turn into reproducible serial numbers for back-end reports that will not use the user-visible text name.
So I would like to have a simple API available from the web frontend that allows me to:
A) GET /category : produces { int : string, ... } of all categories mapped to an ID
B) PUSH /category : accepts string and stores the string to the next integer
Here are some ideas for how to handle this kind of project.
Store it in DynamoDB with integer indexes. This has some benefits but it leaves a lot to be desired. Firstly, there's no auto-incrementing ID in DynamoDB, but I could definitely get the state of the table, create a new ID, and store the result. This might have issues with consistency and race conditions, but there's probably a way to achieve this safely. It might, however, be a big anti-pattern to use DynamoDB this way.
Store it in DynamoDB as one object in a table with some random index. Just store the mapping as a JSON object. This really forgets the notion of tables in DynamoDB and uses it as a simple file. It might also run into some issues with race conditions.
Use AWS ElastiCache to have a Redis key-value store. This might be "the right" decision, but the downside is that ElastiCache is an always-on DB offering where you pay per hour. For a low-traffic web site like mine I'd be paying a minimum of about $12/mo, I think, and I would really like for this to be pay-per-access/update due to the low volume. I'm not sure there's an auto-increment feature for Redis built in the way I'd need it, but it's pretty trivial to make a transaction that gets the length of the table, adds one, and stores a new value. Race conditions are easily avoided with this solution.
Use a SQL database like AWS Aurora or MySQL. This has the same upsides as Redis, but it's even more overkill than Redis, it costs a lot more, and it's still always on.
Run my own in-memory web service, or MongoDB, etc. - you're still paying for containers running constantly. Writing my own thing is obviously silly, but I'm sure there are services that match this issue perfectly; they'd all require an always-on container to run, though.
Is there a good way to just store a simple list, or an integer mapping like this, that doesn't incur a constant monthly cost? Is there a better way to do this with DynamoDB?
Store the maxCounterValue as an item in DynamoDB.
For the PUSH /category, perform the following (a code sketch follows the steps):
Get the current maxCounterValue.
TransactWrite:
Put the category name and id into a new item with id = maxCounterValue + 1.
Update maxCounterValue to maxCounterValue + 1, with a ConditionExpression checking that maxCounterValue = :valueFromGetOperation.
If the TransactWrite fails, start again at step 1; retry up to X more times.
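Putting those steps together, a minimal sketch using the AWS SDK for Java v2. The table name "categories" and its "pk"/"value"/"name" attributes are made-up names for illustration:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;
import java.util.Map;

DynamoDbClient ddb = DynamoDbClient.create();
String newCategoryName = "Widgets";   // example input from the PUSH /category request

// 1. Read the current counter (the condition below protects us if this read is stale).
Map<String, AttributeValue> counter = ddb.getItem(GetItemRequest.builder()
        .tableName("categories")
        .key(Map.of("pk", AttributeValue.fromS("maxCounterValue")))
        .build()).item();
long current = Long.parseLong(counter.get("value").n());
long next = current + 1;

// 2. Atomically create the category and bump the counter, but only if nobody
//    else bumped it since we read it.
ddb.transactWriteItems(TransactWriteItemsRequest.builder()
        .transactItems(
                TransactWriteItem.builder().put(Put.builder()
                        .tableName("categories")
                        .item(Map.of(
                                "pk", AttributeValue.fromS("category#" + next),
                                "name", AttributeValue.fromS(newCategoryName)))
                        .build()).build(),
                TransactWriteItem.builder().update(Update.builder()
                        .tableName("categories")
                        .key(Map.of("pk", AttributeValue.fromS("maxCounterValue")))
                        .updateExpression("SET #v = :next")
                        .conditionExpression("#v = :current")
                        .expressionAttributeNames(Map.of("#v", "value"))   // "value" is a reserved word
                        .expressionAttributeValues(Map.of(
                                ":next", AttributeValue.fromN(Long.toString(next)),
                                ":current", AttributeValue.fromN(Long.toString(current))))
                        .build()).build())
        .build());
// A TransactionCanceledException here means someone else won the race:
// re-read the counter and retry from step 1, up to X times.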

Are DynamoDB conditional writes transactional?

I'm having trouble wrapping my head around the dichotomy of DDB providing Condition Writes but also being eventually consistent. These two truths seem to be at odds with each other.
In the classic scenario, user Bob updates key A and sets the value to "FOO". User Alice reads from a node that hasn't received the update yet, and so it gets the original value "BAR" for the same key.
If Bob and Alice write to different nodes on the cluster without condition checks, it's possible to have a conflict where Alice and Bob wrote to the same key concurrently and DDB does not know which update should be the latest. This conflict has to be resolved by the client on next read.
But what about when conditional writes are used?
User Bob sends their update for A as "FOO" if the existing value for A is "BAR".
User Alice sends their update for A as "BAZ" if the existing value for A is "BAR".
Locally, each node can check whether it has the original "BAR" value and go through with the update. But the only way to know the true state of A across the cluster is to make a strongly consistent read across the cluster first. This strongly consistent read must be blocking for either Alice or Bob, or they could both make a strongly consistent read at the same time.
So here is where I'm getting confused about the nature of DDBs condition writes. It seems to me that either:
Condition writes are only evaluated locally. Merge conflicts can still occur.
Condition writes are evaluated cross cluster.
If it is #2, the only way I see that working is if:
A lock is created for the key.
A strongly consistent read is made.
Let's say it's #2. Now where does this leave Bob's update? The update was made to node 2 and sent to node 1 and we have a majority quorum. But to make those updates available to Alice when they do their own conditional write, those updates need to be flushed from WAL. So in a conditional write are the updates always flushed? Are writes always flushed in general?
There have been other questions like this here on SO, but the answers were a repeat of, or a link to, the AWS documentation about this. The AWS documentation doesn't really explain this (or I missed it).
DynamoDB conditional writes are "transactional" writes but how they're done is not public information & is perhaps proprietary intellectual property.
DynamoDB developers are the only ones with this information.
Your issue is that you're looking at this from a node perspective - I have gone through every mention of "node" anywhere in the DynamoDB documentation, and they are just mentions of Node.js or DAX nodes, not database nodes.
While there can be outdated reads - yes, that would indicate some form of node - there are no database nodes as such exposed when doing conditional writes.
User Bob sends their update for A as "FOO" if the existing value for A is "BAR". User Alice sends their update for A as "BAZ" if the existing value for A is "BAR".
Whoever's request gets there first is the one that goes through first.
The next request will just fail, meaning you now need to make a new read request to obtain the latest value to then proceed with the 2nd later write.
The Amazon DynamoDB developer guide shows this very clearly. Note that there are no nodes, replicas, etc. mentioned - there is only a single reference to the DynamoDB table.
Condition writes are probably evaluated cross-cluster & a strongly consistent read is probably made but Amazon has not made this information public.
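To make the "whoever gets there first wins" behaviour concrete, here is a minimal sketch with the AWS SDK for Java v2; the table name "my-table" and the key/attribute names are made up. Bob and Alice would both issue this same request, and exactly one of them gets the ConditionalCheckFailedException:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;
import java.util.Map;

DynamoDbClient ddb = DynamoDbClient.create();

// Set A to "FOO", but only if A is still "BAR".
try {
    ddb.updateItem(UpdateItemRequest.builder()
            .tableName("my-table")
            .key(Map.of("pk", AttributeValue.fromS("A")))
            .updateExpression("SET #val = :new")
            .conditionExpression("#val = :expected")
            .expressionAttributeNames(Map.of("#val", "value"))
            .expressionAttributeValues(Map.of(
                    ":new", AttributeValue.fromS("FOO"),        // Bob writes FOO; Alice would write BAZ
                    ":expected", AttributeValue.fromS("BAR")))
            .build());
} catch (ConditionalCheckFailedException e) {
    // The other writer got there first: re-read the item and decide what to do next.
}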
Ermiya Eskandary is correct that the exact details of DynamoDB's implementation aren't public knowledge, and also subject to change in the future while preserving the documented guarantees of the API. Nevertheless, various documents and especially video presentations that the Amazon developers did in the past, made it relatively clear how this works under the hood - at least in broad strokes, and I'll try to explain my understanding here:
Note that this might not be 100% accurate, and I don't have any inside knowledge about DynamoDB.
DynamoDB keeps three copies of each item.
One of the nodes holding a copy of a specific item is designated the leader for this item (there isn't a single "leader" - it can be a different leader per item). As far as I know, we have no details on which protocol is used to choose this leader (of course, if nodes go down, the leader choice changes).
A write to an item is started on the leader, which serializes writes to the same item. Note how DynamoDB conditional updates can only read and update the same item, so the same node (the leader) can read and write the item with just a local lock. After the leader evaluates the condition and decides to write, it also sends an update to the two other nodes - returning success to the user only after two of the three nodes successfully wrote the data (to ensure durability).
As you probably know, DynamoDB reads have two options, consistent and eventually-consistent: an eventually-consistent read reads from one of the three replicas at random, and might not yet see the result of a successful write (if the write wrote two copies, but not yet the third one). A consistent read reads from the leader, so it is guaranteed to read the previously-written data.
Finally, you asked about DynamoDB's newer and more expensive "Transaction" support. This is a completely different feature. DynamoDB "Transactions" are all about reads and writes to multiple items in the same request. As I explained above, the older conditional-updates feature only allows a read-modify-write operation to involve a single item at a time, and therefore has a simpler implementation, where a single node (the leader) can serialize concurrent writes and make the decisions without a complex distributed algorithm (though a complex distributed algorithm is needed to pick the leader).

Create fault tolerance example with Dynamodb streams

I have been looking at DynamoDB to create something close to a transaction. I was watching this video presentation: https://www.youtube.com/watch?v=KmHGrONoif4 in which the speaker shows, around the 30 minute mark, ways to make DynamoDB operations as close to ACID-compliant as can be. He shows that the best concept is to use DynamoDB Streams, but doesn't show a demo or an example. I have a very simple scenario I am looking at: I have one table called USERS. Each user has a list of friends. If two users no longer wish to be friends, they must be removed from both of the users' entities (I can't afford for one friend to be deleted from one entity and then, due to a crash for example, the second user entity's friend attribute is not updated, causing inconsistent data). I was wondering if someone could provide a simple walk-through of how to accomplish something like this to see how it all works? If code could be provided, that would be great.
Cheers!
Here is the transaction library that he is referring to: https://github.com/awslabs/dynamodb-transactions
You can read through the design: https://github.com/awslabs/dynamodb-transactions/blob/master/DESIGN.md
Here is the Kinesis client library:
http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
When you're writing to DynamoDB, you can get an output stream with all the operations that happen on the table. That stream can be consumed and processed by the Kinesis Client Library.
In your case, have your client remove it from the first user, then from the second user. In the Kinesis Client Library, when you are consuming the stream and see a user removed, look at who they were friends with and check/remove the other side if needed - if needed, the removal should probably be done through the same means. It's not truly a transaction, and it relies on the fact that the KCL guarantees that the records from the stream will be processed.
To add to this confusion, the KCL uses DynamoDB to store where it is in the stream while processing and to checkpoint processed records.
You should try and minimize the need for transactions, which is a nice concept in a small scale, but can't really scale once you become very successful and need to support millions and billions of records.
If you are thinking in a NoSQL mindset, you can consider using a slightly different data model. One simple example is to use a Global Secondary Index on a single table on the "friend-with" attribute. When you add a single record for a pair of friends, both the record and the index are updated in a single action, and likewise both are updated in a single action when you delete the friendship record (a sketch of this layout follows).
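A minimal sketch of that layout with the AWS SDK for Java v2 - the table name "friendships", its composite key (userId, friendWith), and the index name "friendWith-index" are all made-up names:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;
import java.util.Map;

DynamoDbClient ddb = DynamoDbClient.create();

// One item per friendship; the GSI "friendWith-index" has friendWith as its partition key.
ddb.putItem(PutItemRequest.builder()
        .tableName("friendships")
        .item(Map.of(
                "userId", AttributeValue.fromS("alice"),
                "friendWith", AttributeValue.fromS("bob")))
        .build());

// Friendships alice owns come from the base table; friendships pointing at alice
// come from the index. Deleting the single item removes both "directions" at once.
QueryResponse pointingAtAlice = ddb.query(QueryRequest.builder()
        .tableName("friendships")
        .indexName("friendWith-index")
        .keyConditionExpression("friendWith = :u")
        .expressionAttributeValues(Map.of(":u", AttributeValue.fromS("alice")))
        .build());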
If you choose to use the Updates Stream mechanism or the Global Secondary Index one, you should take into consideration the "eventual consistency" aspect of the distributed system. Consistency can be achieved within milliseconds, but it can also take longer. You should analyze the business implications and the technical measures you can take to address it. For example, you can verify the existence of both records (the main table as well as the index, if you found it in the index) before you present it to the user.

UUIDs for DynamoDB?

Is it possible to get DynamoDB to automatically generate unique IDs when adding new items to a table?
I noticed the Java API mentions @DynamoDBAutoGeneratedKey so I'm assuming there's a way to get this working with PHP as well.
If so, does the application code generate these IDs or is it done on the DynamoDB side?
Good question - while conceptually possible, this seems not to be currently available as a DynamoDB API-level feature, insofar as neither CreateTable nor PutItem refers to such functionality.
The @DynamoDBAutoGeneratedKey notation you have noticed is indeed a Java annotation, i.e. syntactic sugar offered by the Java SDK:
An annotation, in the Java computer programming language, is a special
form of syntactic metadata that can be added to Java source code.
As such @DynamoDBAutoGeneratedKey is one of the Amazon DynamoDB Annotations offered as part of the Object Persistence Model within the Java SDK's high-level API (see Using the Object Persistence Model with Amazon DynamoDB):
Marks a hash key or range key property as being auto-generated. The
Object Persistence Model will generate a random UUID when saving these
attributes. Only String properties can be marked as auto-generated
keys.
While working with DynamoDB in JavaScript with Node.js, I use the npm module uuid to generate a unique key.
Ex:
id = uuid.v1();
Refer: uuid npm
By using the schema-based AWS DynamoDB data mapper library in Node.js, the hash key (id) will be generated automatically. Auto-generated ids are based on UUID v4.
For more details, have a look on following aws package.
Data Mapper with annotation
Data Mapper package for Javascript
Sample snippet
@table('my_table')
class MyDomainClass {
    @autoGeneratedHashKey()
    id: string;
    @rangeKey({defaultProvider: () => new Date()})
    createdAt: Date;
}
The client can create a (for all intents and purposes) unique ID either by picking a long random id (DynamoDB supports 128-bit integers, for example), or by picking an ID which contains the client's IP address, CPU number, and current time - or something along these lines.
The UUID standard even includes a standard way to do this (and you have libraries in various languages to create such UUIDs on the client side), but you don't really need to use a standard.
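For instance, in Java the standard library already covers this; a one-line sketch:

import java.util.UUID;

// Generates a random (version 4) UUID on the client side, suitable as a DynamoDB key.
String id = UUID.randomUUID().toString();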
An interesting question is how you plan to find these items if they have random keys. Or are you planning to use a secondary index?
The 2022 answer is here:
https://dev.to/prabusah_53/aws-lambda-in-built-uuid-382f
External libraries are no longer needed.
Here is another good method taken from mkyong
http://www.mkyong.com/java/how-to-get-current-timestamps-in-java/
I adjusted his method to get the milliseconds instead of the actual date
java.util.Date date = new java.util.Date();
// Timestamp here is java.sql.Timestamp; this prints the epoch time in milliseconds
System.out.println(new java.sql.Timestamp(date.getTime()).getTime());
The approach I'm taking is to use the current timestamp for the hash-key (or the range-key, if using a range-key too). Store the timestamp as an integer, representing the number of milliseconds since the start of the "UNIX epoch" (in the UTC timezone). Many date/time libraries can produce this number for you.
This has the advantage that if you want to have a "creation time" field in your table, your UUID already stores this information. Just call another method in your date/time library to convert the timestamp to a readable format.
(Be sure to handle the exception which will occur if a second item is created in the same table with the same millisecond timestamp; just fall back and retry the operation in that case, with a slightly later, current timestamp.)
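A minimal sketch of that collision guard, assuming the AWS SDK for Java v2 and a made-up "User" table whose numeric hash-key attribute is userID:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;
import java.util.Map;

DynamoDbClient ddb = DynamoDbClient.create();
try {
    ddb.putItem(PutItemRequest.builder()
            .tableName("User")
            .item(Map.of("userID", AttributeValue.fromN(Long.toString(System.currentTimeMillis()))))
            .conditionExpression("attribute_not_exists(userID)") // fail instead of overwriting a twin timestamp
            .build());
} catch (ConditionalCheckFailedException e) {
    // Another item landed on the same millisecond; retry with a slightly later timestamp.
}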
For example:
User table
hash-key only: userID (timestamp of the creation of this user).
WidgetAttributes table
hash-key plus range-key.
hash-key: userID (use the userID from the User table of the user to whom the widget belongs).
range-key: attribID (use the timestamp of the creation of this widget-attribute).
Now you can run "query" operations on the WidgetAttributes table to get all widget-attributes for a certain user; by using "greater-than-zero" as the query-parameter for the range-key.
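A sketch of that query with the AWS SDK for Java v2, assuming the made-up WidgetAttributes table above with numeric userID and attribID keys, and userId holding the user's hash-key value:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;
import java.util.Map;

DynamoDbClient ddb = DynamoDbClient.create();

// All widget-attributes for one user: hash-key equals the userID,
// range-key "greater than zero" matches every attribID (they are all timestamps > 0).
QueryResponse widgets = ddb.query(QueryRequest.builder()
        .tableName("WidgetAttributes")
        .keyConditionExpression("userID = :u AND attribID > :zero")
        .expressionAttributeValues(Map.of(
                ":u", AttributeValue.fromN(Long.toString(userId)),
                ":zero", AttributeValue.fromN("0")))
        .build());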