Is this a case of Non-Repeatable Reads? - concurrency

I have the following tables (here modeled in Entity Framework, but my question has nothing to do with EF):
As you can see, this is a versioned Product table. The Id column is the primary key, but the combination (EntityId, VersionId) could serve as the primary key as well. EntityId denotes the id of the entity, which stays constant across the different versions of the entity. An entity is deleted by writing a row with IsDeleted = 1.
The stored procedures that are responsible for data manipulation first check whether the operation is allowed. For example, the UPDATE SP checks whether the entity has already been deleted. If those checks succeed, the SPs insert a new row into the Versions table, followed by a new row in the Product table (pseudo-code):
sp_Product_Update:
    (1) IF EXISTS (SELECT Id FROM Product WHERE IsDeleted = 1 AND EntityId = @ProductId)
            RAISERROR('Entity has already been deleted', 16, 1)
            RETURN
    (2) INSERT INTO Versions ...
    (3) INSERT INTO Product ... (IsDeleted = 0)

sp_Product_Delete:
    (1) IF EXISTS (SELECT Id FROM Product WHERE IsDeleted = 1 AND EntityId = @ProductId)
            RAISERROR('Entity has already been deleted', 16, 1)
            RETURN
    (2) INSERT INTO Versions ...
    (3) INSERT INTO Product ... (IsDeleted = 1)
This all works well.
Currently, I'm analyzing this design for concurrency issues. Imagine the following scenario, where the two SPs are invoked at the same time for the same entity:
Transaction 1 (sp_Product_Update)               Transaction 2 (sp_Product_Delete)
(1) Check succeeds, entity has
    not yet been deleted.
                                                (1) Same check, also succeeds.
                                                (2) INSERT INTO Versions ...
                                                (3) INSERT INTO Product ... (IsDeleted = 1)
(2) INSERT INTO Versions ...
(3) INSERT INTO Product ... (IsDeleted = 0)
As you can see, this race condition leads to inconsistent data, namely an IsDeleted = 0 row that comes after the IsDeleted = 1 row.
So we have to determine what isolation level we need to avoid this race condition.
This doesn't appear to be a Dirty Read, since the data that is read in (1) is not dirty.
It's not a Non-Repeatable Read either: neither transaction reads the same data twice.
The same goes for Phantom Reads: neither transaction runs the same query twice.
So I'm left with two questions:
Is my analysis correct?
What isolation level is needed to avoid this kind of issue?

Your solution requires the SERIALIZABLE isolation level, as the existence check and the inserts need to execute together as one atomic operation.
If you were not using stored procedures, I would encourage you to use optimistic locking, which is designed for this kind of situation and allows high throughput.
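A minimal sketch of the serializable approach from application code, assuming a Go client with database/sql and a SQL Server driver such as github.com/microsoft/go-mssqldb (that driver's @p1 placeholder style; the insert column lists stay elided, as in the question):

import (
    "context"
    "database/sql"
    "errors"
    "fmt"
)

// updateProduct runs the check-then-insert sequence from sp_Product_Update
// inside a SERIALIZABLE transaction. The range read by the existence check
// in step (1) stays locked until commit, so the interleaving from the
// question cannot happen: the second caller blocks, or one of the two is
// chosen as a deadlock victim and can simply be retried.
func updateProduct(ctx context.Context, db *sql.DB, entityID int64) error {
    tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op once Commit has succeeded

    // Step (1): the deleted-entity check from the stored procedure.
    var id int64
    err = tx.QueryRowContext(ctx,
        "SELECT Id FROM Product WHERE IsDeleted = 1 AND EntityId = @p1",
        entityID).Scan(&id)
    if err == nil {
        return fmt.Errorf("entity %d has already been deleted", entityID)
    }
    if !errors.Is(err, sql.ErrNoRows) {
        return err
    }

    // Steps (2) and (3): the inserts; their column lists are elided in the
    // question, so only the shape is shown here.
    if _, err := tx.ExecContext(ctx, "INSERT INTO Versions ..."); err != nil {
        return err
    }
    if _, err := tx.ExecContext(ctx, "INSERT INTO Product ... -- IsDeleted = 0"); err != nil {
        return err
    }
    return tx.Commit()
}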

Condition check and Put on different tables in one DDB network call

Here are my tables:
Table1
    Id (String, composite PK partition key)
    IdTwo (String, composite PK sort key)
Table2
    IdTwo (String, simple PK partition key)
    Timestamp (Number)
I want to PutItem in Table1 only if IdTwo does not exist in Table2 or the item in Table2 with the same IdTwo has Timestamp less than the current time (can be given as outside input).
The simple approach I know would work is:
GetItem on Table2 with ConsistentRead=true. If the item exists and its Timestamp is not less than the current time, exit early.
PutItem on Table1.
However, this is two network calls to DDB. I'd prefer to optimize it, e.g. by using TransactWriteItems, which is one network call. Is that possible for my use case?
If you want to share code, I'd prefer Go, but any language is fine.
First off, the operation you're looking for is TransactWriteItems - https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TransactWriteItems.html
This is the API operation that lets you perform atomic, transactional conditional writes. There are two parts to your question; I'm not sure they can be done together, but then they might not need to be.
The first part, insert into Table1 if a condition is met in Table2, is simple enough: you add the item you want in Table1 in the Put section of the API call, and phrase the existence check for Table2 in the ConditionCheck section.
You can't do multiple checks against the same item in one transaction right now, so checking whether the timestamp is lower than the current time has to be its own operation, also phrased as a ConditionCheck. Because of your rules you can't combine the two checks, and you can't get away with just one of them.
I'd suggest doing a bit of optimistic concurrency here. Try the TransactWriteItems with the second ConditionCheck, where the write succeeds only if the timestamp is less than the current time; this is what should happen in most cases. If the transaction fails, you then need to check whether it failed because the timestamp wasn't lower or because the item doesn't exist yet.
If it doesn't exist yet, do a TransactWriteItems that populates the timestamp, guarded by a condition that the item still doesn't exist (another thread might have written it in the meantime), and then retry the first operation.
You basically want to keep retrying the first operation (the write with the condition check that the timestamp is lower) until it succeeds or fails for a good reason. If it fails because the data is uninitialized, initialize it taking race conditions into account and then try again, as in the sketch below.
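Since Go was requested: a sketch of that first transaction with aws-sdk-go-v2 (table, key, and attribute names are the ones from the question; the rest is illustrative). Note that Timestamp is a DynamoDB reserved word, hence the #ts alias:

import (
    "context"
    "errors"
    "strconv"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/dynamodb"
    "github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

// putIfFresh puts the item into Table1, guarded by a ConditionCheck on
// Table2: the transaction succeeds only if Table2's item for idTwo has a
// Timestamp lower than now.
func putIfFresh(ctx context.Context, client *dynamodb.Client, id, idTwo string, now time.Time) error {
    _, err := client.TransactWriteItems(ctx, &dynamodb.TransactWriteItemsInput{
        TransactItems: []types.TransactWriteItem{
            {
                ConditionCheck: &types.ConditionCheck{
                    TableName: aws.String("Table2"),
                    Key: map[string]types.AttributeValue{
                        "IdTwo": &types.AttributeValueMemberS{Value: idTwo},
                    },
                    ConditionExpression:      aws.String("#ts < :now"),
                    ExpressionAttributeNames: map[string]string{"#ts": "Timestamp"},
                    ExpressionAttributeValues: map[string]types.AttributeValue{
                        ":now": &types.AttributeValueMemberN{Value: strconv.FormatInt(now.Unix(), 10)},
                    },
                },
            },
            {
                Put: &types.Put{
                    TableName: aws.String("Table1"),
                    Item: map[string]types.AttributeValue{
                        "Id":    &types.AttributeValueMemberS{Value: id},
                        "IdTwo": &types.AttributeValueMemberS{Value: idTwo},
                    },
                },
            },
        },
    })
    var canceled *types.TransactionCanceledException
    if errors.As(err, &canceled) {
        // Inspect canceled.CancellationReasons: if the ConditionCheck failed
        // because the Table2 item doesn't exist yet, initialize it (a Put
        // guarded by attribute_not_exists(IdTwo)) and retry this function.
    }
    return err
}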

Short incremental unique id for neo4j

I use Django with Neo4j as the database. I need short URLs based on node ids in my REST API. Neo4j has an internal id, which is not recommended for use in applications, and the usual alternative is a UUID, which is too long for my short URLs. So I added my own uid generator:
import time
import uuid
from hashlib import sha256

from neomodel import db  # assumed: neomodel's cypher_query API, as used below

def uid_generator():
    last_id = db.cypher_query("MATCH (n) RETURN count(*) AS lastId")[0][0][0]
    if last_id is None:
        last_id = 0
    last_id = str(last_id)
    hash = sha256()
    hash.update(str(time.time()).encode())
    return hash.hexdigest()[0:max(2, len(last_id))] + str(uuid.uuid4()).replace('-', '')[0:max(2, len(last_id))]
I have two questions. First, I read this question on Stack Overflow and I'm still not sure that MATCH (n) RETURN count(*) AS lastId is O(1); there was no reference for that claim. Is there any reference for that answer? Second, is there a better approach with respect to both id uniqueness and speed?
First, you should put a unique constraint on the id property to make sure there are no collisions created by parallel create statements. This requires using a label, but you NEED this fail-safe if you plan to do anything serious with this data. This way you can also have rolling ids for different labels. (All indexed labels have a count table, and a UNIQUE CONSTRAINT also creates an index.)
Second, you should do the generation and the creation in the same Cypher statement, like this:
MATCH (n:Node) WITH count(*) AS lastId
CREATE (:Node{id:lastId})
This minimizes the time between generation and commit, reducing the chance of collision. (Remember to retry attempts that fail with unique-constraint violations.)
I'm not sure what you are doing with the hash, just that you are doing it wrong. Either you generate a new time-based UUID (it requires no parameters) and use it as is, or you use the incrementing id. (By altering a UUID, you invalidate the logic that guarantees its uniqueness, significantly increasing the chance of collision.)
You can also store the current index count in a node, as explained here. It's not guaranteed to be thread safe, but that shouldn't be a problem as long as you have the unique constraints in place and retry on constraint violations (a retry loop is sketched below). This approach is also more tolerant of node deletions.
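For illustration, the count-create-retry loop from application code; a sketch in Go with the official neo4j-go-driver v5 (the :Node label, id property, and retry limit are illustrative, and the unique constraint on id is assumed to exist already):

import (
    "context"
    "fmt"

    "github.com/neo4j/neo4j-go-driver/v5/neo4j"
)

// createNodeWithRollingID counts the existing :Node nodes and creates a new
// node using that count as its id, retrying when a concurrent writer takes
// the same id and the unique constraint rejects the second commit.
func createNodeWithRollingID(ctx context.Context, driver neo4j.DriverWithContext) error {
    session := driver.NewSession(ctx, neo4j.SessionConfig{})
    defer session.Close(ctx)

    const maxAttempts = 5
    for attempt := 0; attempt < maxAttempts; attempt++ {
        _, err := session.ExecuteWrite(ctx, func(tx neo4j.ManagedTransaction) (any, error) {
            res, err := tx.Run(ctx,
                "MATCH (n:Node) WITH count(*) AS lastId CREATE (:Node {id: lastId})", nil)
            if err != nil {
                return nil, err
            }
            _, err = res.Consume(ctx)
            return nil, err
        })
        if err == nil {
            return nil
        }
        // A constraint violation means another writer claimed this id
        // between our count and our commit; loop and recount.
    }
    return fmt.Errorf("could not allocate an id after %d attempts", maxAttempts)
}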
Your approach is not good, because it's based on the number of nodes in the database.
What happens if you create a node (call it A), then delete a random node, and then create a new node (call it B)?
A and B will have the same ID; I think that's why you added the time-based hash to your code (but I barely understand that line :)).
On the other side, Neo4j's internal ID guarantees a unique ID across the database, but not across time. By default, Neo4j recycles unused IDs (an ID is released when a node is deleted).
You can change this behaviour through the configuration (see the doc HERE): dbms.ids.reuse.types.override=RELATIONSHIP
Be careful with such a configuration: the size of your database on disk can only grow, even if you delete nodes.
Why not create your own identifier? You can get the maximum of your last identifier (let's call it RN for record number).
match (n) return max(n.RN) as lastID
max is one of the aggregating functions in Cypher.

Entity Framework DB First: Timestamp column not working

Using the DB-first approach, I want my application to throw a concurrency exception whenever I try to update an (out-of-date) entity whose corresponding row in the database has already been updated by another application/user/session.
I am using Entity Framework 5 on .Net 4.5. The corresponding table has a Timestamp column to maintain row version.
I have done this in the past by adding a timestamp column to the table on which you wish to perform a concurrency check (in my example I added a column called ConcurrencyCheck).
There are two concurrency modes here, depending on your needs:
1. Concurrency Mode: Fixed
Re-add/refresh your table in your model. For fixed concurrency, make sure you set the Concurrency Mode of the timestamp property to Fixed when you import the table into your model.
Then to trap the conflict:
try
{
    context.SaveChanges();
}
catch (OptimisticConcurrencyException ex)
{
    // handle your exception here...
}
2. Concurrency Mode: None
If you wish to handle the concurrency checking yourself, i.e. raise a validation message informing the user and not even allow a save to occur, then you can set the Concurrency Mode to None.
1. Ensure you change the ConcurrencyMode in the properties of the new column you just added to None.
2. To use this in your code, I would create a variable to store the current timestamp of the record on the screen whose save you wish to check:
private byte[] CurrentRecordTimestamp
{
    get
    {
        return (byte[])Session["currentRecordTimestamp"];
    }
    set
    {
        Session["currentRecordTimestamp"] = value;
    }
}
3. On page load (assuming you're using ASP.NET WebForms and not MVC/Razor; you don't mention which above), or wherever you populate the screen with the data you wish to edit, pull the ConcurrencyCheck value of the record under edit into the variable you created:
this.CurrentRecordTimestamp = currentAccount.ConcurrencyCheck;
Then, if the user leaves the record open and someone else changes and saves it in the meantime, when the first user also attempts to save you can compare the timestamp value you stored earlier with the concurrency value the record carries now:
if (Convert.ToBase64String(accountDetails.ConcurrencyCheck) != Convert.ToBase64String(this.CurrentRecordTimestamp))
{
    // The row was changed by someone else after it was loaded:
    // raise your validation message instead of saving.
}
After reviewing many posts here and on the web explaining concurrency and timestamps in Entity Framework 5, I came to the conclusion that it is basically impossible to get a concurrency exception when the model is generated from an existing database.
One workaround is modifying the generated entities in the .edmx file and setting the Concurrency Mode of the entity's timestamp property to Fixed. Unfortunately, if the model is repeatedly re-generated from the database, this modification may be lost.
However, there is one tricky workaround:
Initialize a transaction scope with an isolation level of Repeatable Read or higher.
Get the current timestamp of the row.
Compare the new timestamp with the old one.
Not equal --> raise an exception.
Equal --> commit the transaction.
The isolation level is important to prevent concurrent modifications from interfering.
PS:
Erikset's solution seems fine for surviving regeneration of the model file.
EF detects a concurrency conflict when no rows were affected. So if you use stored procedures to delete and update, you can manually add the timestamp value to the WHERE clause:
UPDATE | DELETE ... WHERE PKfield = PkValue AND Rowversionfield = rowVersionValue
Then, if the row has been deleted or modified by anyone else, the SQL statement affects 0 rows and EF interprets that as a concurrency conflict.
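For illustration, the same rows-affected signal outside EF; a minimal sketch in Go with database/sql (the UPDATE follows the shape above, while MyTable and the Name column are placeholders):

import (
    "context"
    "database/sql"
    "errors"
)

// updateWithRowVersion issues the UPDATE with the row version in the WHERE
// clause and treats "0 rows affected" as a concurrency conflict -- the same
// signal EF reacts to.
func updateWithRowVersion(ctx context.Context, db *sql.DB, pkValue int64, rowVersion []byte, newName string) error {
    res, err := db.ExecContext(ctx,
        "UPDATE MyTable SET Name = @p1 WHERE PKfield = @p2 AND Rowversionfield = @p3",
        newName, pkValue, rowVersion)
    if err != nil {
        return err
    }
    affected, err := res.RowsAffected()
    if err != nil {
        return err
    }
    if affected == 0 {
        // Someone else modified or deleted the row since rowVersion was read.
        return errors.New("optimistic concurrency conflict")
    }
    return nil
}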

Sync Framework Deletes not being applied on client

Here's my scenario: I was concerned about the SqlCeSyncClient applying deletes, then inserts, then updates. I have cases where a row may be de-referenced from another table and then deleted. For example, imagine this:
I have two tables, Customer and Area, where Customer.Area references Area.Name with a foreign key constraint:
insert into Area values('Australia')
insert into Customer values('customer1','Australia')
-- Sync happens. Client gets 2 inserts.
update Customer set Area = 'New Zealand' where Area = 'Australia'
delete from Area where Name = 'Australia'
-- Sync happens. Client gets 1 update , and 1 delete
The SqlCeClientSyncProvider tries to apply the delete first, which it fails to do because of referential integrity constraints on the client.
My first question is: Why on earth did the boys at Microsoft code the SyncClient to process deletes FIRST when it breaks all referential integrity rules? Shouldn't they apply deletes LAST????
My next question is: I have managed to reverse the order by inspecting the code and writing the whole ApplyChanges method myself... but even when I do that the deletes are not applied. Is there some internal thing with datasets that means you can't change the order of processing?
The problem is not the order of the operations (delete, update, insert, ...), but the order in which you placed your sync tables...
You should sync the Area table first, and the Customer table after it.

Comparing entities while unit testing with Hibernate

I am running JUnit tests using an in-memory HSQLDB. Let's say I have a method that inserts some values into the DB, and I am checking whether the method inserted the values correctly. Note that the order of insertion is not important.
@Test
public void should_insert_correctly() {
    MyEntity[] expectedEntities = new MyEntity[2];
    // init expected entities
    Inserter out = new Inserter(session); // out: object under test
    out.insert();
    List list = session.createCriteria(MyEntity.class).list();
    assertTrue(list.contains(expectedEntities[0]));
    assertTrue(list.contains(expectedEntities[1]));
}
The problem is that I cannot compare the expected entities to the actual ones, because the expected and actual ids differ. Since setId() of MyEntity is private (to prevent setting the id explicitly), I cannot set all of the entities' ids to 0 and compare that way.
How can I compare the two result sets regardless of their ids?
I found the following more practical: instead of fetching all results at once, I fetch each result according to its expected criteria and assert that it is not null.
public void should_insert_correctly() {
    Inserter out = new Inserter(session); // out: object under test
    out.insert();

    Criteria criteria;
    criteria = getCriteria(session, 0);
    assertNotNull(criteria.uniqueResult());
    criteria = getCriteria(session, 1);
    assertNotNull(criteria.uniqueResult());
}

private Criteria getCriteria(Session session, int i) {
    Criteria criteria = session.createCriteria(MyEntity.class);
    criteria.add(Restrictions.eq("x", expectedX[i]));
    criteria.add(Restrictions.eq("y", expectedY[i]));
    return criteria;
}
A stateful entity should not override equals -- that is, entities should be compared for equality by reference identity -- so List.contains will not work as you want.
What I do is use reflection to compare the fields of the original and the reloaded entity. The function that walks over the fields of the objects ignores transient fields and those annotated with @Transient.
I don't find I need to ignore the id. When the object is first flushed to the database, Hibernate allocates it an id; when it is reloaded, the object will have the same id.
The flaw in your test is that you have not set transaction boundaries. You need to save the objects in one transaction; when you commit it, Hibernate flushes the objects to the database and allocates their ids. Then, in another transaction, load the entities back from the database. You will get another set of objects that should have the same ids and the same persistent (i.e. non-transient) state.
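The reflection walk is easy to sketch in any language that has reflection; for instance in Go (illustrative only, since this thread is Java/Hibernate: the skip set stands in for the transient/@Transient filtering, and only exported fields are compared):

import "reflect"

// equalIgnoring compares two values of the same struct type field by field,
// skipping the named fields (e.g. the id or other transient state).
func equalIgnoring(a, b any, skip map[string]bool) bool {
    va, vb := reflect.ValueOf(a), reflect.ValueOf(b)
    if va.Type() != vb.Type() || va.Kind() != reflect.Struct {
        return false
    }
    for i := 0; i < va.NumField(); i++ {
        field := va.Type().Field(i)
        if !field.IsExported() || skip[field.Name] {
            continue
        }
        if !reflect.DeepEqual(va.Field(i).Interface(), vb.Field(i).Interface()) {
            return false
        }
    }
    return true
}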
I would try to implement the Object.equals(Object) method in your MyEntity class.
List.contains(Object) uses Object.equals(Object) (source: Java 6 API) to determine whether an object is in the list.
The method session.createCriteria(MyEntity.class).list() returns a list of new instances with the values you inserted (hopefully).
So you need to compare the values. This is easily done by implementing Object.equals(Object).
Clarification edit:
You could ignore the ids in your equals method, so that the comparison only cares about "real values".
YAE (Yet Another Edit):
I recommend reading this article about the equals() method: Angelika Langer: Secrets of Equals. It explains all the background information very well.