Where/when to check the version of the row when doing optimistic locking? [duplicate] - optimistic-locking

Marked as a duplicate of "Optimistic vs. Pessimistic locking" (13 answers). Closed 6 months ago.
I want to implement optimistic locking for a relational database.
Suppose there is a users table:
id       name        version
-------- ----------- --------
1        Jhon        1
2        Jhane       1
My application fetches the Jhon user to change his name
SELECT id, name, version FROM users WHERE id = 1;
jhon = get_user_by_id('1');
jhon.change_name_to('Jhin');
jhon.save() // this method should fail or succeed depending on the version of the row in the database
So where do I need to compare the version of the selected row with the version of the row that is in the database?
Is a database transaction a good place to fetch the existing version of the row and compare it with the already fetched record?
transaction_begin()
current = get_user_by_id('1')  // re-read the row inside the transaction
if (current.version !== jhon.version) {
    // Versions differ: someone else updated the row, so abort
    transaction_rollback();
} else {
    // Versions match: apply the update and bump the version
    query("UPDATE users SET name = {jhon.name}, version = {jhon.version + 1} WHERE id = 1;");
    transaction_commit();
}

I found an answer in a similarly asked question:
How does Hibernate do row version check for Optimistic Locking before committing the transaction
The answer is not to read the version at all: optimistic locking does not require any extra SELECT to get and check the version after the entity was modified. Instead, the version we originally read is passed along in the UPDATE itself:
UPDATE users SET name = 'Jhin', version = version + 1 WHERE id = 1 AND version = 1;
If the number of affected rows is 1, the row was not touched by anyone else during the name update, and our write succeeded.
If the number of affected rows is 0, someone else changed the row (and bumped its version) in the meantime, so the update must be retried or reported as a conflict.
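This version-guarded UPDATE can be sketched end to end with Python's standard-library sqlite3 driver; the table and column names follow the question, and `rename_user` is a hypothetical helper illustrating the affected-rows check:

```python
import sqlite3

# Minimal sketch of version-based optimistic locking.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO users VALUES (1, 'Jhon', 1)")

def rename_user(conn, user_id, new_name, expected_version):
    """Update only if the row still carries the version we read earlier."""
    cur = conn.execute(
        "UPDATE users SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, user_id, expected_version),
    )
    # rowcount == 1: we won the race; rowcount == 0: someone updated first
    return cur.rowcount == 1

# The first writer read version 1 and succeeds; a second writer still
# holding version 1 loses, because the row is now at version 2.
print(rename_user(conn, 1, "Jhin", 1))   # True
print(rename_user(conn, 1, "Jhane", 1))  # False
```

No read-back or explicit transaction logic is needed; the WHERE clause does the comparison atomically.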

Related

MariaDB: multiple table update does not update a single row multiple times? Why?

Today I was just bitten in the rear end by something I didn't expect. Here's a little script to reproduce the issue:
create temporary table aaa_state(id int, amount int);
create temporary table aaa_changes(id int, delta int);
insert into aaa_state(id, amount) values (1, 0);
insert into aaa_changes(id, delta) values (1, 5), (1, 7);
update aaa_changes c join aaa_state s on (c.id=s.id) set s.amount=s.amount+c.delta;
select * from aaa_state;
The final result in the aaa_state table is:
ID       Amount
-------- -------
1        5
Whereas I would expect it to be:
ID       Amount
-------- -------
1        12
What gives? I checked the docs but cannot find anything that would hint at this behavior. Is this a bug that I should report, or is this by design?
The behavior you are seeing is consistent with two updates happening on the aaa_state table. One update assigns the amount 7, and that amount is then clobbered by the second update, which sets it to 5. This could be explained by MySQL using a snapshot of the aaa_state table to fetch the amount for each step of the update. If true, the actual steps would look something like this:
1. join the two tables
2. update the amount using the "first" row from the changes table;
   the cached result for the amount is now 7, but this value will not actually
   be written out to the underlying table until AFTER the entire update
3. update the amount using the "second" row from the changes table;
   the cached amount is now 5
4. the update is over, so 5 is written out as the actual amount
Your syntax is not really correct for what you want to do. You should be using something like the following:
UPDATE aaa_state s
INNER JOIN
(
    SELECT id, SUM(delta) AS delta_sum
    FROM aaa_changes
    GROUP BY id
) c ON c.id = s.id
SET s.amount = s.amount + c.delta_sum;
Here we compute a proper aggregation of the delta values for each id in a separate bona-fide subquery. This means the delta sums are fully computed and materialized in the subquery before MySQL performs the join to update the first table.
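The same aggregate-first idea can be written as a correlated subquery, which also runs on engines without multi-table UPDATE; here is a sketch against SQLite's standard-library driver, re-creating the question's tables:

```python
import sqlite3

# Aggregate all deltas per id first, then apply the total in one update.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aaa_state(id INT, amount INT);
    CREATE TABLE aaa_changes(id INT, delta INT);
    INSERT INTO aaa_state(id, amount) VALUES (1, 0);
    INSERT INTO aaa_changes(id, delta) VALUES (1, 5), (1, 7);
""")

conn.execute("""
    UPDATE aaa_state
    SET amount = amount + (
        SELECT COALESCE(SUM(delta), 0)
        FROM aaa_changes
        WHERE aaa_changes.id = aaa_state.id
    )
""")

print(conn.execute("SELECT amount FROM aaa_state WHERE id = 1").fetchone()[0])  # 12
```

Because the SUM is evaluated once per state row, each delta contributes exactly once, giving the expected 12 rather than the clobbered 5.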

Cassandra CQL - update (insert) if not equal to

I have a scenario where I need to update (or insert) a record if a (non-key) field is not equal to some string OR the record does not exist. For example, given something like:
UPDATE mytable SET firstname='John', lastname='Doe' WHERE id='1' IF lastname != 'Doe';
If the lastname is not currently 'Doe', then update it, or if the record does not exist, update (insert) it. My assumption was that the IF condition would yield true if there was no record, but apparently not. Is there an alternative?
In Cassandra an UPDATE behaves very similarly to an INSERT statement, as explained in the Apache CQL Documentation:
"Note that unlike in SQL, UPDATE does not check the prior existence of the row by default (except through IF, see below): the row is created if none existed before, and updated otherwise. Furthermore, there are no means to know whether a creation or update occurred." - CQL Documentation - Update
I did a simple test and it did work:
cqlsh:test_keyspace> select * from conditional_updating ;
id | firstname | lastname
----+-----------+----------
(0 rows)
cqlsh:test_keyspace> update conditional_updating
set firstname = 'John',
lastname = 'Doe'
WHERE id = 1 IF lastname != 'Doe';
[applied]
-----------
True
cqlsh:test_keyspace> select * from conditional_updating ;
id | firstname | lastname
----+-----------+----------
1 | John | Doe
(1 rows)
cqlsh:test_keyspace> update conditional_updating
set lastname = 'New'
WHERE id = 1 IF lastname != 'Doe';
[applied] | lastname
-----------+----------
False | Doe
Note that using the IF condition isn't free. Under the hood it triggers a lightweight transaction (LWT), also known as CAS (Compare And Set). Such queries require a read and a write, and they also need to reach consensus among all replicas, which makes them expensive.
"But, please note that using IF conditions will incur a non-negligible performance cost (internally, Paxos will be used) so this should be used sparingly." - CQL Documentation - Update
If you are interested in knowing why Lightweight transactions are considered an anti-pattern in Cassandra I encourage you to have a look here: Lightweight Transactions In Cassandra
Also note that in Cassandra an UPDATE acts as an insert if the row does not exist. Please refer to this documentation, which covers exactly that:
update with condition
Example:
UPDATE keyspace_name.table_name
USING option AND option
SET assignment, assignment, ...
WHERE row_specification
IF column_name = literal AND column_name = literal . . .
IF EXISTS

InnoDB locking for INSERT/UPDATE concurrent transactions

I'm looking to ensure isolation when multiple transactions may execute a database insert or update, where the old value is required for the process.
Here is a MVP in python-like pseudo code, the default isolation level is assumed:
sql('BEGIN')
rows = sql('SELECT `value` FROM table WHERE `id`=<id> FOR UPDATE')
if rows:
    old_value, = rows[0]
    process(old_value, new_value)
    sql('UPDATE table SET `value`=<new_value> WHERE `id`=<id>')
else:
    sql('INSERT INTO table (`id`, `value`) VALUES (<id>, <new_value>)')
sql('COMMIT')
The issue with this is that FOR UPDATE leads to an IS lock, which does not prevent two transactions from proceeding. This results in a deadlock when both transactions attempt to UPDATE or INSERT.
Another way is to first try the insert, and to update if there is a duplicate key:
sql('BEGIN')
rows_changed = sql('INSERT IGNORE INTO table (`id`, `value`) VALUES (<id>, <new_value>)')
if rows_changed == 0:
    rows = sql('SELECT `value` FROM table WHERE `id`=<id> FOR UPDATE')
    old_value, = rows[0]
    process(old_value, new_value)
    sql('UPDATE table SET `value`=<new_value> WHERE `id`=<id>')
sql('COMMIT')
The issue in this solution is that a failed INSERT leads to an S lock, which likewise does not prevent two transactions from proceeding, as described here: https://stackoverflow.com/a/31184293/710358.
Of course any solution requiring hardcoded wait or locking the entire table is not satisfying for production environments.
A hack to solve this issue is to use INSERT ... ON DUPLICATE KEY UPDATE ... which always issues an X lock. Since you need the old value, you can perform a blank update and proceed as in your second solution:
sql('BEGIN')
rows_changed = sql('INSERT INTO table (`id`, `value`) VALUES (<id>, <new_value>) ON DUPLICATE KEY UPDATE `value`=`value`')
if rows_changed == 0:
    rows = sql('SELECT `value` FROM table WHERE `id`=<id> FOR UPDATE')
    old_value, = rows[0]
    process(old_value, new_value)
    sql('UPDATE table SET `value`=<new_value> WHERE `id`=<id>')
sql('COMMIT')
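The control flow of this insert-or-process-and-update pattern can be sketched with SQLite's standard-library driver. SQLite has no InnoDB-style row locks, so this only demonstrates the logic, not the locking behavior; the table name `t` and the `process` callback are stand-ins for the question's:

```python
import sqlite3

def insert_or_update(conn, key, new_value, process):
    # ON CONFLICT DO NOTHING plays the role of MySQL's INSERT IGNORE here:
    # rowcount is 1 for a fresh insert, 0 if the row already existed.
    cur = conn.execute(
        "INSERT INTO t (id, value) VALUES (?, ?) ON CONFLICT(id) DO NOTHING",
        (key, new_value),
    )
    if cur.rowcount == 0:
        # Row existed: read the old value, derive the new one, update.
        (old_value,) = conn.execute(
            "SELECT value FROM t WHERE id = ?", (key,)
        ).fetchone()
        conn.execute(
            "UPDATE t SET value = ? WHERE id = ?",
            (process(old_value, new_value), key),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value INTEGER)")
insert_or_update(conn, 1, 10, lambda old, new: old + new)  # fresh insert
insert_or_update(conn, 1, 5, lambda old, new: old + new)   # existing row: 10 + 5
print(conn.execute("SELECT value FROM t WHERE id = 1").fetchone()[0])  # 15
```

On MySQL the first statement would be the ON DUPLICATE KEY UPDATE blank-update from the answer above, precisely to get the X lock that this SQLite sketch cannot show.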

How to fetch last inserted record for particular id?

Apologies, I am completely new to Django. I have 20 records in my database table, and suppose 10 of those records share the same ID. I want to fetch the last inserted record for that ID, and I have a date column in my table. How can I do that?
last_obj = YourModel.objects.last()
But generally you can't create more than one object with the same id unless you replaced the built-in id field with your own. And even then, it's a bad idea.
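If the repeated "ID" really is a separate non-unique column, the usual approach is to filter on it and order by the date column. A sketch of the underlying SQL, runnable with the stdlib sqlite3 driver; the column names `record_id` and `created` are assumptions, not from the question:

```python
import sqlite3

# Hypothetical schema: record_id is the non-unique "ID" from the question,
# created is the date column. The rough Django ORM equivalent would be:
#   YourModel.objects.filter(record_id=1).order_by('created').last()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (record_id INT, created TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, "2023-01-01", "old"), (1, "2023-03-01", "newest"), (2, "2023-02-01", "other")],
)

# Newest row for record_id = 1: sort by date descending, take one.
row = conn.execute(
    "SELECT payload FROM records WHERE record_id = ? "
    "ORDER BY created DESC LIMIT 1",
    (1,),
).fetchone()
print(row[0])  # newest
```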

How can I index a tree in a SQLite Table?

I have a table with the following fields:
id VARCHAR(32) PRIMARY KEY,
parent VARCHAR(32),
name VARCHAR(32)
parent is a foreign key referencing the same table. This structure generates a tree. This tree is supposed to replicate a filesystem tree. The problem is that looking up an id from a path is slooow. Therefore I want to build an index. What is the best way of doing this?
Example Data:
id       parent      name
-------- ----------- ----------
1        NULL        root
2        1           foo
3        1           bar
4        3           baz
5        4           aii
Would Index To:
id       parent      name
-------- ----------- ----------
1        NULL        root
2        1           root/foo
3        1           root/bar
4        3           root/bar/baz
5        4           root/bar/baz/aii
I am currently thinking about using a temporary table and in the code manually running a series of insert from's to build the index. (The reason that I make it temporary is that if this db is accessed from a windows system, the path needs backslashes whereas from *nix it needs forward slashes). Is there an alternative to this?
so, you have a function that does something like this (pseudocode):
int getId(char **path) {
    int id = 0;  /* id of the current node's parent; 0 = above the root */
    int i;
    char query[1000];
    for (i = 0; path[i] != NULL; i++) {
        /* walk one path component deeper on each iteration */
        snprintf(query, sizeof(query),
                 "select id from table where parent = %d and name = '%s'",
                 id, path[i]);
        id = QuerySimple(query);
        if (id < 0)  /* component not found */
            break;
    }
    return id;
}
Looking at the query, you need a (non-unique) index on the columns (parent, name), but maybe you already have it.
A (temporary) table can be used as you said. Note that you can change the path separator in your program, avoiding the need for different tables on Windows and Unix. You also need to keep the additional table in sync with the master. If updates/deletes/inserts are rare, then instead of a table you can simply keep an in-memory cache of already-looked-up data and clear the cache when an update happens (you can also do partial deletes on the cache if you want). In this case you can also read more data (e.g. given a parent, read all its children) to fill up the cache faster. At the extreme, you can read the entire table into memory and work there! It depends on how much data you have, your typical access patterns (how many reads, how many writes), and the deployment environment (do you have RAM to spare?).
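Another option, since SQLite 3.8.3, is to skip the extra table entirely and compute full paths on demand with a recursive CTE. A sketch using the stdlib sqlite3 driver; the schema and sample rows are from the question, the table name `tree` and helper `id_for_path` are assumptions, and the separator is a parameter, which sidesteps the Windows-vs-Unix slash problem:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tree (id VARCHAR(32) PRIMARY KEY,
                       parent VARCHAR(32),
                       name VARCHAR(32));
    INSERT INTO tree VALUES ('1', NULL, 'root'), ('2', '1', 'foo'),
                            ('3', '1', 'bar'), ('4', '3', 'baz'),
                            ('5', '4', 'aii');
""")

def id_for_path(conn, path, sep="/"):
    """Resolve a full path such as 'root/bar/baz' to its node id."""
    row = conn.execute("""
        WITH RECURSIVE walk(id, fullpath) AS (
            -- anchor: the root node(s)
            SELECT id, name FROM tree WHERE parent IS NULL
            UNION ALL
            -- step: append each child's name to its parent's path
            SELECT t.id, walk.fullpath || ? || t.name
            FROM tree t JOIN walk ON t.parent = walk.id
        )
        SELECT id FROM walk WHERE fullpath = ?
    """, (sep, path)).fetchone()
    return row[0] if row else None

print(id_for_path(conn, "root/bar/baz"))  # 4
```

The (parent, name) index mentioned above still helps here, since the recursive step joins on parent.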