How do DB2 transactions work in a concurrent situation? - concurrency

Here are 2 tables: employee(empID int, empName varchar, age int) and salary(salID, empID, ...).
step 1:
insert into employee
select ...
where not exists (select 1 from employee where empID = :employeeID)
step 2:
insert into salary ...
...
There are 2 jobs running concurrently with the above logic and the SAME employee.
Job1 failed and rolled back.
Job2 started before job1 rolled back and committed after job1 rolled back.
My questions are:
If job2 completed successfully, how many records will be inserted into the employee table? 1 or 0?
More precisely, can step 1 of job2 see the employee record inserted by job1?
If 0, how can I make sure the employee is inserted by job2, or won't be removed by job1's rollback?

If job2 completed successfully, how many records will be inserted into the employee table? 1 or 0?
1
More precisely, can step 1 of job2 see the employee record inserted by job1?
Since you said job1 is rolled back, there is no employee record inserted by job1.
Job1 tells the database, "I want to insert an employee record." DB2 then puts a write lock on the row, page, or table. When the table was defined, the database administrator (DBA) determined at what level the table takes write locks. The default is page.
As long as the write lock is there, no other transaction can read the inserted row, unless the reader's isolation level is uncommitted read. We'll assume that's not the case here.
Job2 has to wait for job1 to commit or roll back before DB2 will allow job2 to see job1's row. After job1 rolls back, the write lock is removed, and job2 is allowed to proceed.
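For illustration, here is what that isolation difference looks like at the statement level in DB2; a minimal sketch (the selected column list is assumed), not something you would normally want in this job:

-- uncommitted read: this statement could see job1's uncommitted row
SELECT empID, empName FROM employee WHERE empID = :employeeID WITH UR;

-- under the default cursor stability (CS) isolation, the same select instead
-- waits for job1's write lock to be released, as described above
SELECT empID, empName FROM employee WHERE empID = :employeeID;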
How can I make sure the employee is inserted by job2 or won't be removed by job1 rollback?
DB2 won't allow job2 to perform the insert until after job1 has been rolled back.
By the way, this sort of problem (two transactions trying to insert the same row) can cause a DB2 deadlock, in which case one of the transactions would be rolled back. Again, this assumes that the transactions (jobs) have a higher read isolation level than uncommitted read.
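If the goal is to make the existence check and the insert a single atomic statement, one option is DB2's MERGE statement. A minimal sketch (DB2 LUW syntax; the :employeeName and :age host variables are assumed, since only :employeeID appears in the question):

-- insert the employee only if no row with this empID exists yet,
-- in one statement instead of a separate check-then-insert
MERGE INTO employee AS e
USING (VALUES (:employeeID, :employeeName, :age)) AS src (empID, empName, age)
    ON e.empID = src.empID
WHEN NOT MATCHED THEN
    INSERT (empID, empName, age) VALUES (src.empID, src.empName, src.age);

Even with MERGE, the second job still serializes behind the first job's lock on that key, so lock waits or timeouts remain possible; the statement only removes the window between the check and the insert.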

Related

Is it possible to run queries in parallel in Redshift?

I wanted to do an insert and an update at the same time in Redshift. For this I am inserting the data into a temporary table, removing the updated entries from the original table, and inserting all the new and updated entries. Since these operations run concurrently, entries are sometimes duplicated, because the delete started before the insert was finished. Using a very large sleep for each operation this does not happen, but the script is very slow. Is it possible to run queries in parallel in Redshift?
Hope someone can help me, thanks in advance!
You should read up on MVCC (multi-version concurrency control) and transactions. Redshift can only run one query at a time (for a session), but that is not the issue. You want to COMMIT both changes at the same time (COMMIT is the action that causes changes to become apparent to others). You do this by wrapping your SQL statements in a transaction (BEGIN ... COMMIT) executed in the same session (it is not clear whether you are using multiple sessions). All changes made within the transaction will only be visible to the session making the changes UNTIL COMMIT, when ALL the changes made by the transaction become visible to everyone at the same moment.
A few things to watch out for: if your connection is in AUTOCOMMIT mode, you may break out of your transaction early and COMMIT partial results. Also, while you are working inside a transaction, the source tables you read are held consistent for you (so you see consistent data throughout your transaction), and that view isn't allowed to change underneath you. This means that if you have multiple sessions changing table data, you need to be careful about the order in which they COMMIT so that each session sees the right version of the data.
begin transaction;
<run the queries in parallel>
end transaction;
In this specific case do this:
create temp table stage (like target);
insert into stage
select * from source
where source.filter = 'filter_expression';
begin transaction;
delete from target
using stage
where target.primarykey = stage.primarykey;
insert into target
select * from stage;
end transaction;
drop table stage;
See:
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html
https://docs.aws.amazon.com/redshift/latest/dg/t_updating-inserting-using-staging-tables-.html

What rows acquire locks when doing a select query

For a select query, for what rows are read locks acquired? Is it only the rows that match the filter, or all rows that had to be scanned?
First, note that locks are only needed for read-write transactions but not for read-only transactions. (https://cloud.google.com/spanner/docs/reads)
Cloud Spanner will acquire locks on all the returned rows. It will also acquire enough extra locks to avoid “false negatives”: rows that aren’t returned because they initially don’t match the filter, but are then modified to match the filter before your transaction commits. These false negatives are often called “phantom rows”: you execute a query and get a set of results, and then in the same transaction you execute the exact same query and get more rows. If the query plan does a scan over the base table, we will take a range lock on the whole table, so that no phantom rows can appear until your transaction completes. If the query plan uses an index to find rows with value ‘X’ for field ‘Y’, then we’ll lock a range of the index corresponding to all possible index entries for ‘Y=X’, so that any transaction wanting to insert a new index entry with ‘Y=X’ would have to wait until your transaction completes.
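To make the index case concrete, here is a hedged sketch in GoogleSQL; the table, column, and index names are made up for illustration:

-- hypothetical schema: a table plus a secondary index on Status
CREATE TABLE Singers (
  SingerId INT64 NOT NULL,
  Status   STRING(20),
) PRIMARY KEY (SingerId);

CREATE INDEX SingersByStatus ON Singers(Status);

-- inside a read-write transaction, this query can be answered from the index,
-- so the lock covers the index range for Status = 'active' rather than being
-- a range lock on the whole base table
SELECT SingerId
FROM Singers@{FORCE_INDEX=SingersByStatus}
WHERE Status = 'active';

The same query filtered on a column with no usable index would fall into the table-scan case above, and the range lock would cover the whole table.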

Django, Insertion during schema migration

Sometimes a schema migration takes a long time, e.g. when several fields are added/removed/edited. What happens if you try to make an insertion into a table while a schema migration is running to change the structure of that table?
I'm aware the changes are not persistent until the entire migration is done.
That behavior depends on the underlying database and what the actual migration is doing. For example, PostgreSQL DDL operations are transactional; an insert into the table will block until a DDL transaction completes. To see this, in one psql window, do something like this:
create table kvpair (id serial, key character varying (50), value character varying(100));
begin;
alter table kvpair add column rank integer;
At this point, do not commit the transaction. In another psql window, try:
insert into kvpair (key, value) values ('fruit', 'oranges');
You'll see it will block until the transaction in the other window is committed.
Admittedly, that's a contrived example - the granularity of what's locked will depend on the operation (DDL changes, indexing, DML updates). In addition, any statements that get submitted for execution may have assumed different constraints. For example, change the alter table statement above to include not null. On commit, the insert fails.
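To make that last point concrete, here is a minimal variant of the example above (same kvpair table, which is still empty when the DDL runs):

-- session 1: same as before, but the new column is NOT NULL with no default
begin;
alter table kvpair add column rank integer not null;

-- session 2: blocks on session 1's lock, exactly as in the original example
insert into kvpair (key, value) values ('fruit', 'oranges');

-- once session 1 commits, session 2's insert proceeds against the new table
-- definition and fails with a not-null violation, because it supplies no
-- value for rank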
In my experience, it's always a good thing to consider the "compatibility" of schema changes, and minimize changes that will dramatically restructure large tables. Careful attention can help you minimize downtime, by performing schema changes that can occur on a running system.

Excessive TableStatus wait in dynamodb

I have some 200+ tables in my DynamoDB. Since all my tables have localSecondaryIndexes defined, I have to ensure that no table is in the CREATING status at the time of my CreateTable() call.
While adding a new table, I list all tables and iterate through their names, firing describeTable() calls one by one. On the returned data, I check for the TableStatus key. Each describeTable() call takes a second. This implies an average of 3 minutes' waiting time before the creation of each table. So if I have to create 50 new tables, it takes me around 4 hours.
How do I go about optimizing this? I think that a BatchGetItem() call works on stuff inside the table and not on table metadata. Can I do a bulk describeTable() call?
It is enough to wait until the last table you created becomes ACTIVE. Run DescribeTable on that last-created table at an interval of a few seconds.

DELETE blocking INSERT

OS : Solaris
Database : Informix
I have a process which has 2 threads:
Thread 1 dealing with new transactions and doing DB INSERTS
Thread 2 dealing with existing transactions and doing DB DELETES
PROBLEM
Thread 1 is continuously doing INSERTs (adding new transactions) on a table.
Thread 2 is continuously doing DELETEs (removing expired transactions) from the same table, based on the primary key.
INSERTs are failing with Informix error 244, which occurs due to page/table locking.
I guess the DELETE is taking a table lock instead of a row lock and preventing the INSERTs from working.
Is there any way I can prevent this deadlocking?
EDIT
I found another clue. The 244 error is caused by a SELECT query.
Both the insert and the delete operations do a select from a frequently updated table before doing the operation.
Isolation is set to COMMITTED READ. When I manually do a select on this table from dbaccess while the deletes are happening, I get the same error.
I would be very surprised if a DELETE were doing a full table lock when removing single elements by primary key. Rather, it is likely that the longevity of one (or both) of the transactions themselves is eventually tripping a table lock due to the number of modified rows. In general, you can avoid deadlocks in volatile tables such as this by eliminating all but single-row operations in each transaction, and by ensuring your transaction model is read-committed. At least, that has been my experience.
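Two Informix-side settings are worth checking in this situation; a hedged sketch (the table name 'transactions' is made up for illustration):

-- make sure the table locks at row level rather than page level
ALTER TABLE transactions LOCK MODE (ROW);

-- in each session: keep COMMITTED READ, but wait briefly for a conflicting
-- lock instead of failing immediately with error 244
SET ISOLATION TO COMMITTED READ;
SET LOCK MODE TO WAIT 10;

-- on newer Informix versions, readers can also be told to see the last
-- committed version of a locked row instead of blocking (assuming the
-- server supports this option):
SET ISOLATION TO COMMITTED READ LAST COMMITTED;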