I am currently adding unit tests to a rather large quantity of PostgreSQL stored procedures, using pgTAP.
Some of the procedures perform operations which lock rows explicitly. These locks are critical to the application.
How do I write tests that check that the rows that need to be locked have been, and that rows which shouldn't be locked aren't?
The only "clue" I have at the moment is the pgrowlocks extension, which allows a transaction to check for rows locked by another transaction. However, the current transaction doesn't seem to see its own locks, so I'd have to use something to synchronise two transaction, and unless I am quite mistaken, there's no way to do that using pgTap.
(note: using PostgreSQL 9.1)
If you can identify the ctid of the rows in question, and know which transaction should have the rows locked, maybe you could use the pageinspect extension and look at the tuple info flags and xmax? The info flags should indicate that the row is locked, and xmax should be set to the id of the transaction holding it.
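For example, a minimal sketch of that idea, assuming a hypothetical accounts table whose interesting rows sit on page 0 and that the pageinspect extension is installed; the 0x0040 bit is HEAP_XMAX_EXCL_LOCK from the server headers and is worth double-checking against your 9.1 sources:

-- dump tuple headers for page 0 of "accounts" (hypothetical table/page)
SELECT t_ctid,
       t_xmax,
       t_infomask,
       (t_infomask & 64) <> 0 AS has_exclusive_row_lock  -- 0x0040 = HEAP_XMAX_EXCL_LOCK
FROM heap_page_items(get_raw_page('accounts', 0));

Unlike pg_locks, this reads the tuple headers themselves, so locks taken by the current transaction are visible as well.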
How do I write tests that check that the rows that need to be locked have been, and that rows which shouldn't be locked aren't?
Open a separate transaction, try to lock the same row with NOWAIT, and catch the exception.
PostgreSQL has no support for autonomous transactions, so to open a separate transaction from within a pgTAP test you will have to resort to dblink or a similar extension.
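A minimal sketch of that approach, assuming a hypothetical accounts table and a hypothetical procedure under test that must lock the row with id = 42 but leave id = 43 alone (the dblink connection string may need user/password details, and depending on the dblink version the remote SQLSTATE may not be propagated as-is, in which case match on the error message instead of '55P03'):

-- inside the pgTAP test transaction
SELECT dblink_connect('locker_check', 'dbname=' || current_database());

-- run the procedure under test; its row locks are held by this transaction
SELECT my_procedure_under_test(42);  -- hypothetical

-- a row that must be locked: the second session should fail immediately
SELECT throws_ok(
    $$SELECT * FROM dblink('locker_check',
        'SELECT id FROM accounts WHERE id = 42 FOR UPDATE NOWAIT') AS t(id int)$$,
    '55P03',   -- lock_not_available
    NULL,
    'row 42 should be locked by the procedure');

-- a row that must NOT be locked: the second session should succeed
SELECT lives_ok(
    $$SELECT * FROM dblink('locker_check',
        'SELECT id FROM accounts WHERE id = 43 FOR UPDATE NOWAIT') AS t(id int)$$,
    'row 43 should not be locked');

SELECT dblink_disconnect('locker_check');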
PS. I found this link, where Robert Haas explains why granted row-level locks are not shown in pg_locks:
(...) ungranted tuple locks show up in pg_locks, but they disappear once granted. (PostgreSQL would run out of lock table space on even a medium-sized SELECT FOR UPDATE query if we didn't do this.)
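In other words, the most you would ever catch in pg_locks is a waiter. While another session is blocked trying to lock a row, a query along these lines (illustrative) shows the ungranted tuple lock:

-- run while a second session is blocked waiting for a row lock
SELECT locktype, relation::regclass, page, tuple, pid, granted
FROM pg_locks
WHERE locktype = 'tuple';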
On the other hand - I don't quite understand why you want to test for lock existence - it's guaranteed after a successful LOCK command.
I'm having trouble updating a single item many times at once. If I try to update an item with new attributes many times like so:
UpdateExpression: 'SET attribute.#uniqueId = :newAttribute'
not all of the updates go through. I tried sending 20 updates with unique ids, and this resulted in only 15 new attributes. This also occurs in my local DynamoDB instance. I assume that the updates are somehow overwriting each other in a "last update wins" scenario, but I'm not sure. How can I solve this?
DynamoDB is eventually consistent on update, so "race conditions" are possible. If you want stricter write logic, take a look at transactions:
Items are not locked during a transaction. DynamoDB transactions provide serializable isolation. If an item is modified outside of a transaction while the transaction is in progress, the transaction is canceled and an exception is thrown with details about which item or items caused the exception.
Your observation is very interesting, and contradicts observations made in the past in Are DynamoDB "set" values CDRTs? and Concurrent updates in DynamoDB, are there any guarantees? - in those questions people observed that concurrent writes to different set items or to different top-level attributes do not seem to get overwritten. Neither case is exactly the same as what you tested (nested attributes), though, so it's not definitive proof that something was wrong with your test, but it's still surprising.
Presentations made in the past by the DynamoDB developers suggested that in DynamoDB writes happen on a single node (the designated "leader" of the partition), and that this node can serialize the concurrent writes. This serialization is needed to allow conditional updates, counter increments, etc., to work safely with concurrent writes. Presumably, the same serialization could have also allowed multiple sub-attributes to be modified concurrently safely. If it doesn't, it might mean that this serialization is deliberately disabled for certain updates, perhaps all unconditional updates (without a ConditionExpression). This is very surprising, and should have been documented by Amazon...
I want to use different Berkeley DB databases to store different classes of objects in my application. Transactions in a single DB can be done atomically using DbTxn::commit. However, if I'm using multiple databases, I have to create multiple transactions (one for each database), right? In this case, if committing the first succeeds but the second fails, is there a way to roll back the already committed first transaction? (As far as I understand DbTxn::abort, it can no longer be used after the transaction has been committed.)
Is there some way to achieve atomic transactions across multiple databases?
If you are using multiple databases, you DON'T have to create multiple transactions. Using a single transaction, you can operate on multiple DBs.
Please see this link for the documentation of Db::open().
It has a 'DbTxn *txnid' parameter. You can pass a transaction handle returned by the DB_ENV->txn_begin() API, so a transaction handle should be obtained before opening a DB.
Carefully read the note under the parameter 'txnid' in the given documentation link.
Please note that you should NOT specify the DB_AUTO_COMMIT flag in the Db::open() API. Instead, specify the same transaction handle as the 'txnid' parameter for all the DBs you want to operate on. In this way you can achieve atomic transactions across multiple databases.
In general, you need something like a distributed transaction manager; the full answer fills books. See "The Berkeley DB Book", Chapter 9, "Distributed Transactions and Data-Distribution Strategies", ISBN-10: 1-59059-672-2.
I have an update query that is based on the result of a select, typically returning more than 1000 rows.
If some of these rows are updated by other queries before this update can touch them, could that cause a problem with the records? For example, could they get out of sync with the original query?
If so would it be better to select and update individual rows rather than in batch?
If it makes a difference, the query is being run on Microsoft SQL Server 2008 R2
Thanks.
No.
A table cannot be updated while something else is in the process of updating it.
Databases use concurrency control and have ACID properties to prevent exactly this type of problem.
I would recommend reading up on isolation levels. The default in SQL Server is READ COMMITTED, which means that other transactions cannot read data that has been updated but not committed by a given transaction.
This means that data returned by your select/update statement will be an accurate reflection of the database at a moment in time.
If you were to change to READ UNCOMMITTED, then you could get into a situation where the data from your select/update is out of sync.
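For reference, you can check or set the level explicitly; a quick illustrative snippet:

-- show the current session options, including the isolation level
DBCC USEROPTIONS;

-- set it explicitly for the session (READ COMMITTED is already the default)
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;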
If you're selecting first, then updating, you can use a transaction:
BEGIN TRAN
-- your select WITHOUT LOCKING HINT
-- your update based upon select
COMMIT TRAN
However, if you're updating directly from a select, there's no need to worry about it; a single transaction is implied.
UPDATE mytable
SET value = mot.value
FROM myOtherTable mot
WHERE mytable.id = mot.id   -- join on your key column(s)
BUT... do NOT do the following, otherwise you'll run into a deadlock
UPDATE mytable
SET value = mot.value
FROM myOtherTable mot WITH (NOLOCK)
WHERE mytable.id = mot.id
I am developing a system using Django + PostgreSQL. It's my first time with PostgreSQL, but I chose it because I needed the transaction and foreign key features.
In a certain view I have to lock my tables with AccessExclusiveLock, to prevent any read or write during this view. That's because I do some checks on the whole data before I save/update my entities.
I noticed an intermittent error that happens from time to time. It is caused by a select statement that runs directly after the lock statement and requires an AccessShareLock. I read on the PostgreSQL website that AccessShareLock conflicts with AccessExclusiveLock.
What I can't understand is why PostgreSQL would ask for an implicit lock if it already holds an explicit lock that covers the implicit one. The second thing I can't understand is why this view is running on 2 different PostgreSQL processes. Aren't they supposed to run in a single transaction?
Thanks in advance.
In PostgreSQL, instead of acquiring exclusive access locks, I would recommend setting the appropriate transaction isolation level on your session. So, before running your "update", send the following commands to your database:
begin;
set transaction isolation level repeatable read;
-- your SQL commands here
commit;
According to your description, you need the repeatable read isolation level. Note that under repeatable read, an UPDATE that conflicts with a change committed by a concurrent transaction fails with "could not serialize access due to concurrent update", so the application must be prepared to retry the transaction.
I am currently developing an application for Azure Table Storage. In that application I have a table which will have relatively few inserts (a couple of thousand per day), and the primary key of these entities will be used in another table, which will have billions of rows.
Therefore I am looking for a way to use an auto-incremented integer, instead of a GUID, as the primary key in the small table (since it will save a lot of storage, and scalability of the inserts is not really an issue).
There've been some discussions on the topic, e.g. on http://social.msdn.microsoft.com/Forums/en/windowsazure/thread/6b7d1ece-301b-44f1-85ab-eeb274349797.
However, since concurrency problems can be really hard to debug and spot, I am a bit uncomfortable with implementing this on my own. My question is therefore: is there a well-tested implementation of this?
For everyone who finds this in search: there is a better solution. The minimal time for a blob-lease lock (as used in the approach below) is 15 seconds - that's awful. Do not use it if you want to create a truly scalable solution. Use the ETag instead!
Create one entity in the table to hold the ID (you can name it ID or whatever).
1) Read it.
2) Increment the value.
3) InsertOrUpdate it WITH the ETag specified (from the read in step 1).
If the last operation (InsertOrUpdate) succeeds, then you have a new, unique, auto-incremented ID. If it fails (an exception with HttpStatusCode == 412), it means some other client changed it, so repeat steps 1, 2 and 3.
The usual time for the Read+InsertOrUpdate pair is less than 200 ms. My test utility, with source, is on GitHub.
See UniqueIdGenerator class by Josh Twist.
I haven't implemented this yet but am working on it ...
You could seed a queue with your next ids to use, then just pick them off the queue when you need them.
You need to keep a table containing the value of the biggest number added to the queue. If you know you won't be using a ton of the integers, you could have a worker wake up every so often and make sure the queue still has integers in it. You could also have a "used IDs" queue the worker could check to keep an eye on usage.
You could also hook that worker up so that if the queue happened to be empty when your code needed an id, it could interrupt the worker's nap and create more keys ASAP.
If that call failed, you would need a way to tell the worker you are going to do the work for it (lock), then do the worker's job of getting the next id, and unlock:
lock
get the last key created from the table
increment and save
unlock
then use the new value.
The solution I found that prevents duplicate IDs and lets you auto-increment is to:
lock (lease) a blob and let that act as a logical gate,
then read the value,
write the incremented value,
release the lease,
and use the value in your app/table.
If your worker role were to crash during that process, you would only have a missing ID in your store. IMHO that is better than duplicates.
Here is a code sample and more information on this approach from Steve Marx.
If you really need to avoid GUIDs, have you considered using something based on date/time and then leveraging partition keys to minimize the concurrency risk?
Your partition key could be by user, year, month, day, hour, etc., and the row key could be the rest of the datetime at a small enough timespan to control concurrency.
Of course, you have to ask yourself, at the price of data in Azure, whether avoiding a GUID is really worth all of this extra effort (assuming a GUID will just work).