I'm creating a few simple helper classes and methods for working with libpq, and am wondering if I receive an error from the database - (e.g. SQL error), how should I handle it?
At the moment, each method returns a bool indicating whether the operation succeeded, so it is up to the caller to check before continuing with new operations.
However, after reading the libpq docs, the best I can come up with is that when an error occurs I should log the error message / status and otherwise ignore it. For example, if the application is in the middle of a transaction, then I believe it can still continue (PostgreSQL won't cancel the transaction as far as I know).
Is there something I can do with PostgreSQL / libpq to make the consequences of such errors safe regarding the database server, or is ignorance the better policy?
You should examine the SQLSTATE in the error and make handling decisions based on that and that alone. Never try to make decisions in code based on the error message text.
An application should simply retry transactions for certain kinds of errors:
Serialization failures
Deadlock detection transaction aborts
For connection errors, you should reconnect, then retry the transaction.
Of course you want to set a limit on the number of retries, so you don't loop forever if the issue doesn't clear up.
Other kinds of errors aren't going to be resolved by trying again, so the app should report an error to the client. Syntax error? Unique violation? Check constraint violation? Running the statement again won't help.
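As a sketch of that retry policy (the names here are hypothetical stand-ins for your own wrapper; the `sqlstate` attribute represents whatever your code extracts via `PQresultErrorField(res, PG_DIAG_SQLSTATE)`):

```python
# Sketch of a retry policy keyed on SQLSTATE. DatabaseError and
# run_transaction are hypothetical stand-ins for your own wrapper types.

RETRYABLE = {"40001",   # serialization_failure
             "40P01"}   # deadlock_detected

class DatabaseError(Exception):
    def __init__(self, sqlstate, message=""):
        super().__init__(message)
        self.sqlstate = sqlstate

def with_retries(run_transaction, max_attempts=3):
    """Run the whole transaction; retry only transient failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except DatabaseError as e:
            if e.sqlstate in RETRYABLE and attempt < max_attempts:
                continue    # transient: re-run the whole transaction
            raise           # permanent, or out of attempts: report it
```

Note the retry limit: without it a persistent serialization failure would loop forever, which is exactly the trap described above.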
There is a list of error codes in the documentation; the docs don't explain much about each individual error, but the preamble is quite informative.
On a side note: One trap to avoid falling into is "testing" connections with a trivial query before using them, and assuming that means the real query can't fail. That's a race condition. Don't bother testing connections; simply run the real query and handle any error.
The details of what exactly to do depend on the error and on the application. If there was a single always-right answer, libpq would already do it for you.
My suggestions:
Always keep a record of the transaction until you've got a confirmed commit from the DB, in case you have to re-run. Don't just fire-and-forget SQL statements.
Retry the transaction without a disconnect and reconnect for SQLSTATEs 40001 (serialization_failure) and 40P01 (deadlock_detected), as these are transient conditions generally resolved by re-trying. You should log them, as they're opportunities to improve how the app interacts with the DB and if they happen a lot they're a performance problem.
Disconnect, reconnect, and retry the transaction at least once for error class 08 (connection exceptions).
Handle 53300 (too_many_connections) and 53400 (configuration_limit_exceeded) with specific and informative errors to the user. Same with the other class 53 entries.
Handle class 57's entries with specific and informative errors to the user. Do not retry if you get a query_canceled (57014); it'll make sysadmins very angry.
Handle 25006 (read_only_sql_transaction) by reporting a different error, telling the user you tried to write to a read-only database or using a read-only transaction.
Report a different error for 23505 (unique_violation), indicating a conflict on a unique constraint or primary key constraint. There's no point retrying.
Error class 01 should never produce an exception.
Treat other cases as errors and report them to the caller, with details from the problem - most importantly SQLSTATE. Log all the details if you return a simplified error.
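The suggestions above can be condensed into a single dispatcher. This is a sketch with invented action labels, not an exhaustive mapping; the point is that the decision is keyed on SQLSTATE alone:

```python
# Hypothetical dispatcher implementing the suggestions above: map a
# PostgreSQL SQLSTATE to a coarse handling decision for the caller.

def classify(sqlstate):
    if sqlstate in ("40001", "40P01"):
        return "retry"                # transient; re-run the transaction
    if sqlstate.startswith("08"):
        return "reconnect_and_retry"  # connection exception (class 08)
    if sqlstate == "57014":
        return "report_cancelled"     # query_canceled: never auto-retry
    if sqlstate.startswith("57"):
        return "report_admin"         # operator intervention (class 57)
    if sqlstate.startswith("53"):
        return "report_resources"     # too_many_connections etc. (class 53)
    if sqlstate == "25006":
        return "report_read_only"     # wrote to a read-only transaction
    if sqlstate == "23505":
        return "report_conflict"      # unique / primary key violation
    return "report_error"             # everything else: surface SQLSTATE
```

The caller would then retry, reconnect, or surface a specific message, logging full details in every branch.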
Hope that's useful.
I'm unclear on the exact behaviour of Django in the face of database serialization errors in transactions.
The transaction.atomic() docs don't specify this behaviour as far as I can tell.
If the DB hits a consistency error while committing a transaction (e.g. another transaction updated a value that was read in the current transaction), then, reading django/db/transaction.py, it looks like the transaction will be rolled back and the DatabaseError raised to the calling code (e.g. the transaction.atomic() context manager). Is this correct?
And, more importantly, are there cases when the transaction could be rolled back without the transaction.atomic wrapper receiving an exception?
(Note that I'm not asking about DatabaseErrors that are raised inside the context manager, as the docs clearly explain what happens to them. I'm asking only about database errors which occur during the commit of the transaction, which occurs on exit of the context manager.)
If the DB hits a consistency error while committing a transaction ... it looks like the transaction will rollback, and the DatabaseError will be raised to the calling code (e.g. the transaction.atomic() context manager). Is this correct?
Yes, precisely.
Are there cases when the transaction could be rolled back without the transaction.atomic wrapper receiving an exception?
No. You can verify this from the code in transaction.py, where the only time a rollback is initiated is when a DatabaseError is thrown. This is also confirmed in the documentation that you link to:
When exiting an atomic block, Django looks at whether it’s exited normally or with an exception to determine whether to commit or roll back.
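That contract can be pictured with a toy context manager. This is a pure-Python simulation of the exit behaviour, not Django's actual implementation; `DatabaseError` and the connection object are invented for illustration:

```python
# Toy model of transaction.atomic's exit behaviour: commit on clean
# exit; if the commit itself fails, roll back and re-raise so the
# caller always sees the error.

class DatabaseError(Exception):
    pass

class Atomic:
    def __init__(self, connection):
        self.connection = connection

    def __enter__(self):
        self.connection.begin()
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            try:
                self.connection.commit()    # the commit may itself fail...
            except DatabaseError:
                self.connection.rollback()  # ...then we roll back
                raise                       # and the caller gets the error
        else:
            self.connection.rollback()      # exception inside the block
        return False                        # never swallow exceptions
```

The key point for the question: every rollback path either re-raises or lets the original exception propagate, so the wrapper never rolls back silently.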
I have, in a worker thread which runs forever, a connection to a PostgreSQL database (in C++, using libpqxx):
#include <pqxx/connection>
// executed only once, when the thread starts
connection* conn = new connection(createConnectionString(this->database, this->port, this->username, this->password));
How do I check whether the connection is still active a couple of hours later? For example, if I restart the PostgreSQL server in the meantime (while the worker thread keeps running and is not restarted), then when I try to execute a new query I should check whether the connection is still valid and reconnect if it is not.
How do I know if it is still alive?
How do I check whether the connection is still active a couple of hours later?
Don't.
Just use it, as if it were alive. If an exception is thrown because there's something wrong, catch the exception and retry the failed transaction from the beginning.
Attempts to "test" or "validate" connections are doomed. There is an inherent race condition where the connection could go away between validation and actual use. So you have to handle exceptions correctly anyway - at which point there's no point doing the connection validation in the first place.
Queries can fail and transactions can be aborted for many reasons. So your app must always execute transactions in a retry loop that detects possibly transient failure conditions. This is just one possibility - you could also have a query cancelled by the admin, a transaction aborted by the deadlock detector, a transaction cancelled by a serialization failure, etc.
To avoid unwanted low level TCP timeouts you can set a TCP keepalive on the connection, server-side or client-side.
If you really insist on doing this, knowing that it's wrong, just issue an empty query, i.e. "".
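The use-then-recover pattern looks like this in sketch form (pure Python; `connect`, `work`, and the builtin `ConnectionError` stand in for your libpqxx equivalents such as `pqxx::connection` and `pqxx::broken_connection`):

```python
# Use-then-recover: never probe the connection first; run the real
# transaction and reconnect only when it actually fails.

def run_with_reconnect(connect, work, max_attempts=3):
    conn = connect()
    for attempt in range(1, max_attempts + 1):
        try:
            return work(conn)       # just run the real transaction
        except ConnectionError:
            if attempt == max_attempts:
                raise               # give up; surface the failure
            conn = connect()        # server restarted? reconnect and retry
```

Because the retry wraps the whole transaction, it also behaves correctly when the connection dies mid-query hours after it was opened, which a preflight "test query" never can.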
I have to roll back the changes made to the database depending on some condition, but up to that 'some condition' the changes should be visible in the database to other users.
@transaction.atomic
def populate_db(input):
    Object = Table.objects.select_for_update().get(attributeX=input)
    Object.attributeY = False
    Object.save()
    # ** some operation here **
The problem I'm facing is that the value of attributeY is not stored in the database until the whole function has executed successfully, but what I want is for the changed value of attributeY to be visible in the database unless some operation fails.
And I cannot tell whether some operation failed or not, because the failures I'm trying to handle here are things like the browser being closed accidentally or a power outage.
Any help is appreciated, thanks !
So what would populate_db see that indicates the transaction did not complete?
For example, the seat has been reserved but not yet paid for (because of fault). In this case, populate_db should not complete the transaction until it also has a payment authorization code.
Alternately, if you want to mark the seat's status as being_reserved, then there is no transaction, the status gets set to being_reserved and other clients can see it. In this model, populate_db would be responsible for detecting the fault (through exceptions possibly) and returning the seat status to available in another database update.
The error in your thinking is that the database can remain consistent regardless of the failure of any component. That requirement cannot be satisfied. You cannot both allow other clients to see being_reserved and suffer a failure of populate_db.
This trade-off is central to every reservation system ever written. And there are too many ways to regain consistency in the face of arbitrary failure to enumerate here.
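The second model (a committed, visible intermediate status plus explicit compensation) can be sketched like this. The seat store and status names are invented for illustration, and the sketch deliberately shows the trade-off: a crash between the two updates still leaves `being_reserved` behind, which is why real systems add a sweeper to reclaim stale reservations.

```python
# Sketch of the being_reserved model: the intermediate status is
# committed so other clients can see it, and a failure is handled by a
# compensating update rather than a rollback. `seats` stands in for
# the real table.

def reserve(seats, seat_id, take_payment):
    seats[seat_id] = "being_reserved"   # committed: visible to others
    try:
        auth_code = take_payment()      # the step that can fail
    except Exception:
        seats[seat_id] = "available"    # compensate; don't roll back
        raise
    seats[seat_id] = "reserved"
    return auth_code
```

If the process dies inside `take_payment`, no compensation runs, so the fault-detection responsibility the answer describes cannot be avoided.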
I am using transaction.atomic as a context manager for transactions in Django 1.6. There is a block of code which I want to be in a transaction, which has a couple of network calls and some database writes. I am seeing very weird behaviour. Every once in a while (maybe 1 in 20 times) I have noticed a partial rollback happening without any exception having been raised, and the view executes without any errors. My application is hosted on Heroku and we use Heroku Postgres v9.2.8. Pseudo code:
from django.db import transaction

def some_view(request):
    try:
        with transaction.atomic():
            network_call_1()
            db_write_1.save(update_fields=['col4',])
            db_write_2.save(update_fields=['col3',])
            db_write_3.save(update_fields=['col1',])
            network_call_2()
            db_write_4.save(update_fields=['col6',])
            db_write_5.bulk_create([object1, object2])
            db_write_6.bulk_create([object1, object2])
    except Exception, e:
        logger.error(e)
    return HttpResponse()
The behaviour I have noticed is that, without any exception having been raised, either db writes 1-3 have been rolled back and the rest have gone through, or db write 1 has been rolled back and the rest have gone through, and so on. I don't understand why this should be happening. First, if there is a rollback, shouldn't it be a complete rollback of the transaction? And if there is a rollback, shouldn't an exception also be raised so that I know it has happened? Every time this has happened, no exception has been raised and the code just continues executing and returns a successful HttpResponse.
Relevant settings:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',
        'USER': 'root',
        'PASSWORD': 'root',
        'HOST': 'localhost',
        'PORT': '5432',
    },
}

CONN_MAX_AGE = None
This bug has had me baffled for days. Any clues will be of great help!
After hours of debugging, we have found the culprit.
When we start our application on gunicorn, it spawns workers. Every request coming to the same worker uses the same django DatabaseWrapper instance (postgres in our case) also referred to as a connection. If, in the middle of a transaction in one request, the worker were to receive another request, this request resets the state of the connection causing the transaction to behave in unexpected ways as documented in this bug: https://code.djangoproject.com/ticket/21239
Sometimes the transaction doesn't get committed and there is no exception raised to let you know that happened. Sometimes parts of it do get committed while the rest is lost and it looks like a partial rollback.
We thought that a connection is thread safe, but this bit of gunicorn patching magic here makes sure that's not the case: https://github.com/benoitc/gunicorn/blob/18.0/gunicorn/management/commands/run_gunicorn.py#L16
Still open to suggestions on how to sidestep this issue if possible at all.
EDIT: Don't use the run_gunicorn management command to start Django. It does some funky patching which causes DB connections to not be thread safe. The solution that worked for us is to just use "gunicorn myapp.wsgi:application -c gunicorn.conf". Django persistent DB connections don't work with the gevent worker type yet so avoid using that unless you want to run out of connections.
Not a Django expert, but I do know Postgres. I agree with your assessment that this sounds like very atypical behavior for a transaction: the rollback should be all-or-nothing, and there should be an exception. That being the case, can you be absolutely certain that this is a rollback-type situation? There are lots of other possible causes that could account for different data appearing in the database than you expected, and many of those scenarios would fit better with your observed occurrences than a rollback does.
You haven't provided any specifics as to your data, but what I imagine is, you're seeing something like "I set the value of col4 to 'foo', but after the commit, the old value 'bar' is still in the database." Is that correct?
If so, then other possible causes could be:
The code that is supposed to be setting the 'foo' value is, on occasion, actually setting either the existing 'bar' value or a NULL value.
The code is setting the 'foo' value, but there is a data access layer (aka DAL) with a 'dirty' flag that is not being set (e.g. if the object is in a disconnected state), so when the commit is done, the DAL doesn't see that as a change it is supposed to write.
These are just a few examples to get you started. There are lots of other possible scenarios. Sometimes, the basic philosophy of debugging problems like this is similar to the problem of the DDT and pelicans: since the database is at the top of the food chain, you can often see problems there that-- while they appear to be database problems-- are actually caused somewhere else in your solution.
Good luck and hope that helps!
My 3 cents:
Exceptions
We're certain no exceptions have occurred. But are we? Your pseudo-code "handles" an exception by just logging it. Make sure there are no exceptions "handled" elsewhere by logging or pass.
The partial rollback
We expect the whole transaction to be rolled back, not just part of it. Since Django 1.6, nested atomic blocks create a savepoint, and rollbacks go back to the last savepoint. Make sure there are no nested transactions. Perhaps you have transaction middleware active; check ATOMIC_REQUESTS and MIDDLEWARE_CLASSES. Maybe transactions are started in those network_call functions.
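Why nesting produces a *partial* rollback can be shown with a toy simulation (pure Python, not Django code; the `Txn` class just mimics how a SAVEPOINT limits what a rollback undoes):

```python
# Toy illustration: an inner atomic block maps to a SAVEPOINT, so
# rolling back undoes only the writes made after that savepoint, not
# the whole transaction.

class Txn:
    def __init__(self):
        self.writes = []
        self.savepoints = []

    def write(self, value):
        self.writes.append(value)

    def savepoint(self):
        # Remember how much work existed when the nested block began.
        self.savepoints.append(len(self.writes))

    def rollback_to_savepoint(self):
        # Discard only the writes made since the last savepoint.
        self.writes = self.writes[: self.savepoints.pop()]
```

If something inside the view triggers a savepoint rollback without re-raising, the outer writes survive, which would look exactly like the partial rollback described in the question.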
Reproducing
Since the network_call code may block, try replacing those calls with mocks that time out (maybe not in production). If that results in 100% (partial) rollbacks, it should make locating the cause of the partial rollbacks easier.
Let me just make few remarks first.
It is not necessary to have an exception in this code to still get a rollback.
Maybe there is some kind of timeout outside this code. Think of what happens if the Python process is killed in the middle of the second network call: that particular failure would never be logged.
I would also recommend adding
raise
at the end of the except block; it will log and then re-raise the same exception. Catching all exceptions is rarely a good idea.
Also, there might be a threading issue. Try importing threading and logging the current thread id along with the exception. You may find out that you actually have more than one thread, so one has to wait on another.
Generally, it is not a good idea to have external calls in the middle of a transaction.
Do both your calls before you start the atomic transaction, so it can be as fast as possible.
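Restructured along those lines, the view keeps the atomic block short and free of network I/O. This sketch parameterizes the collaborators so it stands alone; `run_atomic` is a stand-in for `transaction.atomic`, and the other names mirror the question's pseudo code:

```python
# Keep network I/O out of the transaction: do the slow, failure-prone
# calls first, then open a short atomic block for the writes only.

def some_view(network_call_1, network_call_2, save_all, run_atomic):
    result_1 = network_call_1()     # outside the transaction
    result_2 = network_call_2()     # outside the transaction
    with run_atomic():              # transaction is now as short as possible
        save_all(result_1, result_2)
```

Besides shrinking the window for lock contention and timeouts, this also means a network failure can no longer abort a transaction that has already done database writes.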
Hope this helps.
If the same UIManagedDocument is open on both of my devices, and I save (using the following code):
[self.documentDatabase.managedObjectContext performBlockAndWait:^{
    STNoteLabelCell *cell = (STNoteLabelCell *)[self.noteTableView cellForRowAtIndexPath:indexPath];
    [cell setNote:newNote animated:YES];
}];
I am told that the UIManagedDocument's documentState changes to UIDocumentStateSavingError, and then I get this error:
CoreData: error: (1) I/O error for database at /var/mobile/Applications/some-long-id/Documents/Read.dox/StoreContent.nosync/persistentStore. SQLite error code:1, 'cannot rollback - no transaction is active'
2013-05-14 16:30:09.062 myApp[11711:4d23] -[_PFUbiquityRecordImportOperation main](312): CoreData: Ubiquity: Threw trying to get the knowledge vector from the store: <NSSQLCore: 0x1e9e2680> (URL: file://localhost/var/mobile/Applications/some-long-id/Documents/Read.dox/StoreContent.nosync/persistentStore)
Does anybody know why this error happens?
A couple of things...
I think saving the same document on two different devices is a red herring, as each device actually works on a local copy of your database - only the transaction logs get uploaded to iCloud.
Secondly, I may be confused, but I don't see anything in the above code snippet that indicates you are performing a save (unless it is triggered by one of those calls, or autosave happens).
What that code snippet does seem to be doing is: on your database document's child MOC thread, run a block of code that does purely UI-related stuff, nothing to actually do with the database. The only thing that might be going out to the DB is cellForRowAtIndexPath - and that would usually be expected to do only read operations, not something that needs to save.
The only other thing: if the above code does trigger a save, you might have an issue with performing it inside performBlockAndWait. The UIManagedDocument save routines do some of their work asynchronously, but they need the run loop to execute before the async part actually gets a chance to run. So by blocking before continuing, you may actually be preventing the save from being executed.
I'm guessing wildly here, but I've seen enough of saving managed documents to know to be really careful about which thread things are actually called from, and, after a save request has been made, to allow the run loop a chance to actually do it. To be fair, this has only ever been an issue when calling saveToURL: repeatedly within one method or in a loop, in which case all the async parts of the saves get queued up and executed at the end, usually to great comical effect.