Celery task fails midway while pulling data from a large database - django

I'm running a periodic task using Celery in a Django REST application that pulls data from a large Postgres database with multiple tables. The task starts well and pulls data for about 50 minutes, then fails with this error:
client_idle_timeout
server closed the connection unexpectedly, This probably means the server terminated abnormally before or while processing the request.
What could be causing this, and how can I go about fixing it?

It most likely means that your PostgreSQL setup has a limit on how long a transaction can stay idle (idle in transaction) or on how long a session can last (session/idle timeout).
This is probably happening because of a typical but incorrect way of dealing with databases (I've seen this done even by senior developers): the process opens a database session and then starts running business logic that may take a long time to finish, while the DB data has been only partially updated or inserted. Code written this way is doomed to fail because of the timeouts enforced by PostgreSQL.
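A minimal sketch of one way to avoid this, under the assumption that the task can read in batches: keep each database interaction short and refresh the connection between batches instead of holding one session open for the whole 50-minute run. The task name, the ReportRow model and the process_chunk() helper are illustrative placeholders, not from the original question.

```python
from celery import shared_task
from django.db import close_old_connections


@shared_task
def pull_report_data():
    """Pull rows in small batches so no single transaction stays open for long."""
    last_pk = 0
    while True:
        # Drop stale or broken connections before each batch; Django reconnects lazily.
        close_old_connections()
        # ReportRow is a hypothetical model standing in for one of the large tables.
        rows = list(ReportRow.objects.filter(pk__gt=last_pk).order_by("pk")[:1000])
        if not rows:
            break
        process_chunk(rows)  # hypothetical business-logic helper
        last_pk = rows[-1].pk
```

Reading by primary-key ranges like this avoids one long-running cursor and keeps each query, and therefore each transaction, short.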

Related

Do I need to use celery.result.forget when using a database backend?

I've come across the following warning:
Backends use resources to store and transmit results. To ensure that resources are released, you must eventually call get() or forget() on EVERY AsyncResult instance returned after calling a task.
I am currently using the django-db backend and I am wondering about the consequences of not heeding this warning. What resources will not be "released" if I don't forget an AsyncResult? I'm not worried about cleaning up task results from my database. My primary concern is with the availability of workers being affected.
I've actually never seen that warning. As long as you're running celery beat, you'll be fine. Celery sets up a default periodic task for you, scheduled to run at 4:00 AM, that deletes any expired results in your database if you are using a DB-based backend like Postgres or MySQL.
Celery seems to have a setting for this which is result_expires. The documentation explains it all:
result_expires
Default: Expire after 1 day.
Time (in seconds, or a timedelta object) for when after stored task tombstones will be deleted.
But as @2ps mentioned, celery beat must be running for database backends; the documentation states:
A built-in periodic task will delete the results after this time (celery.backend_cleanup), assuming that celery beat is enabled. The task runs daily at 4am.
For other backend types, e.g. AMQP, this does not seem to be necessary, as documented:
Note
For the moment this only works with the AMQP, database, cache, Couchbase, and Redis backends.
When using the database backend, celery beat must be running for the results to be expired.
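For reference, a minimal sketch of what this looks like on an existing Celery app instance; the app name and the one-hour value are just examples, not settings from the question.

```python
from datetime import timedelta

from celery import Celery

app = Celery("proj")  # assuming an existing Celery application object

# Keep stored task results for one hour instead of the one-day default.
app.conf.result_expires = timedelta(hours=1)

# With a database backend, expired rows are only deleted by the built-in
# celery.backend_cleanup periodic task, so celery beat must be running
# alongside the workers for this setting to have any effect.
```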

Returning the result of celery task to the client in Django template

So I'm trying to accomplish the following. The user browses a web page while a task is running in the background. When the task completes, it should return args, one of which is flag: True, in order to trigger JavaScript that shows a modal form.
I tested it before without async tasks and it works, but now with Celery it just stores the results in the database. I did some research on tornado-celery and related tools, but some of the components, like tornado-redis, are not maintained anymore, so in my opinion it would not be wise to use them.
So what are my options? Thanks.
If I understand you correctly, then you want to communicate something from the server side back to the client. You generally have three options for that:
1) Make a long-pending request to the server - kinda bad. Skipping over the details: it will bog down your web server if it is not configured to handle that, it will make your site score low on performance tests, and if the request fails, everything fails.
2) Poll the server with numerous requests at a short interval (0.2 s, something like that) - better. It will increase the traffic, but the requests will be tiny and will not interfere with the site's performance very much. If you set a longer interval so as not to load the server with pointless requests, users will see the data with a bit of a delay. On the upside, this will not fail (if written correctly) even if the connection is interrupted.
3) Websockets where the server can just hit the client with any message whenever needed - nice, but takes some time to get used to. If you want to try, you can use django-channels which is a nice library for Django websockets.
If I did not understand you correctly and you are instead trying to figure out how to get data back from a Celery task into Django, then you can store the Celery task IDs and use them to first check whether the task is completed and then query the data from Celery.
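A minimal sketch of that last approach, assuming the page already knows the task id from when the task was queued; the view name and URL wiring are illustrative.

```python
from celery.result import AsyncResult
from django.http import JsonResponse


def task_status(request, task_id):
    """Report task state so the page's JavaScript can decide whether to show the modal."""
    result = AsyncResult(task_id)
    if result.ready():
        # result.get() returns whatever the task returned, e.g. {"flag": True}.
        return JsonResponse({"done": True, "result": result.get()})
    return JsonResponse({"done": False})
```

The client-side JavaScript would poll this endpoint on an interval (option 2 above) and open the modal once it sees done: true with the flag set.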

How to survive a database outage?

I have a web service built with Spring, Hibernate and c3p0. I also have a service-wide cache (which holds the results of every request ever made to the service) that can be used to return results when the service isn't able to respond (for whatever reason). The cache might return stale results while the database is down, but that's OK.
I recently faced a database outage and my service came to a crashing halt.
I want the clients of my service to survive any database outage that happens in the future.
For that, I need my service to:
Handle new incoming requests by quickly reporting that the database is down and throwing an exception (fail fast).
Make sure requests already being processed don't last longer than x seconds - how do I get the thread handling the request to be interrupted?
Cache the whole database in memory for read-only purposes (is this insane?).
There are some observations that I made:
If there is one or more connection(s) with status ESTABLISHED, then no attempt is made to check out a new connection. It seems that any one connection with status ESTABLISHED is handed over to the thread receiving the request. That thread then just hangs until the database comes back up.
I would like to make such a request fail fast by knowing, before a connection is handed to a thread, whether the DB is up or not. If it isn't, the service should throw an exception instead of hanging.
If there's no connection with status ESTABLISHED, then the request fails in 10 seconds with a "Could not checkout a new connection" exception. This is because my checkout timeout is set to 10 s.
If the service is processing a request when the DB goes down and then makes a call to the DB, the thread making that call gets stuck forever. It resumes execution only after the DB comes back.
I would like to interrupt the thread after, say, x seconds, whether or not it was able to complete the request.
Are there ways to accomplish what I seek?
Thanks in advance.

Caching in Django's object model

I'm running a system with a few workers that take jobs from a message queue, all using Django's ORM.
In one case I'm actually passing a message along from one worker to another in another queue.
It works like this:
Worker1 in queue1 creates an object (MySQL INSERT) and pushes a message to queue2
Worker2 accepts the new message in queue2 and retrieves the object (MySQL SELECT), using Django's objects.get(pk=object_id)
This works for the first message. But for the second message, worker2 always fails because it can't find an object with id object_id (Django raises DoesNotExist).
This works seamlessly in my local setup with Django 1.2.3 and MySQL 5.1.66, the problem occurs only in my test environment which runs Django 1.3.1 and MySQL 5.5.29.
If I restart worker2 every time before worker1 pushes a message, it works fine. This makes me believe there's some kind of caching going on.
Is there any caching involved in Django's objects.get() that differs between these versions? If that's the case, can I clear it in some way?
The issue is likely related to the use of MySQL transactions. On the sender's side, the transaction must be committed to the database before notifying the receiver that there is an item to read. On the receiver's side, the transaction isolation level used for the session must be set such that the new data becomes visible in the session after the sender's commit.
By default, MySQL uses the REPEATABLE READ isolation level. This poses problems when more than one process is reading from and writing to the database. One possible solution is to set the isolation level in the Django settings.py file using a DATABASES option like the following:
'OPTIONS': {'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED'},
Note however that changing the transaction isolation level may have other side effects, especially when using statement based replication.
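For context, a minimal sketch of how that OPTIONS entry fits into a full DATABASES definition in settings.py; the connection details are placeholders.

```python
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",        # placeholder database name
        "USER": "myuser",      # placeholder credentials
        "PASSWORD": "secret",
        "OPTIONS": {
            # READ COMMITTED lets this session see rows committed by other
            # sessions without having to close and reopen the connection.
            "init_command": "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
        },
    }
}
```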
The following links provide more useful information:
How do I force Django to ignore any caches and reload data?
Django ticket#13906

Django/Postgres performance worsening after repeatedly processing the same query

I am running Django on Apache. I have several client computers that call urllib2.urlopen() and send over some data, which my server processes before immediately sending back a reply. However, while testing this I found a very tricky issue. I have one client repeatedly send the same data to be processed. The first time it takes around ~20 seconds, the second time about 40 seconds, and the third time I get a 504 (gateway timeout) error. If I try to send more data, 504 errors randomly pop up. I am pretty sure this is an issue with Postgres, as the function that processes the information makes many database calls; however, I do not know why the performance of Postgres would decline so much. I have tried several database optimization tricks, including this one (http://stackoverflow.com/questions/1125504/django-persistent-database-connection), to no avail.
Thanks in advance.
Edit: The requests are not coming in concurrently. They come in back to back, and each query involves a lot of SELECTs and JOINs, with a few INSERTs and UPDATEs as well. The Apache error logs show that it is just a simple timeout, where the function that processes the client-posted data takes over 90 seconds.
If it's really Postgres, then you should turn on the logging of slow statements in the Postgres configuration to find out which statement exactly is taking so much time.
This can be done by setting the configuration property log_min_duration_statement.
Details are in the manual:
http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-MIN-DURATION-STATEMENT
You say the function makes "many database calls", so I'd start with a very low number, or even 0 to log the duration of all statements; then you might be able to identify the slow ones.
It could also be a locking issue. Maybe the first call does not end its transaction properly and subsequent calls run into a timeout while waiting for a resource.
You can verify this by checking the system view pg_locks after the first call.
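A minimal sketch of one way to inspect pg_locks from a Django shell right after the first slow call; the query is plain SQL against the standard pg_locks view, and the column selection is just an example.

```python
from django.db import connection

cursor = connection.cursor()
cursor.execute("SELECT locktype, relation::regclass, mode, granted FROM pg_locks")
for locktype, relation, mode, granted in cursor.fetchall():
    # Rows with granted = False indicate a backend waiting on a lock.
    print(locktype, relation, mode, granted)
```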
Have you checked the Apache error logs? Have you set Django's DEBUG = True or ADMINS = ('email@addr.com',) so you can get a detailed error report about the actual cause of the issue? If so, how about pasting some of that information here?
Why are you certain that it's postgres? Have you done diagnostics to come to that conclusion? If so, please let us know.
Are you running apache with mod_wsgi? How many processes and threads have you allocated to your django application?
Also, 20 seconds to process the first transaction is a huge amount of time. Perhaps you could show us the view code that is causing the timeout; we may be able to help there.
I sincerely doubt that Postgres alone is causing the issue. It probably has something to do with the application code or the server configuration.