I have a Django-based site with several background processes that are executed in Celery workers. One particular task can run for a few seconds, with several reads/writes to the database that are subject to a race condition if a second task tries to access the same rows.
I'm trying to prevent this by ensuring the task only ever runs on a single worker at a time, but I'm running into issues getting it to work correctly. I've used this Celery Task Cookbook Recipe as inspiration, trying to make my own version that ensures this specific task only runs on one worker at a time, but it still seems possible to hit situations where it's executed on more than one worker.
So far, in tasks.py I have:
import logging

from celery import shared_task
from django.core.cache import cache

logger = logging.getLogger(__name__)


class LockedTaskInProgress(Exception):
    """The locked task is already in progress"""
    silent_variable_failure = True


@shared_task(autoretry_for=[LockedTaskInProgress], default_retry_delay=30)
def create_or_update_payments(things=None):
    """
    This takes a list of `things` that we want to process payments on. It will
    read the thing's status, then apply calculations to make one or more payments
    for various users that are owed money for the thing.

    `things` - The list of things we need to process payments on.
    """
    lock = cache.get('create_or_update_payments')  # Using Redis as our cache backend

    if not lock:
        logger.debug('Starting create/update payments processing. Locking task.')
        cache.set('create_or_update_payments', 'LOCKED')
        real_create_or_update_payments(things)  # Long running function w/ lots of DB read/writes
        cache.delete('create_or_update_payments')
        logger.debug('Completed create/update payments processing. Lock removed.')
    else:
        logger.debug('Unable to process create/update payments at this time. Lock detected.')
        raise LockedTaskInProgress
The above almost works, but there still appears to be a race condition between the cache.get and the cache.set that has shown up in my testing.
I'd love to get suggestions on how to improve this to make it more robust.
I think I've found a way of doing this, inspired by an older version of the Celery Task Cookbook Recipe I was using earlier.
Here's my implementation:
import logging

from celery import shared_task
from django.core.cache import cache

logger = logging.getLogger(__name__)


class LockedTaskInProgress(Exception):
    """The locked task is already in progress"""
    silent_variable_failure = True


@shared_task(autoretry_for=[LockedTaskInProgress], default_retry_delay=30)
def create_or_update_payments(things=None):
    """
    This takes a list of `things` that we want to process payments on. It will
    read the thing's status, then apply calculations to make one or more payments
    for various users that are owed money for the thing.

    `things` - The list of things we need to process payments on.
    """
    LOCK_EXPIRE = 60 * 5  # 5 Mins

    lock_id = 'create_or_update_payments'

    acquire_lock = lambda: cache.add(lock_id, 'LOCKED', LOCK_EXPIRE)  # atomic: only sets the key if it's absent
    release_lock = lambda: cache.delete(lock_id)

    if acquire_lock():
        try:
            logger.debug('Starting create/update payments processing. Locking task.')
            real_create_or_update_payments(things)  # Long running function w/ lots of DB read/writes
        finally:
            release_lock()
            logger.debug('Completed create/update payments processing. Lock removed.')
    else:
        logger.debug('Unable to process create/update payments at this time. Lock detected.')
        raise LockedTaskInProgress
It's very possible that there's a better way of doing this but this seems to work in my tests.
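One remaining sharp edge, in case it ever bites: the lock can expire while the task is still running, and in that case the finally block above would delete a lock that a second worker has since acquired. Here's a rough sketch of a safer acquire/release pair using a per-run token; the helper names are mine, not part of the code above.

import uuid

from django.core.cache import cache

LOCK_EXPIRE = 60 * 5  # same 5-minute expiry as above
LOCK_ID = 'create_or_update_payments'


def acquire_payments_lock():
    """Atomically claim the lock; returns our token on success, else None."""
    token = str(uuid.uuid4())
    # cache.add only sets the key if it doesn't already exist, so this is atomic.
    return token if cache.add(LOCK_ID, token, LOCK_EXPIRE) else None


def release_payments_lock(token):
    """Release the lock only if we still own it (the stored value is our token)."""
    if cache.get(LOCK_ID) == token:
        cache.delete(LOCK_ID)

The get/delete pair in the release is still not strictly atomic; a fully watertight version would use a Redis Lua script or redis-py's built-in Lock, but for a task that comfortably finishes within the expiry this is usually enough.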
I'm encountering an issue with a function that is intended to require serialized access under certain circumstances. This seemed like a good case for using advisory locks. However, under fairly heavy load, I'm finding that the serialized access isn't happening and I'm seeing concurrent access to the function.
The intention of this function is to provide "inventory control" for an event. Meaning, it is intended to limit concurrent ticket purchases for a given event so that the event is not oversold. These are the only advisory locks used within the application/database.
I'm finding that occasionally there are more tickets in an event than the eventTicketMax value. This doesn't seem like it should be possible because of the advisory locks. When testing with low volume (or manually introduced delays such as pg_sleep after acquiring the lock), things work as expected.
CREATE OR REPLACE FUNCTION createTicket(
    userId int,
    eventId int,
    eventTicketMax int
) RETURNS integer AS $$
DECLARE
    insertedId int;
    numTickets int;
BEGIN
    -- first get the event lock
    PERFORM pg_advisory_lock(eventId);

    -- make sure we aren't over ticket max
    numTickets := (SELECT count(*) FROM api_ticket
                   WHERE event_id = eventId and status <> 'x');

    IF numTickets >= eventTicketMax THEN
        -- raise an exception if this puts us over the max
        -- and bail
        PERFORM pg_advisory_unlock(eventId);
        RAISE EXCEPTION 'Maximum entries number for this event has been reached.';
    END IF;

    -- create the ticket
    INSERT INTO api_ticket (
        user_id,
        event_id,
        created_ts
    )
    VALUES (
        userId,
        eventId,
        now()
    )
    RETURNING id INTO insertedId;

    -- update the ticket count
    UPDATE api_event SET ticket_count = numTickets + 1 WHERE id = eventId;

    -- release the event lock
    PERFORM pg_advisory_unlock(eventId);

    RETURN insertedId;
END;
$$ LANGUAGE plpgsql;
Here's my environment setup:
Django 1.8.1 (django.db.backends.postgresql_psycopg2 w/ CONN_MAX_AGE 300)
PGBouncer 1.7.2 (session mode)
Postgres 9.3.10 on Amazon RDS
Additional variables which I tried tuning:
setting CONN_MAX_AGE to 0
Removing pgbouncer and connecting directly to DB
In my testing, I have noticed that, in cases where an event was oversold, the tickets were purchased from different webservers so I don't think there is any funny business about a shared session but I can't say for sure.
As soon as PERFORM pg_advisory_unlock(eventId) is executed, another session can grab that lock, but since the INSERT of session #1 is not yet committed, it will not be counted in the COUNT(*) of session #2, resulting in the over-booking.
If you keep the advisory lock strategy, you must use transaction-level advisory locks (pg_advisory_xact_lock), as opposed to session-level locks. Those locks are automatically released at COMMIT time.
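For illustration, here is roughly what that looks like if you take the lock from the Django side instead of inside a PL/pgSQL function; the function name and error handling below are mine, but the queries mirror the ones in the question. Because pg_advisory_xact_lock is taken inside transaction.atomic(), it is held until the surrounding COMMIT, so the insert is visible before the next session is allowed to count tickets.

from django.db import connection, transaction


def create_ticket(user_id, event_id, event_ticket_max):
    with transaction.atomic():
        with connection.cursor() as cursor:
            # Transaction-level advisory lock: released automatically at COMMIT/ROLLBACK,
            # so no other session can count tickets until this insert is committed.
            cursor.execute("SELECT pg_advisory_xact_lock(%s)", [event_id])

            cursor.execute(
                "SELECT count(*) FROM api_ticket WHERE event_id = %s AND status <> 'x'",
                [event_id],
            )
            num_tickets = cursor.fetchone()[0]
            if num_tickets >= event_ticket_max:
                raise ValueError('Maximum entries number for this event has been reached.')

            cursor.execute(
                "INSERT INTO api_ticket (user_id, event_id, created_ts) "
                "VALUES (%s, %s, now()) RETURNING id",
                [user_id, event_id],
            )
            inserted_id = cursor.fetchone()[0]

            cursor.execute(
                "UPDATE api_event SET ticket_count = %s WHERE id = %s",
                [num_tickets + 1, event_id],
            )
            return inserted_id

The same fix works inside the original function: replace the pg_advisory_lock/pg_advisory_unlock pair with a single PERFORM pg_advisory_xact_lock(eventId) and drop the explicit unlocks.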
I'm writing a Django app to run polls, and it uses Celery to keep the voting system under control. Right now I have two queues, default and polls, the first one with concurrency set to 8 and the second one set to 1.
$ celery multi start -A myproject.celery default polls -Q:default default -Q:polls polls -c:default 8 -c:polls 1
Celery routes:
CELERY_ROUTES = {
    'polls.tasks.option_add_vote': {
        'queue': 'polls',
    },
    'polls.tasks.option_subtract_vote': {
        'queue': 'polls',
    }
}
Task:
from django.db import IntegrityError, transaction

from polls.models import Option  # assuming the models live in the polls app
from myproject.celery import app


@app.task(bind=True)  # bind=True so the task can call self.retry()
def option_add_vote(self, pk):
    """
    Updates given option id and its poll, increasing the vote number by 1.
    """
    option = Option.objects.get(pk=pk)
    try:
        with transaction.atomic():
            option.vote_quantity += 1
            option.save()

            option.poll.total_votes += 1
            option.poll.save()
    except IntegrityError as exc:
        raise self.retry(exc=exc)
The option_add_vote method (task) updates the poll object's vote count, adding 1 to the previous value. So, to avoid concurrency problems, I set the polls queue concurrency to 1. This allows the system to handle thousands of vote requests successfully.
The problem, as far as I can imagine, will be a bottleneck when the system grows.
So, I was thinking about some kind of dynamic queues, where all vote requests for the options of a certain poll are routed to their own queue. I think this would make the system more reliable and faster.
What do you think? How can I make it?
EDIT1:
I got a new idea thanks to Paul and Plahcinski. I'm storing the votes as objects in their own model (a user-option relationship). When someone votes for an option, an object of this model is created, which lets me count how many votes an option has. This frees the system from the voting-concurrency problem, so it can be executed in parallel.
I'm thinking about using CELERYBEAT_SCHEDULE to cron a task that updates the poll options based on a count of their related Vote objects (e.g. Vote.objects.filter(option=option).count()). Maybe I could execute it every hour, or do partial updates for those options that are getting new votes...
But how do I give the clients updated options in real time?
As Plahcinski says, I can keep a cached value for my options in Redis (or any other memcached-style system?) and use it to temporarily store these values, serving the cached value to any new request.
How can I mix this with my standard values in Django models? Could anyone give me some code references or hints?
Am I on the right track, or am I making mistakes?
What I would do is move the incrementing out of the database and into Redis, and use the database model as your cached value. Have a Celery beat task that writes recently incremented Redis keys back to your database.
http://redis.io/commands/INCR
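A rough sketch of that approach (the key format, model fields and beat task name are my own assumptions, not code from the answer):

import redis

from celery import shared_task
from django.db.models import F

from polls.models import Option  # assuming the model lives in the polls app

r = redis.StrictRedis()


def add_vote(option_id):
    # Hot path: a single atomic Redis increment, no database write.
    r.incr('option:{}:votes'.format(option_id))


@shared_task
def flush_vote_counters():
    """Beat task: push the accumulated Redis counters back into the database."""
    for key in r.scan_iter('option:*:votes'):
        option_id = int(key.decode().split(':')[1])
        pending = int(r.getset(key, 0) or 0)  # read the counter and reset it atomically
        if pending:
            Option.objects.filter(pk=option_id).update(
                vote_quantity=F('vote_quantity') + pending)

When a client asks for current totals you can add the pending Redis counter to the stored vote_quantity, so reads stay close to real time even though the database only catches up on the beat schedule.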
What about just having a simple model that stores vote +1/-1 integers, and then a Celery task that reconciles those with the FK object, giving you atomic transactions and updates?
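A minimal sketch of that idea, with illustrative names (the model and task below are not from the question):

from celery import shared_task
from django.db import models
from django.db.models import F, Sum

from polls.models import Option  # assumed location of the Option model


class VoteDelta(models.Model):
    """One row per vote action: +1 for a vote, -1 for a retraction."""
    option = models.ForeignKey('polls.Option')
    delta = models.SmallIntegerField()


@shared_task
def reconcile_votes():
    # Sum the pending deltas per option and fold them into the FK object.
    pending = VoteDelta.objects.values('option').annotate(total=Sum('delta'))
    for row in pending:
        Option.objects.filter(pk=row['option']).update(
            vote_quantity=F('vote_quantity') + row['total'])
    VoteDelta.objects.all().delete()  # a real version would only delete the rows it just summed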
I have these requirements:
I have a few heavy, resource-consuming tasks - exporting different reports that require big, complex queries and subqueries.
There are a lot of users.
I have built the project in Django, and I queue tasks using Celery.
I want to restrict users so that only 10 of their reports are processed per minute. The idea is that they can submit hundreds of requests in ten minutes, but I want Celery to execute only 10 tasks per minute for each user, so that every user gets their turn.
Is there any way Celery can do this?
Thanks
Celery has a setting to control the rate_limit of a task (http://celery.readthedocs.org/en/latest/userguide/tasks.html#Task.rate_limit); it limits the number of tasks that can run in a given time frame.
You could set this to '100/m' (one hundred per minute), meaning your system allows 100 of these tasks per minute. It's important to notice that this setting is not per user or per task instance; it applies per time frame.
Have you thought about this approach instead of limiting per user?
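For example, a sketch of that global approach (the task name is made up, and note that Celery applies rate limits per worker instance rather than across the whole cluster):

from myproject.celery import app  # assumed location of the Celery app


@app.task(rate_limit='10/m')  # at most 10 of these per minute, per worker instance
def export_report(report_id):
    # heavy queries / report generation would go here
    pass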
In order to have a 'rate_limit' per task and user pair you will have to build it yourself. I think (though I'm not sure) you could use a TaskRouter or a signal, depending on your needs.
TaskRouters (http://celery.readthedocs.org/en/latest/userguide/routing.html#routers) allow you to route tasks to a specific queue by applying some logic.
Signals (http://celery.readthedocs.org/en/latest/userguide/signals.html) allow you to execute code at a few well-defined points of the task's scheduling cycle.
An example of a router's logic could be:

class RateLimitRouter(object):
    """Routes a task to the default queue only while the user is under their limit."""

    def route_for_task(self, task, args=None, kwargs=None):
        if task == 'A':
            user_id = args[0]  # in this task the user_id is the first arg
            qty = get_task_qty('A', user_id)
            if qty > LIMIT_FOR_A:
                return None
        elif task == 'B':
            user_id = args[2]  # in this task the user_id is the third arg
            qty = get_task_qty('B', user_id)
            if qty > LIMIT_FOR_B:
                return None
        return {'queue': 'default'}
With the approach above, every time a task starts you would increment a counter for the user_id/task_type pair somewhere (for example in Redis), and every time a task finishes you would decrement that value in the same place.
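A sketch of that bookkeeping, using Celery signals and Redis (the key format and handler names are illustrative):

import redis

from celery.signals import task_prerun, task_postrun

r = redis.StrictRedis()


def _counter_key(task_name, user_id):
    return 'running:{}:{}'.format(task_name, user_id)


@task_prerun.connect
def task_started(sender=None, args=None, **extra):
    # sender is the task being executed; assume user_id is its first argument
    if sender is not None and args:
        r.incr(_counter_key(sender.name, args[0]))


@task_postrun.connect
def task_finished(sender=None, args=None, **extra):
    if sender is not None and args:
        r.decr(_counter_key(sender.name, args[0]))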
It seems kind of complex and hard to maintain, and it has a few failure points, in my opinion.
The other approach, which I think could fit, is to implement some kind of 'distributed semaphore' (similar to a distributed lock) per user and task, so that each task which needs to limit the number of running instances can use it.
The idea is that every time a task which should have 'concurrency control' starts, it has to check whether a resource is available; if not, it just returns.
You could imagine this idea as below:
@shared_task
def my_task_A(user_id, arg1, arg2):
    resource_key = 'my_task_A_{}'.format(user_id)

    available = SemaphoreManager.is_available_resource(resource_key)

    if not available:
        # no resources, so abort
        return

    try:
        # the resource could have been acquired just before us by another task
        if SemaphoreManager.acquire(resource_key):
            pass  # execute your code here
    finally:
        SemaphoreManager.release(resource_key)
It's hard to say which approach you SHOULD take, because that depends on your application.
Hope it helps you!
Good luck!
I'm trying to port some code to Python that uses sqlite databases, and I can't get transactions to work. I'm really confused by this; I've used sqlite a lot in other languages, because it's great, but I simply cannot work out what's wrong here.
Here is the schema for my test database (to be fed into the sqlite3 command line tool).
BEGIN TRANSACTION;
CREATE TABLE test (i integer);
INSERT INTO "test" VALUES(99);
COMMIT;
Here is a test program.
import sqlite3

sql = sqlite3.connect("test.db")

with sql:
    c = sql.cursor()
    c.executescript("""
        update test set i = 1;
        fnord;
        update test set i = 0;
        """)
You may notice the deliberate mistake in it. This causes the SQL script to fail on the second line, after the update has been executed.
According to the docs, the with sql statement is supposed to set up an implicit transaction around the contents, which is only committed if the block succeeds. However, when I run it, I get the expected SQL error... but the value of i is set from 99 to 1. I'm expecting it to remain at 99, because that first update should be rolled back.
Here is another test program, which explicitly calls commit() and rollback().
import sqlite3

sql = sqlite3.connect("test.db")

try:
    c = sql.cursor()
    c.executescript("""
        update test set i = 1;
        fnord;
        update test set i = 0;
        """)
    sql.commit()
except sql.Error:
    print("failed!")
    sql.rollback()
This behaves in precisely the same way --- i gets changed from 99 to 1.
Now I'm calling BEGIN and COMMIT explicitly:
import sqlite3

sql = sqlite3.connect("test.db")

try:
    c = sql.cursor()
    c.execute("begin")
    c.executescript("""
        update test set i = 1;
        fnord;
        update test set i = 0;
        """)
    c.execute("commit")
except sql.Error:
    print("failed!")
    c.execute("rollback")
This fails too, but in a different way. I get this:
sqlite3.OperationalError: cannot rollback - no transaction is active
However, if I replace the calls to c.execute() with c.executescript(), then it works (i remains at 99)!
(I should also add that if I put the begin and commit inside the inner call to executescript then it behaves correctly in all cases, but unfortunately I can't use that approach in my application. In addition, changing sql.isolation_level appears to make no difference to the behaviour.)
Can someone explain to me what's happening here? I need to understand this; if I can't trust the transactions in the database, I can't make my application work...
Python 2.7, python-sqlite3 2.6.0, sqlite3 3.7.13, Debian.
For anyone who'd like to work with the sqlite3 lib regardless of its shortcomings, I found that you can keep some control of transactions if you do these two things:
set Connection.isolation_level = None (as per the docs, this means autocommit mode)
avoid using executescript at all, because according to the docs it "issues a COMMIT statement first" - i.e., trouble. Indeed, I found it interferes with any manually set transactions
So then, the following adaptation of your test works for me:
import sqlite3

sql = sqlite3.connect("/tmp/test.db")
sql.isolation_level = None

c = sql.cursor()
c.execute("begin")
try:
    c.execute("update test set i = 1")
    c.execute("fnord")
    c.execute("update test set i = 0")
    c.execute("commit")
except sql.Error:
    print("failed!")
    c.execute("rollback")
Per the docs,
Connection objects can be used as context managers that automatically
commit or rollback transactions. In the event of an exception, the
transaction is rolled back; otherwise, the transaction is committed:
Therefore, if you let Python exit the with-statement when an exception occurs, the transaction will be rolled back.
import sqlite3

filename = '/tmp/test.db'

with sqlite3.connect(filename) as conn:
    cursor = conn.cursor()
    sqls = [
        'DROP TABLE IF EXISTS test',
        'CREATE TABLE test (i integer)',
        'INSERT INTO "test" VALUES(99)',]
    for sql in sqls:
        cursor.execute(sql)

try:
    with sqlite3.connect(filename) as conn:
        cursor = conn.cursor()
        sqls = [
            'update test set i = 1',
            'fnord',  # <-- trigger error
            'update test set i = 0',]
        for sql in sqls:
            cursor.execute(sql)
except sqlite3.OperationalError as err:
    print(err)
    # near "fnord": syntax error

with sqlite3.connect(filename) as conn:
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM test')
    for row in cursor:
        print(row)
        # (99,)
yields
(99,)
as expected.
Python's DB API tries to be smart, and begins and commits transactions automatically.
I would recommend using a DB driver that does not use the Python DB API, like apsw.
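For what it's worth, a minimal sketch with apsw, which (as I understand it) does no implicit transaction handling at all, so BEGIN/COMMIT/ROLLBACK do exactly what you write:

import apsw

connection = apsw.Connection("test.db")
cursor = connection.cursor()

try:
    cursor.execute("BEGIN")
    cursor.execute("update test set i = 1")
    cursor.execute("fnord")  # deliberate error
    cursor.execute("update test set i = 0")
    cursor.execute("COMMIT")
except apsw.Error:
    cursor.execute("ROLLBACK")  # i stays at 99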
Here's what I think is happening based on my reading of Python's sqlite3 bindings as well as official Sqlite3 docs. The short answer is that if you want a proper transaction, you should stick to this idiom:
with connection:
    connection.execute("BEGIN")
    # do other things, but do NOT use 'executescript'
Contrary to my intuition, with connection does not call BEGIN upon entering the scope. In fact it doesn't do anything at all in __enter__. It only has an effect when you __exit__ the scope, choosing either COMMIT or ROLLBACK depending on whether the scope is exiting normally or with an exception.
Therefore, the right thing to do is to always explicitly mark the beginning of your transactional with connection blocks using BEGIN. This renders isolation_level irrelevant within the block, because thankfully it only has an effect while autocommit mode is enabled, and autocommit mode is always suppressed within transaction blocks.
Another quirk is executescript, which always issues a COMMIT before running your script. This can easily mess up the transactional with connection block, so your choice is to either
use exactly one executescript within the with block and nothing else, or
avoid executescript entirely; you can call execute as many times as you want, subject to the one-statement-per-execute limitation.
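Applied to the test case from the question, the idiom looks roughly like this (a sketch; the details of when the module issues its implicit BEGIN have shifted between Python versions, so treat it as an illustration rather than a guarantee):

import sqlite3

conn = sqlite3.connect("test.db")

try:
    with conn:
        conn.execute("BEGIN")
        conn.execute("update test set i = 1")
        conn.execute("fnord")  # deliberate error
        conn.execute("update test set i = 0")
except sqlite3.OperationalError as err:
    print("failed: {}".format(err))  # the with-block rolled back, so i should still be 99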
Normal .execute() calls work as expected with the comfortable default auto-commit mode, and with the with conn: ... context manager doing an auto-commit OR rollback - except for protected read-modify-write transactions, which are explained at the end of this answer.
The sqlite3 module's non-standard conn_or_cursor.executescript() doesn't take part in the (default) auto-commit mode (and so doesn't work normally with the with conn: ... context manager) but forwards the script more or less raw. It just commits any potentially pending auto-commit transaction at the start, before "going raw".
This also means that without a "BEGIN" inside the script, executescript() works without a transaction, and thus there is no rollback option upon error or otherwise.
So with executescript() we'd better use an explicit BEGIN (just as your initial schema creation script did for the "raw" mode sqlite command line tool). This interaction shows step by step what's going on:
>>> list(conn.execute('SELECT * FROM test'))
[(99,)]
>>> conn.executescript("BEGIN; UPDATE TEST SET i = 1; FNORD; COMMIT""")
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
OperationalError: near "FNORD": syntax error
>>> list(conn.execute('SELECT * FROM test'))
[(1,)]
>>> conn.rollback()
>>> list(conn.execute('SELECT * FROM test'))
[(99,)]
>>>
The script didn't reach the "COMMIT", and thus we could view the current intermediate state and decide whether to roll back (or commit nevertheless).
Thus a working try-except-rollback via executescript() looks like this:
>>> list(conn.execute('SELECT * FROM test'))
[(99,)]
>>> try: conn.executescript("BEGIN; UPDATE TEST SET i = 1; FNORD; COMMIT""")
... except Exception as ev:
... print("Error in executescript (%s). Rolling back" % ev)
... conn.executescript('ROLLBACK')
...
Error in executescript (near "FNORD": syntax error). Rolling back
<sqlite3.Cursor object at 0x011F56E0>
>>> list(conn.execute('SELECT * FROM test'))
[(99,)]
>>>
(Note the rollback via script here, because no .execute() took over commit control)
And here is a note on the auto-commit mode in combination with the more difficult issue of a protected read-modify-write transaction - which made @Jeremie say "Out of all the many, many things written about transactions in sqlite/python, this is the only thing that let me do what I want (have an exclusive read lock on the database)." in a comment on an example which included a c.execute("begin"). Note that sqlite3 normally does not take a long blocking exclusive read lock except for the duration of the actual write-back, but uses a cleverer 5-stage locking scheme to achieve enough protection against overlapping changes.
The with conn: auto-commit context does not by itself acquire or trigger a lock strong enough for a protected read-modify-write in sqlite3's 5-stage locking scheme. Such a lock is taken implicitly only when the first data-modifying command is issued - thus too late.
Only an explicit BEGIN (DEFERRED) (TRANSACTION) triggers the wanted behavior:
The first read operation against a database creates a SHARED lock and
the first write operation creates a RESERVED lock.
So a protected read-modify-write transaction which uses the programming language in the general way (and not a special atomic SQL UPDATE clause) looks like this:

with conn:
    conn.execute('BEGIN TRANSACTION')  # crucial !
    v = conn.execute('SELECT * FROM test').fetchone()[0]
    v = v + 1
    time.sleep(3)  # no read lock in effect, but only one concurrent modification will succeed
    conn.execute('UPDATE test SET i=?', (v,))
Upon failure, such a read-modify-write transaction could be retried a couple of times.
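A sketch of such a retry wrapper (the function name and back-off are mine):

import sqlite3
import time


def protected_increment(conn, attempts=3):
    """Retry the read-modify-write a few times if another writer wins the race."""
    for attempt in range(attempts):
        try:
            with conn:
                conn.execute('BEGIN TRANSACTION')  # crucial, as above
                v = conn.execute('SELECT * FROM test').fetchone()[0]
                conn.execute('UPDATE test SET i=?', (v + 1,))
            return
        except sqlite3.OperationalError:
            # typically "database is locked"; back off briefly and try again
            time.sleep(0.1 * (attempt + 1))
    raise RuntimeError('update still failing after {} attempts'.format(attempts))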
You can use the connection as a context manager. It will then automatically roll back the transaction in the event of an exception, or commit it otherwise.
try:
    with con:
        con.execute("insert into person(firstname) values (?)", ("Joe",))
except sqlite3.IntegrityError:
    print("couldn't add Joe twice")
See https://docs.python.org/3/library/sqlite3.html#using-the-connection-as-a-context-manager
This is a bit of an old thread, but if it helps: I've found that doing a rollback on the connection object does the trick.
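For example (a minimal sketch along those lines, against the test table from the question):

import sqlite3

conn = sqlite3.connect("test.db")
cur = conn.cursor()

try:
    cur.execute("update test set i = 1")
    cur.execute("fnord")  # the statement that blows up
    conn.commit()
except sqlite3.Error:
    conn.rollback()  # undoes the pending update; i stays at 99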