I'm trying to implement a concurrent dictionary in Python - more specifically, the dictionary would be used by two threads, one that would use its clear and update methods, and the other which would access its values directly (i.e., by its __getitem__ method). The implementation is below:
from threading import Lock, current_thread

class ThreadSafeDict(dict):
    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        super(ThreadSafeDict, self).__init__(*args, **kwargs)

    def clear(self, *args, **kwargs):
        print("thread {} acquiring clear lock".format(current_thread().ident))
        self._lock.acquire()
        print("thread {} acquired clear lock".format(current_thread().ident))
        super(ThreadSafeDict, self).clear(*args, **kwargs)
        print("thread {} releasing clear lock".format(current_thread().ident))
        self._lock.release()
        print("thread {} released clear lock".format(current_thread().ident))

    def __getitem__(self, *args, **kwargs):
        print("thread {} acquiring getitem lock".format(current_thread().ident))
        self._lock.acquire()
        print("thread {} acquired getitem lock".format(current_thread().ident))
        val = super(ThreadSafeDict, self).__getitem__(*args, **kwargs)
        print("thread {} releasing getitem lock".format(current_thread().ident))
        self._lock.release()
        print("thread {} released getitem lock".format(current_thread().ident))
        return val

    def update(self, *args, **kwargs):
        print("thread {} acquiring update lock".format(current_thread().ident))
        self._lock.acquire()
        print("thread {} acquired update lock".format(current_thread().ident))
        super(ThreadSafeDict, self).update(*args, **kwargs)
        print("thread {} releasing update lock".format(current_thread().ident))
        self._lock.release()
        print("thread {} released update lock".format(current_thread().ident))
I'm testing the implementation with this script:
import threading
import random
import time

from threadsafedict import ThreadSafeDict

def reader(tsd):
    while True:
        try:
            val = tsd[1]
        except KeyError:
            pass
        interval = random.random() / 2
        time.sleep(interval)

def writer(tsd):
    while True:
        tsd.clear()
        interval = random.random() / 2
        time.sleep(interval)
        tsd.update({1: 'success'})

def main():
    tsd = ThreadSafeDict()
    w_worker = threading.Thread(target=writer, args=(tsd,))
    r_worker = threading.Thread(target=reader, args=(tsd,))
    w_worker.start()
    r_worker.start()
    w_worker.join()
    r_worker.join()

if __name__ == '__main__':
    main()
Sample output:
thread 140536098629376 acquiring clear lock
thread 140536098629376 acquired clear lock
thread 140536098629376 releasing clear lock
thread 140536098629376 released clear lock
thread 140536090236672 acquiring getitem lock
thread 140536090236672 acquired getitem lock
thread 140536090236672 acquiring getitem lock
thread 140536098629376 acquiring update lock
What am I doing wrong?
(I realize this concurrency would already be safe in CPython, but I'm trying to be implementation-agnostic)
The problem is that when the super().__getitem__() call in your ThreadSafeDict.__getitem__() method fails to find an item with the given key, it raises KeyError, which causes the remainder of your __getitem__() method to be skipped. That means the lock is never released, and any later call to any of your methods will block forever waiting to acquire a lock that will never be unlocked.
You can see that this is happening from the absence of 'releasing' and 'released' messages after the 'acquired getitem lock' message, which is immediately followed in that excerpt by another attempt by the reader thread to acquire the lock. In your test code, the reader thread will always hit this condition if it runs in the window after the write thread has called clear() but before it has called update().
To fix this, make sure the lock is released even when super().__getitem__() raises KeyError, and let the exception propagate to the caller. The try/finally construct provides a very straightforward way to do this; in fact, this is the perfect situation for finally.
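For example, a minimal sketch of the fixed method (same class as above, with the debug prints omitted):

    def __getitem__(self, *args, **kwargs):
        self._lock.acquire()
        try:
            # The finally block runs whether this returns normally or raises
            # KeyError, so the lock is always released exactly once.
            return super(ThreadSafeDict, self).__getitem__(*args, **kwargs)
        finally:
            self._lock.release()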
Alternatively, you could check that the desired key exists after acquiring the lock and before calling super().__getitem__(), although that will hurt performance slightly if the usual expectation is that the key exists.
BTW, it's not a great idea to have your ThreadSafeDict inherit from the dict class. This causes ThreadSafeDict to inherit all of the dict methods (for instance, __setitem__()) and any of those methods that you haven't overridden would bypass your lock if someone used them. If you aren't prepared to override all of those methods then it would be safer to have the underlying dict be an instance member of your class.
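For example, a minimal sketch of the composition approach, assuming only the three operations your two threads actually use (and using with so the lock is released even on KeyError):

from threading import Lock

class ThreadSafeDict:
    """Wraps a plain dict; every supported operation goes through the lock."""
    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        self._dict = dict(*args, **kwargs)

    def clear(self):
        with self._lock:
            self._dict.clear()

    def update(self, *args, **kwargs):
        with self._lock:
            self._dict.update(*args, **kwargs)

    def __getitem__(self, key):
        with self._lock:  # released even if KeyError is raised
            return self._dict[key]

Any dict operation not defined here simply fails with AttributeError instead of silently bypassing the lock.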
Related
I have a task that I want to run every minute so the data is as fresh as possible. However depending on the size of the update it can take longer than one minute to finish. Django-Q creates new task and queues it every minute though so there is some overlap synchronizing the same data pretty much. Is it possible to not schedule the task that is already in progress?
I ended up creating a decorator that locks the task execution; a new task run simply returns immediately if the lock is not available. The timeout is 1 hour (enough in my case).
from functools import wraps

from django.core.cache import cache
from redis.exceptions import LockNotOwnedError

def django_q_task_lock(func):
    """
    Decorator for django-q tasks, preventing overlap between parallel task runs
    """
    @wraps(func)
    def wrapped_task(*args, **kwargs):
        task_lock = cache.lock(f"django_q-{func.__name__}", timeout=60 * 60)
        if task_lock.acquire(blocking=False):
            try:
                func(*args, **kwargs)
            except Exception as e:
                try:
                    task_lock.release()
                except LockNotOwnedError:
                    pass
                raise e
            try:
                task_lock.release()
            except LockNotOwnedError:
                pass
    return wrapped_task

@django_q_task_lock
def potentialy_long_running_task():
    ...
    # task logic
    ...
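An equivalent, slightly more compact variant of the same decorator using try/finally (a sketch, not the code I actually ran): the release logic appears only once, and LockNotOwnedError still covers the case where the lock timed out and was taken over by another run.

def django_q_task_lock(func):
    @wraps(func)
    def wrapped_task(*args, **kwargs):
        task_lock = cache.lock(f"django_q-{func.__name__}", timeout=60 * 60)
        if not task_lock.acquire(blocking=False):
            return  # another run is already in progress; skip this one
        try:
            func(*args, **kwargs)
        finally:
            try:
                task_lock.release()
            except LockNotOwnedError:
                pass  # lock expired and was taken over; nothing to release
    return wrapped_task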
I have a background process started, but I have no idea how to stop it. I've read through the documentation, but there doesn't seem to be any built-in function to kill the task. How should I resolve this? I am specifically referring to the stopCollectingLiveData() section.
@socketio.on('collectLiveData')
def collectLiveData():
    global thread
    with thread_lock:
        if thread is None:
            thread = socketio.start_background_task(background_thread)

def background_thread():
    """Example of how to send server generated events to clients."""
    count = 0
    while True:
        socketio.sleep(1)
        count += 1
        socketio.emit('my_response', {'count': count})

@socketio.on("stopCollectingLiveData")
def stopCollectingLiveData():
    print('')
    socketio.sleep()
You can stop the background thread with an Event object, which you pass to the thread as a parameter. As long as the event is set, the thread keeps running; when the event is cleared, the loop ends and the thread exits.
# ...
from threading import Event

thread_event = Event()

# ...

@socketio.on('collectLiveData')
def collectLiveData():
    global thread
    with thread_lock:
        if thread is None:
            thread_event.set()
            thread = socketio.start_background_task(background_thread, thread_event)

def background_thread(event):
    """Example of how to send server generated events to clients."""
    global thread
    count = 0
    try:
        while event.is_set():
            socketio.sleep(1)
            count += 1
            socketio.emit('my_response', {'count': count})
    finally:
        event.clear()
        thread = None

@socketio.on("stopCollectingLiveData")
def stopCollectingLiveData():
    global thread
    thread_event.clear()
    with thread_lock:
        if thread is not None:
            thread.join()
            thread = None
I'm using Channels v2.
I want to integrate long-polling into my project.
The only consumer I see in the documentation for http long polling is the AsyncHttpConsumer.
The code I need to run in my handle function is not asynchronous. It connects to another device on the network using a library that is not asynchronous. From what I understand, this will cause the event loop to block, which is bad.
Can I run my handler synchronously, in a thread somehow? There's a SyncConsumer, but that seems to have something to do with Web Sockets. It doesn't seem applicable to Long Polling.
Using AsyncHttpConsumer as a reference, I was able to write an almost exact duplicate of the class, but subclassing SyncConsumer instead of AsyncConsumer as AsyncHttpConsumer does.
After a bit of testing, I soon realized that since my code was all running in a single thread, the disconnect() method wouldn't be triggered until the handle() method had finished running. So there was no way to interrupt a long-running handle() method, even if the client disconnected.
The following new version runs handle() in a thread, and gives the user two ways to check whether the client has disconnected:
from channels.consumer import SyncConsumer
from channels.exceptions import StopConsumer
from threading import Thread, Event

# We can't pass self.client_disconnected to handle() as a reference if it's
# a regular bool. That means if we use a regular bool, and the variable
# changes in this thread, it won't change in the handle() method. Using a
# class fixes this.
# Technically, we could just pass the Event() object
# (self.client_disconnected) to the handle() method, but then the client
# needs to know to use .is_set() instead of just checking if it's True or
# False. This is easier for the client.
class RefBool:
    def __init__(self):
        self.val = Event()

    def set(self):
        self.val.set()

    def __bool__(self):
        return self.val.is_set()

    def __repr__(self):
        current_value = bool(self)
        return f"RefBool({current_value})"
class SyncHttpConsumer(SyncConsumer):
    """
    Sync HTTP consumer. Provides basic primitives for building synchronous
    HTTP endpoints.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.handle_thread = None
        self.client_disconnected = RefBool()
        self.body = []

    def send_headers(self, *, status=200, headers=None):
        """
        Sets the HTTP response status and headers. Headers may be provided as
        a list of tuples or as a dictionary.

        Note that the ASGI spec requires that the protocol server only starts
        sending the response to the client after ``self.send_body`` has been
        called the first time.
        """
        if headers is None:
            headers = []
        elif isinstance(headers, dict):
            headers = list(headers.items())
        self.send(
            {"type": "http.response.start", "status": status, "headers": headers}
        )

    def send_body(self, body, *, more_body=False):
        """
        Sends a response body to the client. The method expects a bytestring.

        Set ``more_body=True`` if you want to send more body content later.
        The default behavior closes the response, and further messages on
        the channel will be ignored.
        """
        assert isinstance(body, bytes), "Body is not bytes"
        self.send(
            {"type": "http.response.body", "body": body, "more_body": more_body}
        )

    def send_response(self, status, body, **kwargs):
        """
        Sends a response to the client. This is a thin wrapper over
        ``self.send_headers`` and ``self.send_body``, and everything said
        above applies here as well. This method may only be called once.
        """
        self.send_headers(status=status, **kwargs)
        self.send_body(body)

    def handle(self, body, client_disconnected):
        """
        Receives the request body as a bytestring. Response may be composed
        using the ``self.send*`` methods; the return value of this method is
        thrown away.
        """
        raise NotImplementedError(
            "Subclasses of SyncHttpConsumer must provide a handle() method."
        )

    def disconnect(self):
        """
        Overrideable place to run disconnect handling. Do not send anything
        from here.
        """
        pass
    def http_request(self, message):
        """
        Sync entrypoint - concatenates body fragments and hands off control
        to ``self.handle`` when the body has been completely received.
        """
        if "body" in message:
            self.body.append(message["body"])
        if not message.get("more_body"):
            full_body = b"".join(self.body)
            self.handle_thread = Thread(
                target=self.handle,
                args=(full_body, self.client_disconnected),
                daemon=True,
            )
            self.handle_thread.start()

    def http_disconnect(self, message):
        """
        Let the user do their cleanup and close the consumer.
        """
        self.client_disconnected.set()
        self.disconnect()
        self.handle_thread.join()
        raise StopConsumer()
The SyncHttpConsumer class is used very similarly to how you would use the AsyncHttpConsumer class - you subclass it, and define a handle() method. The only difference is that the handle() method takes an extra arg:
class MyClass(SyncHttpConsumer):
    def handle(self, body, client_disconnected):
        while not client_disconnected:
            ...
Or you could, just like with the AsyncHttpConsumer class, override the disconnect() method instead if you prefer.
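For instance, a sketch of a minimal long-polling handler built on this class (check_for_data is a hypothetical helper standing in for your blocking device call):

import time

class PollConsumer(SyncHttpConsumer):
    def handle(self, body, client_disconnected):
        # Poll until data arrives, bailing out early if the client goes away.
        while not client_disconnected:
            data = check_for_data()  # hypothetical; returns bytes or None
            if data is not None:
                self.send_response(200, data)
                return
            time.sleep(1)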
I'm still not sure if this is the best way to do this, or why Django Channels doesn't include something like this in addition to AsyncHttpConsumer. If anyone knows, please let us know.
For debugging purposes, I need to perform a task (send an email) when a Channels worker stops due to an error.
I can't find a shutdown hook in SyncConsumer or AsyncConsumer that I could override to add my task.
channels==2.2.0
channels-redis==2.4.0
For a completely generic approach:
You can try overriding the consumer's __call__ method:
async def __call__(self, receive, send):
    try:
        await super().__call__(receive, send)
    except:
        # ... do your stuff
        raise
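For instance, a sketch of this wired into a concrete consumer (send_error_email is a hypothetical helper, not part of Channels):

from channels.consumer import AsyncConsumer

class MonitoredConsumer(AsyncConsumer):
    async def __call__(self, receive, send):
        try:
            await super().__call__(receive, send)
        except Exception:
            # hypothetical helper: notify yourself that the worker died
            send_error_email()
            raise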
I have a task foobar:
@app.task(bind=True)
def foobar(self, owner, a, b):
    if already_working(owner):  # check if a foobar task is already running for owner.
        register_myself(self.request.id, owner)  # add myself in the DB.
    return a + b
How can I mock the self.request.id attribute? I am already patching everything and calling the task directly rather than using .delay/.apply_async, but the value of self.request.id seems to be None (and since I'm doing real interactions with the DB, this makes the test fail, etc…).
For the reference, I'm using Django as a framework, but I think that this problem is just the same, no matter the environment you're using.
Disclaimer: I do not think this is documented anywhere, and this answer might be implementation-dependent.
Celery wraps its tasks in celery.Task instances; I do not know whether it swaps the celery.Task.run method with the user's task function or does something else.
But when you call a task directly, you invoke __call__, which pushes a context containing the task ID, etc…
So the idea is to bypass __call__ and Celery's usual workings:
- first, we push a controlled task ID: foobar.push_request(id=1), for example;
- then, we call the run method: foobar.run(*args, **kwargs).
Example:
@app.task(bind=True)
def foobar(self, name):
    print(name)
    return foobar.utils.polling(self.request.id)

@patch('foobar.utils.polling')
def test_foobar(mock_polling):
    foobar.push_request(id=1)
    mock_polling.return_value = "done"
    assert foobar.run("test") == "done"
    mock_polling.assert_called_once_with(1)
You can call the task synchronously using
task = foobar.s(<args>).apply()
This will assign a unique task ID, so the value will not be None and your code will run. Then you can check the results as part of your test.
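For example, a sketch of a test using this approach (the module path in the patch targets is an assumption; adjust it to wherever your task lives):

from unittest.mock import patch

@patch('tasks.register_myself')
@patch('tasks.already_working', return_value=False)
def test_foobar(mock_working, mock_register):
    # .apply() runs the task eagerly in-process but still builds a real
    # request context, so self.request.id is populated.
    result = foobar.s('owner', 1, 2).apply()
    assert result.get() == 3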
There is probably a way to do this with patch, but I could not work out a way to assign a property. The most straightforward way is to just mock self. (This works because the task below is not declared with bind=True, so self is just an ordinary first parameter.)
tasks.py:
@app.task(name='my_task')
def my_task(self, *args, **kwargs):
    ...  # do some thing
test_tasks.py:
from mock import Mock

def test_my_task():
    self = Mock()
    self.request.id = 'ci_test'
    my_task(self)