PyQt5 concurrent file operations with concurrent.futures module - concurrency

I'm currently working on a Python GUI application based on PyQt5. It involves quite a lot of file system operations such as copy, move and delete. I tried to improve the performance of these operations using the concurrent.futures module and experimented with both ThreadPoolExecutor and ProcessPoolExecutor. However, neither of them gives me a non-blocking GUI that can continue handling user interaction after I submit the file operations to the pool. The submitted operations are simply static methods of plain Python classes. I also added callbacks through the Future objects returned by the submissions; these callbacks are instance methods of a Qt class, and they do get called when the corresponding submitted job completes.
I just can't figure out a way to solve the problem of a frozen UI (even if only for a few seconds). Could anybody help?
import os
import shutil
from concurrent.futures import ProcessPoolExecutor

class ArchiveResManager:
    def __init__(self, resource_dir):
        self._path_reference_dir = resource_dir
        self._callback_on_resource_added = None

    def get_resource_list(self):
        return tuple(
            de.path for de in os.scandir(self._path_reference_dir) if de.is_file()
        )

    def add_callback_resource_added(self, fn):
        self._callback_on_resource_added = fn

    def add_resources(self, resources: iter):
        src_paths = tuple(p for p in resources)
        dest_paths = tuple(self._get_dest_path(p) for p in resources)
        with ProcessPoolExecutor(2) as executor:
            executor.submit(
                ArchiveResManager._do_resource_operations,
                src_paths,
                dest_paths
            ).add_done_callback(self._on_resource_added)

    @staticmethod
    def _do_resource_operations(src_paths, dest_paths):
        added_items = list()
        for src_path, dest_path in zip(src_paths, dest_paths):
            shutil.copyfile(src_path, dest_path)
            added_items.append(dest_path)
        return added_items

    def _on_resource_added(self, future):
        if future.done():
            if self._callback_on_resource_added:
                self._callback_on_resource_added(future.result())

# This model class is bound to an instance of QListView in the QDialog of my project
class ArchiveModel(QAbstractListModel):
    def __init__(self, archive_dir, parent):
        super().__init__(parent)
        self._archive_manager = ArchiveResManager(archive_dir)
        self._archive_manager.add_callback_resource_added(self._on_archive_operations_completed)
        self._archive_items = list(self._archive_manager.get_resource_list())

    def _on_archive_operations_completed(self, result_list):
        first_new_row = self.rowCount()
        self.beginInsertRows(QModelIndex(), first_new_row, first_new_row + len(result_list) - 1)
        self._archive_items.extend(result_list)
        self.endInsertRows()
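For what it's worth, a likely culprit is the with block itself: ProcessPoolExecutor.__exit__ calls shutdown(wait=True), so add_resources blocks the GUI thread until every copy has finished. Below is a minimal sketch of an alternative, assuming a long-lived executor and a Qt signal to hand results back to the main thread (Future done-callbacks do not run in the GUI thread); the ArchiveWorker name is hypothetical, not from the original code.

from concurrent.futures import ProcessPoolExecutor
from PyQt5.QtCore import QObject, pyqtSignal

class ArchiveWorker(QObject):
    # Cross-thread signal: the done-callback fires outside the GUI thread,
    # so emit a signal and let Qt queue it onto the main thread.
    resources_added = pyqtSignal(list)

    def __init__(self, parent=None):
        super().__init__(parent)
        self._executor = ProcessPoolExecutor(2)  # long-lived, no `with` block

    def add_resources(self, src_paths, dest_paths):
        future = self._executor.submit(
            ArchiveResManager._do_resource_operations, src_paths, dest_paths)
        future.add_done_callback(
            lambda f: self.resources_added.emit(f.result()))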


How to turn a Django Rest Framework API View into an async one?

I am trying to build a REST API that will manage some machine learning classification tasks. I have written an API view which, when hit, will trigger the start of a classification task (such as training an SVM classifier with the data the user provided previously). However, this is a long-running task, so I would ideally not have the user wait once they have made a request to this view. Instead, I would like to start this task in the background and give them a response immediately. They can later view the results of the classification in a separate view (I haven't implemented that yet).
I am using ASGI_APPLICATION = 'mlxplorebackend.asgi.application' in settings.py.
Here's my API view in views.py
import asyncio
from concurrent.futures import ProcessPoolExecutor
from django import setup as SetupDjango
# ... other imports

loop = asyncio.get_event_loop()

def DummyClassification():
    result = sum(i * i for i in range(10 ** 7))
    print(result)
    return result

# ... other API views

class TaskExecuteView(APIView):
    """
    Once an API call is made to this view, the classification algorithm will start being processed.
    Depends on:
    1. Parser for the classification algorithm type and parameters
    2. Classification algorithm implementation
    """
    def get(self, request, taskId, *args, **kwargs):
        try:
            task = TaskModel.objects.get(taskId = taskId)
        except TaskModel.DoesNotExist:
            raise Http404
        else:
            # this is basically the classification task for now
            # need to turn this to an async view
            with ProcessPoolExecutor(initializer = SetupDjango) as pool:
                loop.run_in_executor(pool, DummyClassification)
            return Response({ "message": "The task with id: {} has been started".format(task.taskId) }, status = status.HTTP_200_OK)
The problem I am facing is the following:
When I use with ProcessPoolExecutor() as pool: without the initializer, I get django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet. (full traceback at: https://paste.ubuntu.com/p/ctjmFNYMXW/)
When I do use the initializer, the view no longer remains async; it gets blocked. The response returns only after the task is completed, which takes about 5 seconds on my machine. I do realize I am not really making use of asyncio.sleep() inside my DummyClassification() function, but I can't figure out a way to do so.
I am guessing this is not the way to do it, so any suggestions would be appreciated. I would like to avoid Celery if I can, since that seems a tad too complicated for me.
Edit:
If I get rid of ProcessPoolExecutor() and simply do loop.run_in_executor(None, DummyClassification), it works as expected, but then only one worker thread is working on the task, which doesn't seem remotely ideal for a classification task.
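One likely reason the with-block version stays blocked, for what it's worth: ProcessPoolExecutor.__exit__ calls shutdown(wait=True), so the view waits for the task to finish before returning. A minimal sketch of the alternative, assuming a module-level pool that outlives the request (names reused from the question; the schedule helper is hypothetical):

import asyncio
from concurrent.futures import ProcessPoolExecutor
from django import setup as SetupDjango  # as in the question

# Created once at import time; no `with` block, so nothing waits on shutdown.
pool = ProcessPoolExecutor(initializer=SetupDjango)
loop = asyncio.get_event_loop()

def schedule(func):
    # Submit func to the pool and return immediately; the view can then
    # send its Response without waiting for the result.
    return loop.run_in_executor(pool, func)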
This was a ride. At first I went through the pain of setting up Celery, only to find that the original problem of the classification task using one CPU core remained. Then I switched to django-rq with Redis, and it is currently working as expected.
from .tasks import Pipeline

class TaskExecuteView(APIView):
    """
    Once an API call is made to this view, the classification algorithm will start being processed.
    Depends on:
    1. Parser for the classification algorithm type
    2. Classification algorithm implementation
    """
    def get(self, request, taskId, *args, **kwargs):
        try:
            task = TaskModel.objects.get(taskId = taskId)
        except TaskModel.DoesNotExist:
            raise Http404
        else:
            Pipeline.delay(taskId)  # this is async now ✔
            # mark this as an in-progress task
            TaskModel.objects.filter(taskId = taskId).update(inProgress = True)
            return Response({ "message": "The task with id: {}, title: {} has been started".format(task.taskId, task.taskTitle) }, status = status.HTTP_200_OK)
tasks.py
from django_rq import job

@job('default', timeout=3600)
def Pipeline(taskId):
    # classification task
    ...
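For completeness, a sketch of the django-rq wiring this depends on (the host/port values are illustrative): the queue connection lives in RQ_QUEUES in settings.py, and a separate worker process picks up the enqueued jobs.

# settings.py
RQ_QUEUES = {
    'default': {
        'HOST': 'localhost',
        'PORT': 6379,
        'DB': 0,
    },
}

The worker is started separately, e.g. with python manage.py rqworker default.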

How to get a ticking timer with a dynamic label?

What I'm trying to do: whenever the cursor is over a label, it should show the time elapsed since that label was created. It does this once, by subtracting the stored value (in def on_enter(i)), but I want it to keep ticking while the cursor is still on the label.
I tried using the after function, but as a newbie I don't understand it well enough to use it on dynamic labels.
Any help will be appreciated, thanks.
Code:
from Tkinter import *
import datetime

date = datetime.datetime
now = date.now()
master = Tk()
list_label = []
k = []
time_var = []
result = []
names = []

def delete(i):
    k[i] = max(k) + 1
    time_var[i] = '<deleted>'
    list_label[i].pack_forget()

def create():  # new func
    i = k.index(max(k))
    for j in range(i + 1, len(k)):
        if k[j] == 0:
            list_label[j].pack_forget()
    list_label[i].pack(anchor='w')
    time_var[i] = time_now()
    for j in range(i + 1, len(k)):
        if k[j] == 0:
            list_label[j].pack(anchor='w')
    k[i] = 0

###########################
def on_enter(i):
    list_label[i].configure(text=time_now() - time_var[i])

def on_leave(i):
    list_label[i].configure(text=names[i])

def time_now():
    now = date.now()
    return date(now.year, now.month, now.day, now.hour, now.minute, now.second)
############################

for i in range(11):
    lb = Label(text=str(i), anchor=W)
    list_label.append(lb)
    lb.pack(anchor='w')
    lb.bind("<Button-3>", lambda event, i=i: delete(i))
    k.append(0)
    names.append(str(i))
    lb.bind("<Enter>", lambda event, i=i: on_enter(i))
    lb.bind("<Leave>", lambda event, i=i: on_leave(i))
    time_var.append(time_now())

master.bind("<Control-Key-z>", lambda event: create())
mainloop()
You would use after like this:
###########################
def on_enter(i):
    list_label[i].configure(text=time_now() - time_var[i])
    list_label[i].timer = list_label[i].after(1000, on_enter, i)

def on_leave(i):
    list_label[i].configure(text=names[i])
    list_label[i].after_cancel(list_label[i].timer)
However, your approach here is all wrong. You currently have some functions and a list of data. What you should do is make a single object that contains the functions and data together, and make a list of those. That way you can write your code for a single Label and just duplicate it. It makes your code a lot simpler, partly because you no longer need to keep track of "i". Like this:
import Tkinter as tk
from datetime import datetime

def time_now():
    now = datetime.now()
    return datetime(now.year, now.month, now.day, now.hour, now.minute, now.second)

class Kiran(tk.Label):
    """A new type of Label that shows the time since creation when the mouse hovers"""
    hidden = []

    def __init__(self, master=None, **kwargs):
        tk.Label.__init__(self, master, **kwargs)
        self.name = self['text']
        self.time_var = time_now()
        self.bind("<Enter>", self.on_enter)
        self.bind("<Leave>", self.on_leave)
        self.bind("<Button-3>", self.hide)

    def on_enter(self, event=None):
        self.configure(text=time_now() - self.time_var)
        self.timer = self.after(1000, self.on_enter)

    def on_leave(self, event=None):
        self.after_cancel(self.timer)  # cancel the timer
        self.configure(text=self.name)

    def hide(self, event=None):
        self.pack_forget()
        self.hidden.append(self)  # add this instance to the list of hidden instances

    def show(self):
        self.time_var = time_now()  # reset time
        self.pack(anchor='w')

def undo(event=None):
    '''if there's any hidden Labels, show one'''
    if Kiran.hidden:
        Kiran.hidden.pop().show()

def main():
    root = tk.Tk()
    root.geometry('200x200')
    for i in range(11):
        lb = Kiran(text=i)
        lb.pack(anchor='w')
    root.bind("<Control-Key-z>", undo)
    root.mainloop()

if __name__ == '__main__':
    main()
More notes:
Don't use lambda unless you are forced to; it's known to cause bugs.
Don't use wildcard imports (from module import *), they cause bugs and are against PEP8.
Put everything in functions.
Use long, descriptive names. Single letter names just waste time. Think of names as tiny comments.
Add a lot more comments to your code so that other people don't have to guess what the code is supposed to do.
Try a more beginner oriented forum for questions like this, like learnpython.reddit.com

Maya + PyQt dialog. How to run a single copy of Qt window?

I use this simple code to run a window based on QDialog:
import maya.OpenMayaUI as mui
import sip
from PyQt4 import QtGui, QtCore, uic

#----------------------------------------------------------------------
def getMayaWindow():
    ptr = mui.MQtUtil.mainWindow()
    return sip.wrapinstance(long(ptr), QtCore.QObject)

#----------------------------------------------------------------------
form_class, base_class = uic.loadUiType('perforceBrowserWnd.ui')

#----------------------------------------------------------------------
class PerforceWindow(base_class, form_class):
    def __init__(self, parent=getMayaWindow()):
        super(base_class, self).__init__(parent)
        self.setupUi(self)

#----------------------------------------------------------------------
def perforceBrowser2():
    perforceBrowserWnd = PerforceWindow()
    perforceBrowserWnd.show()

perforceBrowser2()
Every time you run the function perforceBrowser2() a new copy of the window opens.
How can I find out whether the window is already open, and switch to it instead of opening a new copy? Or just prevent the script from opening a second copy of the window?
PS: Maya 2014 + PyQt4 + Python 2.7
Keep a global reference to the window:
perforceBrowserWnd = None

def perforceBrowser2():
    global perforceBrowserWnd
    if perforceBrowserWnd is None:
        perforceBrowserWnd = PerforceWindow()
    perforceBrowserWnd.show()
Using globals is not the preferred way, and there are tons of articles on why it is a bad idea:
Why are global variables evil?
It is cleaner to remember the instance in a static variable of the class and, whenever you load the UI, check whether it already exists and return it; if not, create it. (There is a pattern called Singleton that describes this behaviour as well.)
import sys
from Qt import QtGui, QtWidgets, QtCore

class Foo(QtWidgets.QDialog):
    instance = None

    def __init__(self, parent=None):
        super(Foo, self).__init__(parent)
        self.setWindowTitle('Test')

def loadUI():
    if not Foo.instance:
        Foo.instance = Foo()
    Foo.instance.show()
    Foo.instance.raise_()

loadUI()
Whenever you call loadUI it will show the same UI and will not recreate it each time you call it.

Unit testing twisted.web.client.Agent without the network

I've not done any Twisted for a couple of years now, and have started using the newer Agent style of HTTP client calls. Using Agent has been OK, but testing is confusing me (it's Twisted, after all).
I've been through the https://twistedmatrix.com/documents/current/core/howto/trial.html docs and the APIs for the trial tools and Agent itself, plus numerous searches.
I've gone with faking out Agent, as I don't need to test that. But because of the steps needed to handle the processing and response of an Agent request, my test code has got nasty, implementing the nested layers of the Agent, protocol, etc. Where should I draw the line here, and are there some utilities I haven't found?
Here's a minimal example (naive implementation of SUT):
from twisted.web.client import Agent, readBody
from twisted.internet import reactor
import json

class SystemUnderTest(object):
    def __init__(self, url):
        self.url = url

    def action(self):
        d = self._makeAgent().request("GET", self.url)
        d.addCallback(self._cbSuccess)
        return d

    def _makeAgent(self):
        ''' Its own method so it can be overridden in tests '''
        return Agent(reactor)

    def _cbSuccess(self, response):
        d = readBody(response)
        d.addCallback(self._cbParse)
        return d

    def _cbParse(self, data):
        self.result = json.loads(data)
        print self.result
with the test module:
from twisted.trial import unittest
from sut import SystemUnderTest
from twisted.internet import defer
from twisted.test import proto_helpers

class Test(unittest.TestCase):
    def test1(self):
        s_u_t = ExtendedSystemUnderTest(None)
        d = s_u_t.action()
        d.addCallback(self._checks, s_u_t)
        return d

    def _checks(self, result, s_u_t):
        print result
        self.assertEqual({'one': 1}, s_u_t.result)

class ExtendedSystemUnderTest(SystemUnderTest):
    def _makeAgent(self):
        return FakeSuccessfulAgent('{"one": 1}')

## Getting ridiculous below here...

class FakeReason(object):
    def check(self, _):
        return False

    def __str__(self):
        return "It's my reason"

class FakeResponse(object):
    ''' Implementation of IResponse '''
    def __init__(self, content):
        self.content = content
        self.prot = proto_helpers.StringTransport()
        self.code = 200
        self.phrase = ''

    def deliverBody(self, prot):
        prot.makeConnection(self.prot)
        prot.dataReceived(self.content)
        # reason = FakeReason()
        # prot.connectionLost(reason)

class FakeSuccessfulAgent(object):
    ''' Implementation of IAgent '''
    def __init__(self, response):
        self.response = response

    def request(self, method, url):
        return defer.succeed(FakeResponse(self.response))
but testing is confusing me (it's Twisted, after all).
Hilarious.
class ExtendedSystemUnderTest(SystemUnderTest):
    def _makeAgent(self):
        return FakeSuccessfulAgent('{"one": 1}')
I suggest you make the agent a normal parameter. This is more convenient than a private method like _makeAgent. Composition is great. Inheritance is meh.
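For instance, a minimal sketch of that injection (illustrative only, not the original code):

class SystemUnderTest(object):
    def __init__(self, url, agent):
        self.url = url
        self._agent = agent  # pass Agent(reactor) in production

    def action(self):
        d = self._agent.request("GET", self.url)
        d.addCallback(self._cbSuccess)
        return d

# in tests, no subclass is needed:
# s_u_t = SystemUnderTest(None, FakeSuccessfulAgent('{"one": 1}'))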
class FakeReason(object):
    ...
There's no reason to make a fake of this. Just use twisted.python.failure.Failure. You don't have to fake every object in the test. Just the ones that get in your way if you don't fake them.
class FakeResponse(object):
    ...
This is likely good and necessary.
class FakeSuccessfulAgent(object):
    ...
This is most likely necessary as well. You should make it actually be more like an IAgent implementation though - declare that it implements the interface, use zope.interface.verify.verify{Class,Object} to make sure you get the implementation right, etc (eg request has the wrong signature now).
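For instance, a hedged sketch of that verification (the request signature below follows twisted.web.iweb.IAgent; FakeResponse is the fake from the question):

from zope.interface import implementer
from zope.interface.verify import verifyObject
from twisted.internet import defer
from twisted.web.iweb import IAgent

@implementer(IAgent)
class FakeSuccessfulAgent(object):
    ''' A fake that declares and satisfies IAgent '''
    def __init__(self, response):
        self.response = response

    def request(self, method, uri, headers=None, bodyProducer=None):
        # succeed immediately with the canned response
        return defer.succeed(FakeResponse(self.response))

# raises if the declaration or method signatures don't match the interface
verifyObject(IAgent, FakeSuccessfulAgent('{"one": 1}'))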
There's actually a ticket for adding all of these testing tools to Twisted itself - https://twistedmatrix.com/trac/ticket/4024. So I don't think you're actually confused, you're basically on the same track as the project itself. You're just suffering from the fact that Twisted hasn't already done all of this work for you.
Also, note that instead of:
class Test(unittest.TestCase):
    def test1(self):
        s_u_t = ExtendedSystemUnderTest(None)
        d = s_u_t.action()
        d.addCallback(self._checks, s_u_t)
        return d
You can write something like this instead (and it is preferable):
class Test(unittest.TestCase):
    def test1(self):
        s_u_t = ExtendedSystemUnderTest(None)
        d = s_u_t.action()
        self._checks(self.successResultOf(d), s_u_t)
This is because your fake implementation of IAgent is synchronous. You know it is synchronous. By the time request returns, the Deferred it is returning already has a result. Writing the test this way means you can simplify your code a bit (i.e., you can ignore the asynchronousness of it to some extent, because there isn't any), and it avoids running the global reactor, which is what returning a Deferred from a test method in trial does.

Django Celery task: keep global state

I am currently developing a Django application based on django-tenants-schema. You don't need to look into the actual code of the module, but the idea is that it has a global setting for the current database connection defining which schema to use for the application tenant, e.g.
tenant = tenants_schema.get_tenant()
And for setting it:
tenants_schema.set_tenant(xxx)
For some of the tasks I would like them to remember the current global tenant selected during the instantiation, e.g. in theory:
class AbstractTask(Task):
    '''
    Run this method before returning the task future
    '''
    def before_submit(self):
        self.run_args['tenant'] = tenants_schema.get_tenant()

    '''
    This method is run before related .run() task method
    '''
    def before_run(self):
        tenants_schema.set_tenant(self.run_args['tenant'])
Is there an elegant way of doing it in celery?
Celery (as of 3.1) has signals you can hook into to do this. You can alter the kwargs that were passed in, and on the other side, undo your alterations before they're given to the actual task:
from celery import shared_task
from celery.signals import before_task_publish, task_prerun, task_postrun
from threading import local

current_tenant = local()

@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)
    print 'sending tenant: {t}'.format(t=current_tenant.id)

@task_prerun.connect
def extract_tenant_from_task(kwargs=None, **unused):
    tenant_id = kwargs.pop('tenant_middleware.tenant', None)
    current_tenant.id = tenant_id
    print 'current_tenant.id set to {t}'.format(t=tenant_id)

@task_postrun.connect
def cleanup_tenant(**kwargs):
    current_tenant.id = None
    print 'cleaned current_tenant.id'

@shared_task
def get_current_tenant():
    # Here is where you would do work that relied on current_tenant.id being set.
    import time
    time.sleep(1)
    return current_tenant.id
And if you run the task (not showing logging from the worker):
In [1]: current_tenant.id = 1234; ct = get_current_tenant.delay(); current_tenant.id = 5678; ct.get()
sending tenant: 1234
Out[1]: 1234
In [2]: current_tenant.id
Out[2]: 5678
The signals are not called if no message is sent (when you call the task function directly, without delay() or apply_async()). If you want to filter on the task name, it is available as body['task'] in the before_task_publish signal handler, and the task object itself is available in the task_prerun and task_postrun handlers.
I am a Celery newbie, so I can't really tell if this is the "blessed" way of doing "middleware"-type stuff in Celery, but I think it will work for me.
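As a sketch of that filtering (the task name below is hypothetical), the publish-side handler can bail out early for tasks it doesn't care about:

@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    # body['task'] holds the task name; skip tasks that don't need a tenant
    if body['task'] != 'myapp.tasks.get_current_tenant':  # hypothetical name
        return
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)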
I'm not sure what you mean here; is before_submit executed before the task is called by a client?
In that case I would rather use a with statement here:
from contextlib import contextmanager

@contextmanager
def set_tenant_db(tenant):
    prev_tenant = tenants_schema.get_tenant()
    try:
        tenants_schema.set_tenant(tenant)
        yield
    finally:
        tenants_schema.set_tenant(prev_tenant)

@app.task
def tenant_task(tenant=None):
    with set_tenant_db(tenant):
        do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())
You can of course create a base task that does this automatically; you could apply the context in Task.__call__, for example, but I'm not sure that saves you much over just using the with statement explicitly.
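A minimal sketch of that Task.__call__ idea, assuming app is your Celery instance and set_tenant_db is the context manager above (TenantTask is an illustrative name):

class TenantTask(app.Task):
    abstract = True

    def __call__(self, *args, **kwargs):
        # pull the tenant out of the kwargs the caller passed in
        tenant = kwargs.pop('tenant', None)
        with set_tenant_db(tenant):
            return super(TenantTask, self).__call__(*args, **kwargs)

@app.task(base=TenantTask)
def tenant_task():
    do_actions_here()

# callers still pass the tenant explicitly:
tenant_task.delay(tenant=tenants_schema.get_tenant())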