Timer object inconsistently accessible - django

I am starting a Python Timer in a Django view and I am using another Django view to cancel it. However, I find that I cannot access the Timer object consistently when I am trying to cancel it.
The code in my "views.py" looks like this:
import threading

myTimer = None

def f():
    pass

def startTimer(request):
    global myTimer
    myTimer = threading.Timer(10000, f)
    myTimer.start()
    pass

def stopTimer(request):
    if myTimer != None:
        myTimer.cancel()
    else:
        print("No timer found.")
    pass
When I try to cancel the timer, I often get the "No timer found." message. After a few tries, seemingly at random, the Timer object is found and the cancellation succeeds. This only happens when I run the code on the server; on my local machine the problem never occurs.

You must never use global objects like this in a server environment. Your server almost certainly has multiple processes, each of which has its own namespace, so the timer won't be shared between them.
A second reason is that you will likely have multiple users for your site; all of them will have access to the same global variables in each process.
I'm not really sure what you're doing here, but one way of doing a per-user timer would be to use the session to store the current time when the user hits start, and then calculate the difference from that time when they click end.
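A minimal sketch of that session-based idea (the view names and session key are illustrative, not from your code):

import time
from django.http import HttpResponse

def start_timer(request):
    # Remember, per user, when "start" was hit; the session is backed by the
    # database/cache, so this works across worker processes.
    request.session['timer_started_at'] = time.time()
    return HttpResponse("timer started")

def stop_timer(request):
    started_at = request.session.pop('timer_started_at', None)
    if started_at is None:
        return HttpResponse("No timer found.")
    elapsed = time.time() - started_at
    return HttpResponse("timer ran for %.1f seconds" % elapsed)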

Related

Wait for an async_task to complete or complete it in the background

I have some functions in my Django application that take a lot of time (scraping using proxies); they sometimes take more than 30 seconds and get killed by gunicorn and the AWS server due to the timeout, and I don't want to increase the timeout value.
A solution that came to my mind is to run these functions as an async_task using the django-q module.
Is it possible to do the following:
When a view calls the long function, run the function asynchronously.
If it returns a result within a pre-defined amount of time, return the result to the user.
If not, return an incomplete result and let the function continue in the background (the function changes a model in the database); there is no need to notify the user of the changes.
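In outline, that pattern might look something like this (a rough sketch assuming django-q's async_task/result helpers; the myapp.tasks.scrape task path and the 20-second budget are illustrative):

from django.http import JsonResponse
from django_q.tasks import async_task, result

def scrape_view(request):
    # Hand the slow work to a django-q worker and get a task id back.
    task_id = async_task('myapp.tasks.scrape', request.GET.get('url'))
    # Block for up to ~20 seconds (django-q's wait argument is in milliseconds).
    task_result = result(task_id, wait=20000)
    if task_result is not None:
        return JsonResponse({'status': 'done', 'result': task_result})
    # Timed out: the worker keeps running and updates the model itself,
    # so we just tell the user the result is incomplete.
    return JsonResponse({'status': 'pending'})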

Django: for loop through parallel process and store values and return after it finishes

I have a for loop in Django. It loops through a list, gets the corresponding data from the database, does some calculation based on the database value, and then appends the result to another list.
def getArrayList(request):
    list_loop = [...]    # set of values to loop through
    store_array = []     # store values here from the for loop
    for a in list_loop:
        val_db = SomeModel.objects.filter(somefield=a).first()
        result = ...     # perform calculation on val_db
        store_array.append(result)
The list has 10,000 entries. If the user wants this request, he is ready to wait and will be informed that it will take time.
I have tried joblib with backend=threading; it's not saving much time compared to the normal loop.
But when I try with backend=multiprocessing, it says "Apps aren't loaded yet".
I read that multiprocessing is not possible in module-based files.
So I am looking at Celery now. I am not sure how this can be done in Celery.
Can anyone guide me on how we can speed up the for loop calculation using the available multiprocessing techniques?
You're very likely looking for the wrong solution. But then again - this is pseudo code so we can't be sure.
In either case, your pseudo code is a self-fulfilling prophecy, since you run queries in a for loop. That means network latency, result set fetching, tying up database resources etc etc. This is never a good pattern, at best it's a last resort.
The simple solution is to get all values in one query:
list_values = [...]
results = []
db_values = SomeModel.objects.filter(field__in=list_values)
for value in db_values:
    results.append(calc(value))
If for some reason you need to loop, then to do this in Celery you would mark the function as a task (there are plenty of examples to find). But it won't speed anything up - it will just run in the background, so you render a "please wait" message and somehow need to notify the user again when the job is done.
I'm saying somehow, because there isn't a really good integration package that I'm aware of that ties in all the components. There's django-notifications-hq, but if this is your only background task, it's a lot of extra baggage just for that - so you may want to change the notification part to "we will send you an email when the job is done", cause that's easy to achieve inside your function.
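If you go that route, a minimal Celery task could look something like this (a sketch only; the build_report name, the email step and the reuse of the question's SomeModel/calc placeholders are assumptions):

# tasks.py
from celery import shared_task
from django.core.mail import send_mail

from myapp.models import SomeModel  # the question's model; import path assumed

@shared_task
def build_report(list_values, user_email):
    # One query instead of 10,000, as above.
    db_values = SomeModel.objects.filter(field__in=list_values)
    results = [calc(value) for value in db_values]  # calc() as in the question
    # "We will send you an email when the job is done."
    send_mail('Your report is ready',
              'Processed %d rows.' % len(results),
              'noreply@example.com', [user_email])
    return len(results)

The view then just fires build_report.delay(list_values, request.user.email) and renders the "please wait" page immediately.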
And thirdly, if this is simply creating a report, that doesn't need things like automatic retries on failure, then you can simply opt to use Django Channels and a browser-native websocket to start and report on the job (which also allows you to send email).
You could try concurrent.futures.ProcessPoolExecutor, which is a high-level API for processing CPU-bound tasks:
import concurrent.futures

def perform_calculation(item):
    pass

# specify the number of workers (default: number of processors on your machine)
with concurrent.futures.ProcessPoolExecutor(max_workers=6) as executor:
    res = executor.map(perform_calculation, tasks)
EDIT
In the case of an IO-bound operation, you could make use of ThreadPoolExecutor to open a few connections in parallel; you can wrap the pool in a context manager which handles the cleanup work for you (closing idle connections). Here is one example, but it handles the connection closing manually.
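A rough sketch of that ThreadPoolExecutor variant, reusing the question's SomeModel/calc placeholders and closing Django's per-thread database connections by hand (the worker count is arbitrary):

import concurrent.futures
from django.db import connections

def fetch_one(item):
    try:
        val_db = SomeModel.objects.filter(somefield=item).first()
        return calc(val_db)  # calc() as in the question
    finally:
        # Each worker thread opens its own DB connections; close the current
        # thread's connections so they don't linger after the work is done.
        connections.close_all()

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(fetch_one, list_loop))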

Making stored function based confirmation / callback system work with multiple processes in django + nginx

Our callback system worked such that during a request where you needed more user input you would run the following:
def view(req):
    # do checks, maybe get a variable.
    bar = req.bar()

    def doit():
        foo = req.user
        do_the_things(foo, bar)

    req.confirm(doit, "Are you sure you want to do it")
From this, the server would store the function object in a dictionary, with a UID as a key that would be sent to the client, where a confirmation dialog would be shown. When OK is pressed, another request is sent to another view which looks up the stored function object and runs it.
This works in a single-process deployment. However with nginx, if the process pool is greater than 1, a different process gets the confirmation request, and thus doesn't have the stored function and cannot run it.
We've looked into ways to force nginx to use a certain process for certain requests, but haven't found a solution.
We've also looked into multiprocessing libraries and celery, however there doesn't seem to be a way to send a predefined function into another process.
Can anyone suggest a method that will allow us to store a function to run later when the request for continuing might come from a separate process?
There doesn't seem to be a good reason to use a callback defined as an inline function here.
The web is a stateless environment. You can never be certain of getting the same server process two requests in a row, and your code should never be written to store data in memory.
Instead you need to put data into a data store of some kind. In this case, the session is the ideal place; you can store the IDs there, then redirect the user to a view that pops that key from the session and runs the process on the relevant IDs. Again, no need for an inline function at all.
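A sketch of that session-based flow (the view names, session key, 'done' URL name and get_bar() helper are illustrative; do_the_things() is the question's own function):

from django.http import HttpResponse
from django.shortcuts import redirect, render

def view(request):
    # Store plain, serialisable data (IDs, not function objects) in the session.
    request.session['pending_action'] = {'bar': get_bar(request)}
    return render(request, 'confirm.html',
                  {'message': 'Are you sure you want to do it'})

def confirm(request):
    pending = request.session.pop('pending_action', None)
    if pending is None:
        return HttpResponse('Nothing to confirm.', status=400)
    # Any worker process can serve this request; the data lives in the
    # session backend, not in one process's memory.
    do_the_things(request.user, pending['bar'])
    return redirect('done')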

Django global variable to store immutable object

I'm trying to set up python-telegram-bot library in webhook mode with Django. That should work as follows: on Django startup, I do some initial setting of python-telegram-bot and get a dispatcher object as a result. Django listens to /telegram_hook url and receives updates from Telegram servers. What I want to do next is to pass the updates to the process_update method of the dispatcher created on startup. It contains all the parsing logic and invokes callbacks specified during setup.
The problem is that the dispatcher object needs to be saved globally. I know that global states are evil but that's not really a global state because the dispatcher is immutable. However, I still don't know where to put it and how to ensure that it will be visible to all threads after setup phase is finished. So the question is how do I properly save the dispatcher after setup to invoke it from Django's viewset?
P.S. I know that I could use a built-in web server or use polling or whatever. However, I have reasons to use Django and I anyway would like to know how to deal with cases like that because it's not the only situation I can imagine when I need to store an immutable object created on startup globally.
It looks like you need a thread-safe singleton, like this one: https://gist.github.com/werediver/4396488 or http://alacret.blogspot.ru/2015/04/python-thread-safe-singleton-pattern.html
import threading

# Based on the tornado.ioloop.IOLoop.instance() approach.
# See https://github.com/facebook/tornado
class SingletonMixin(object):
    __singleton_lock = threading.Lock()
    __singleton_instance = None

    @classmethod
    def instance(cls):
        if not cls.__singleton_instance:
            with cls.__singleton_lock:
                if not cls.__singleton_instance:
                    cls.__singleton_instance = super(SingletonMixin, cls).__new__(cls)
        return cls.__singleton_instance
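One way the mixin might hold your dispatcher, as a sketch (build_dispatcher() and parse_update() stand in for your python-telegram-bot setup and parsing code; they are not part of the snippet above):

from django.http import HttpResponse

class BotDispatcher(SingletonMixin):
    _dispatcher = None

    def get(self):
        # instance() above only calls __new__, so do the one-time setup
        # lazily here rather than in __init__ (kept simple; reuse the lock
        # if two threads could race on the very first request).
        if self._dispatcher is None:
            self._dispatcher = build_dispatcher()
        return self._dispatcher

def telegram_hook(request):
    update = parse_update(request.body)  # placeholder for your Update parsing
    BotDispatcher.instance().get().process_update(update)
    return HttpResponse('OK')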

Django: start a process in a background thread?

I'm trying to work out how to run a process in a background thread in Django. I'm new to both Django and threads, so please bear with me if I'm using the terminology wrong.
Here's the code I have. Basically I'd like start_processing to begin as soon as the success function is triggered. However start_processing is the kind of function that could easily take a few minutes or fail (it's dependent on an external service over which I have no control), and I don't want the user to have to wait for it to complete successfully before the view is rendered. ('Success' as far as they are concerned isn't dependent on the result of start_processing; I'm the only person who needs to worry if it fails.)
def success(request, filepath):
    start_processing(filepath)
    return render_to_response('success.html', context_instance=RequestContext(request))
From the Googling I've done, most people suggest that background threads aren't used in Django, and instead a cron job is more suitable. But I would quite like start_processing to begin as soon as the user gets to the success function, rather than waiting until the cron job runs. Is there a way to do this?
If you really need a quick hack, simply start a process using subprocess.
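That quick hack might look something like this (a sketch; process_file.py is an assumed standalone script wrapping your start_processing logic, and the view mirrors the question's old-style Django code):

import subprocess
import sys

from django.shortcuts import render_to_response
from django.template import RequestContext

def success(request, filepath):
    # Fire and forget: the child process outlives this request, so the
    # view returns immediately.
    subprocess.Popen([sys.executable, 'process_file.py', filepath])
    return render_to_response('success.html',
                              context_instance=RequestContext(request))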
But I would not recommend spawning a process (or even a thread), especially if your web site is public: in case of high load (which could be "natural" or the result of a trivial DoS attack), you would be spawning many processes or threads, which would end up using up all your system resources and killing your server.
I would instead recommend using a job server: I use Celery (with Redis as the backend), it's very simple and works just great. You can check out many other job servers, such as RabbitMQ or Gearman. In your case, a job server might be overkill: you could simply run Redis and use it as a light-weight message server. Here is an example of how to do this.
Cheers
In case someone really wants to run another thread
import threading
import time

from django.http import HttpResponse

def background_process():
    print("process started")
    time.sleep(100)
    print("process finished")

def index(request):
    t = threading.Thread(target=background_process, args=(), kwargs={})
    t.daemon = True
    t.start()
    return HttpResponse("main thread content")
This will return the response first, then print "process finished" to the console, so the user will not face any delay.
Using Celery is definitely a better solution. However, installing Celery could be unnecessary for a very small project with a limited server etc.
You may also need to use threads in a big project, because running Celery on all your servers is not a good idea - then there won't be a way to run a separate process on each server, and you may need threads to handle that case. File system operations might be an example. It's not very likely though, and it is still better to use Celery for long-running processes.
Use wisely.
I'm not sure you need a thread for that. It sounds like you just want to spawn off a process, so look into the subprocess module.
IIUC, the problem here is that the webserver process might not like extra long-running threads; it might kill/spawn server processes as demand goes up and down, etc.
You're probably better off communicating with an external service process for this type of processing, instead of embedding it in the webserver's wsgi/fastcgi process.
If the only thing you're sending over is the filepath, it ought to be pretty easy to write that service app.