How do I add simple delayed tasks in Django? - django

I am creating a chatbot and need a solution to send messages to the user in the future after a specific delay. I have my system set up with Nginx, Gunicorn and Django. The idea is that if the bot needs to send the user several messages, it can delay each subsequent message by a certain amount of time before it sends it to seem more 'human'.
However, a simple threading.Timer approach won't work because the user might interrupt this process at any moment prompting future messages to be changed, but the timer threads might not be available to be stopped as they are on a different worker. So far I have come across two solutions:
1) Use threading.Timer blindly to check a to-send list in the database. This can create problems with lots of unneeded threads, and it also makes the database less clean/organized.
2) Use Celery or some other system to execute these future tasks. This seems like overkill and over-engineering for a simple problem: the tasks will always just be delayed function calls. It is also a hassle to keep track of which messages belong to which conversation.
What would be the best solution for this problem?
Also, a more generic question:
Ideally the best solution would be a framework where I can 'simulate' a new bot for each conversation so it acts as its own entity and holds all the state/message queue information in memory for itself. It would be necessary for this framework to only allocate resources to a bot when it needs to do something based on a preset delay or incoming message. Is there anything that exists like this?

Personally I would use Celery for this; executing delayed function calls is its job. And I don't know why knowing what messages belong where would be more of a problem there than doing it in a thread.
But you might also want to investigate the new Django-Channels work that Andrew Godwin is doing, since that is intended to support async background tasks.
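On the "which messages belong to which conversation" worry, here is a minimal sketch of one way to make delayed sends safely ignorable when the user interrupts. It is not taken from the question or from Celery's docs: the names schedule_messages and send_if_current, and the in-memory dict standing in for a database table, are all hypothetical. Each conversation has a generation counter, every delayed call carries the generation it was scheduled under, and a call that fires with a stale generation does nothing.

```python
# Hypothetical stand-in for a database table: conversation id -> current generation.
conversation_generation = {}
# Hypothetical record of what was actually delivered.
sent = []


def schedule_messages(conversation_id, messages, delay_step=2.0):
    """Plan delayed sends; bumping the generation invalidates older plans."""
    generation = conversation_generation.get(conversation_id, 0) + 1
    conversation_generation[conversation_id] = generation
    # Each entry is (delay in seconds, arguments for send_if_current).
    return [
        ((index + 1) * delay_step, (conversation_id, generation, text))
        for index, text in enumerate(messages)
    ]


def send_if_current(conversation_id, generation, text):
    """Body of the delayed call: send only if the plan is still current."""
    if conversation_generation.get(conversation_id) != generation:
        return False  # the user interrupted, so this message is obsolete
    sent.append((conversation_id, text))
    return True
```

With Celery, send_if_current would be the task and each planned entry would be queued with something like send_if_current.apply_async(args=..., countdown=delay). Nothing has to be revoked on another worker: calling schedule_messages again for the same conversation is enough to turn the old calls into no-ops.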

Related

Notifying a task from multiple other tasks without extra work

My application is futures-based with async/await, and has the following structure within one of its components:
a "manager", which is responsible for starting/stopping/restarting "workers", based both on external input and on the current state of "workers";
a dynamic set of "workers", which perform some continuous work, but may fail or be stopped externally.
A worker is just a spawned task which does some I/O work. Internally it is a loop which is intended to be infinite, but it may exit early due to errors or other reasons, and in this case the worker must be restarted from scratch by the manager.
The manager is implemented as a loop which awaits on several channels, including one returned by async_std::stream::interval, which essentially makes the manager into a poller - and indeed, I need this because I do need to poll some Mutex-protected external state. Based on this state, the manager, among everything else, creates or destroys its workers.
Additionally, the manager stores a set of async_std::task::JoinHandles representing live workers, and it uses these handles to check whether any worker has exited, restarting it if so. (BTW, I do this currently using select(handle, future::ready()), which is totally suboptimal because it relies on the select implementation detail, specifically that it polls the left future first. I couldn't find a better way of doing it; something like race() would make more sense, but race() consumes both futures, which won't work for me because I don't want to lose the JoinHandle if it is not ready. This is a matter for another question, though.)
You can see that in this design workers can only be restarted when the next poll "tick" in the manager occurs. However, I don't want to use too small an interval for polling, because in most cases polling just wastes CPU cycles. Large intervals, however, can delay restarting a failed/canceled worker by too much, leading to undesired latencies. Therefore, I thought I'd set up another channel of ()s back from each worker to the manager, which I'd add to the main manager loop, so when a worker stops due to an error or otherwise, it will first send a message to its channel, resulting in the manager being woken up earlier than the next poll in order to restart the worker right away.
Unfortunately, with any kinds of channels this might result in more polls than needed, in case two or more workers stop at approximately the same time (which due to the nature of my application, is somewhat likely to happen). In such case it would make sense to only run the manager loop once, handling all of the stopped workers, but with channels it will necessarily result in the number of polls equal to the number of stopped workers, even if additional polls don't do anything.
Therefore, my question is: how do I notify the manager from its workers that they are finished, without resulting in extra polls in the manager? I've tried the following things:
As explained above, regular unbounded channels just won't work.
I thought that maybe bounded channels could work - if I used a channel with capacity 0, and there was a way to try and send a message into it but just drop the message if the channel is full (like the offer() method on Java's BlockingQueue), this seemingly would solve the problem. Unfortunately, the channels API, while providing such a method (try_send() seems to be like it), also has this property of having capacity larger than or equal to the number of senders, which means it can't really be used for such notifications.
Some kind of atomic or mutex-protected boolean flag also looks as if it could work, but there is no atomic or mutex API which would provide a future to wait on, so it would also require polling.
Restructure the manager implementation to include JoinHandles into the main select somehow. It might do the trick, but it would result in large refactoring which I'm unwilling to make at this point. If there is a way to do what I want without this refactoring, I'd like to use that first.
I guess some kind of combination of atomics and channels might work, something like setting an atomic flag and sending a message, and then skipping any extra notifications in the manager based on the flag (which is flipped back to off after processing one notification), but this also seems like a complex approach, and I wonder if anything simpler is possible.
I recommend using the FuturesUnordered type from the futures crate. This collection allows you to push many futures of the same type into a collection and wait for any one of them to complete at once.
It implements Stream, so if you import StreamExt, you can use unordered.next() to obtain a future that completes once any future in the collection completes.
If you also need to wait for a timeout or mutex etc., you can use select to create a future that completes once either the timeout or one of the join handles completes. The future returned by next() implements Unpin, so it is usable with select without problems.
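To illustrate the shape of that loop without claiming anything about the Rust API, here is a rough Python asyncio analogue (not FuturesUnordered itself). It assumes hypothetical workers that simply sleep and exit. The manager waits on the whole set of worker tasks plus a poll timeout, and one wake-up returns every worker that has already finished.

```python
import asyncio


async def worker(name, lifetime):
    # Hypothetical worker: does "work" for `lifetime` seconds, then exits.
    await asyncio.sleep(lifetime)
    return name


async def manager(lifetimes, poll_interval=1.0, rounds=2):
    """Wake on the first finished worker or on the poll tick, whichever is first."""
    tasks = {asyncio.create_task(worker(n, t)) for n, t in lifetimes.items()}
    wakeups = []
    for _ in range(rounds):
        if not tasks:
            break
        # `done` holds every task finished by now; `tasks` keeps the live ones.
        done, tasks = await asyncio.wait(
            tasks, timeout=poll_interval, return_when=asyncio.FIRST_COMPLETED
        )
        # A real manager would restart the finished workers here.
        wakeups.append(sorted(t.result() for t in done))
    for t in tasks:
        t.cancel()
    return wakeups
```

Workers that stop at about the same time tend to land in the same `done` set, so the manager body runs once for them rather than once per notification.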

Quickfix: acceptor and initiator in same application?

I am new to quickfix (I'm a student trying to teach myself), and have downloaded the examples from quickfix.org (in c++) and have been able to connect ordermatch to tradeclient and get them talking to each other. I changed the config file for ordermatch to allow multiple clients and got that working (ordermatch can receive orders from multiple clients and manage the order book).
I have been trying to find a way to alter ordermatch to send its confirm messages to ALL clients, not just the sender.
I have a separate implementation of a limit orderbook and want to crack the incoming messages (orders, cancels, etc) and store them in my limit orderbook. My orderbook watches the book and makes trading decisions based on it. The problem is, I can't figure out how to get ordermatch to send all updates to this client. Further, I am having a hard time figuring out how to "soup up" the tradeclient to not only send orders, but receive and crack them.
I'm thinking I need to have an acceptor and an initiator in each application (in ordermatch and in one of the tradeclients). I've read this is possible and common but can't find any sample code. Am I on the right track here, or is there a better way to set this up? Does anybody have some sample code they can share? I am not planning on using this for live trading so crude code is perfectly fine by me.
Thanks in advance
Brandon
Same application can act as Initiator for one session and Acceptor for different session.
In fact you can have multiple Acceptor/Initiator sessions from the same application.
Config file needs to define multiple sessions.
Or you can have separate config file for each session.
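As a rough illustration of the separate-config-file variant, here are two hypothetical settings files. The file names, CompIDs and ports are made up for the example; the keys are standard QuickFIX settings. The assumption is that each file is loaded into its own FIX::SessionSettings, one used to construct a FIX::SocketAcceptor and the other a FIX::SocketInitiator inside the same program.

```ini
# acceptor.cfg (hypothetical): the session other clients connect to
[DEFAULT]
ConnectionType=acceptor
StartTime=00:00:00
EndTime=00:00:00
FileStorePath=store_acceptor
UseDataDictionary=N

[SESSION]
BeginString=FIX.4.2
SenderCompID=ORDERMATCH
TargetCompID=CLIENT1
SocketAcceptPort=5001

# initiator.cfg (hypothetical): the session this program opens itself
[DEFAULT]
ConnectionType=initiator
StartTime=00:00:00
EndTime=00:00:00
HeartBtInt=30
ReconnectInterval=5
FileStorePath=store_initiator
UseDataDictionary=N

[SESSION]
BeginString=FIX.4.2
SenderCompID=ORDERMATCH
TargetCompID=BOOKWATCHER
SocketConnectHost=127.0.0.1
SocketConnectPort=5002
```

Separate FileStorePath values keep the two sessions' sequence-number stores apart.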
If I understand correctly, I think what you're trying to do is intercept messages between an OMS and a broker (cf. client and server) and act depending on what they contain. There are a few ways you could do this, including intercepting at the TCP layer, but I think that the easiest way might be to use 2 separate programs as @DumbCoder suggests: connect to one of them as an acceptor from your clients, process the messages, pass them on to the other program via another protocol, and then send them on from there. Theoretically you can also create another instance of the engine in your program by using a different config file on creation of each instance (when FIX::FileStoreFactory storeFactory(*settings); is called). However, I have never seen this done and so feel that it could cause problems. If you do try this method I would strongly advise putting the initiator and the connector in different dlls, which might just separate the two engine instances enough.

XMLRPCPP asynchronously handling multiple calls?

I have a remote server which handles various different commands, one of which is an event fetching method.
The event fetch returns right away if there is 1 or more events listed in the queue ready for processing. If the event queue is empty, this method does not return until a timeout of a few seconds. This way I don't run into any HTTP/socket timeouts. The moment an event becomes available, the method returns right away. This way the client only ever makes connections to the server, and the server does not have to make any connections to the client.
This event mechanism works nicely. I'm using the boost library to handle queues, event notifications, etc.
Here's the problem. While the server is holding back on returning from the event fetch method, during that time, I can't issue any other commands.
In the source code, XmlRpcDispatch.cpp, I'm seeing in the "work" method, a simple loop that uses a blocking call to "select".
Seems like while the handling of a method is busy, no other requests are processed.
Question: am I not seeing something and can XmlRpcpp (xmlrpc++) handle multiple requests asynchronously? Does anyone know of a better xmlrpc library for C++? I don't suppose the Boost library has a component that lets me issue remote commands?
I actually don't care about the XML or over-HTTP feature. I simply need to issue (asynchronous) commands over TCP in any shape or form?
I look forward to any input anyone might offer.
I had some problems with XMLRPC also, and investigated many solutions like GSoap and XMLRPC++, but in the end I gave up and wrote the whole HTTP+XMLRPC from scratch using Boost.ASIO and TinyXML++ (later I swapped TinyXML for expat). It wasn't really that much work; I did it myself in about a week, starting from scratch and ending up with many RPC calls fully implemented.
Boost.ASIO gave great results. It is, as its name says, totally async, and with excellent performance with little overhead, which to me was very important because it was running in an embedded environment (MIPS).
Later, and this might be your case, I changed XML to Google's Protocol Buffers, and was even happier. Its API, as well as its message containers, are all type safe (i.e. you send an int and a float, and it never gets converted to string and back, as is the case with XML), and once you get the hang of it, which doesn't take very long, it's a very productive solution.
My recommendation: if you can ditch XML, go with Boost.ASIO + Protobuf.
If you need XML: Boost.ASIO + Expat.
Doing this stuff from scratch is really worth it.
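To show why an asynchronous design removes the blocking described in the question, here is a small Python asyncio sketch. It is an analogue of the idea, not Boost.ASIO code. The handler names fetch_events and push_event are hypothetical, and the TCP transport is left out so the example stays self-contained. A pending long-poll is just a suspended coroutine, so another command can run while it waits.

```python
import asyncio


async def fetch_events(events, timeout):
    """Long-poll: return queued events at once, else wait up to `timeout` seconds."""
    try:
        first = await asyncio.wait_for(events.get(), timeout)
    except asyncio.TimeoutError:
        return []  # nothing arrived before the timeout
    batch = [first]
    while not events.empty():
        batch.append(events.get_nowait())
    return batch


async def push_event(events, payload):
    """Any other command; it is not blocked by a pending fetch_events call."""
    await events.put(payload)
    return "ok"


async def demo():
    events = asyncio.Queue()
    poll = asyncio.create_task(fetch_events(events, timeout=1.0))
    await asyncio.sleep(0)  # let the long-poll start waiting
    status = await push_event(events, "job-finished")
    return status, await poll, await fetch_events(events, timeout=0.05)
```

Running asyncio.run(demo()) issues the second command while the first fetch is still pending, and that fetch returns as soon as the event is queued.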

Django: start a process in a background thread?

I'm trying to work out how to run a process in a background thread in Django. I'm new to both Django and threads, so please bear with me if I'm using the terminology wrong.
Here's the code I have. Basically I'd like start_processing to begin as soon as the success function is triggered. However start_processing is the kind of function that could easily take a few minutes or fail (it's dependent on an external service over which I have no control), and I don't want the user to have to wait for it to complete successfully before the view is rendered. ('Success' as far as they are concerned isn't dependent on the result of start_processing; I'm the only person who needs to worry if it fails.)
def success(request, filepath):
    start_processing(filepath)
    return render_to_response('success.html', context_instance=RequestContext(request))
From the Googling I've done, most people suggest that background threads aren't used in Django, and instead a cron job is more suitable. But I would quite like start_processing to begin as soon as the user gets to the success function, rather than waiting until the cron job runs. Is there a way to do this?
If you really need a quick hack, simply start a process using subprocess.
But I would not recommend spawning a process (or even a thread), especially if your web site is public: in case of high load (which could be "natural" or the result of a trivial DoS attack), you would be spawning many processes or threads, which would end up using up all your system resources and killing your server.
I would instead recommend using a job server: I use Celery (with Redis as the backend); it's very simple and works just great. You can check out other options too, such as Gearman (another job server) or RabbitMQ (a message broker that Celery can also use). In your case, a job server might be overkill: you could simply run Redis and use it as a light-weight message server. Here is an example of how to do this.
Cheers
In case someone really wants to run another thread
import threading
import time

from django.http import HttpResponse


def background_process():
    # Stand-in for the long-running work.
    print("process started")
    time.sleep(100)
    print("process finished")


def index(request):
    # Daemon thread: it will not keep the server process alive on shutdown.
    t = threading.Thread(target=background_process, daemon=True)
    t.start()
    return HttpResponse("main thread content")
This returns the response first and prints "process finished" to the console later, so the user will not face any delay.
Using Celery is definitely a better solution. However, installing Celery could be unnecessary for a very small project with a limited server etc.
You may also need threads in a big project, because running Celery on all your servers is not a good idea, and without it there is no way to run a separate process on each server. File system operations might be an example. It's not very likely, though, and it is still better to use Celery for long-running processes.
Use wisely.
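If you do go the thread route, one way to address the resource-exhaustion concern raised earlier is to cap the number of background threads. Below is a minimal sketch using the standard library's ThreadPoolExecutor. The function bodies are hypothetical stand-ins, and success_view returns a plain string instead of a Django response so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

# At most 2 jobs run at once; further submissions wait in the pool's queue
# (the queue itself is unbounded, only the thread count is capped).
executor = ThreadPoolExecutor(max_workers=2)


def start_processing(filepath):
    # Hypothetical stand-in for the slow call to the external service.
    return f"processed {filepath}"


def log_failure(future):
    # Runs when the job ends; only the site owner needs to know about errors.
    error = future.exception()
    if error is not None:
        print(f"background job failed: {error!r}")


def success_view(filepath):
    # Hand the work to the pool and return immediately.
    future = executor.submit(start_processing, filepath)
    future.add_done_callback(log_failure)
    return "success page", future
```

The view returns as soon as the job is submitted. Jobs still die with the web worker process if it is restarted, which is one reason a job server remains the more robust choice.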
I'm not sure you need a thread for that. It sounds like you just want to spawn off a process, so look into the subprocess module.
IIUC, the problem here is that the webserver process might not like extra long-running threads; it might kill/spawn server processes as demand goes up and down, etc.
You're probably better off communicating with an external service process for this type of processing, instead of embedding it in the webserver's wsgi/fastcgi process.
If the only thing you're sending over is the filepath, it ought to be pretty easy to write that service app.
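A minimal sketch of the subprocess suggestion, assuming the worker is a separate Python program that receives the file path as its first argument. The worker here is a hypothetical one-liner passed with -c so the example is self-contained. Capturing stdout is only for demonstration; a real view would normally let the worker log somewhere itself.

```python
import subprocess
import sys

# Hypothetical worker program, inlined so the example is self-contained.
WORKER_SOURCE = "import sys; print('processing', sys.argv[1])"


def start_processing(filepath):
    """Launch the worker as a separate process and return without waiting for it."""
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE, filepath],
        stdout=subprocess.PIPE,  # captured here only so the result can be inspected
        text=True,
    )
```

Popen returns as soon as the child has been started, so the view can render immediately while the worker keeps running in its own process.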

Web application background processes, newbie design question

I'm building my first web application after many years of desktop application development (I'm using Django/Python but maybe this is a completely generic question, I'm not sure). So please beware - this may be an ultra-newbie question...
One of my user processes involves heavy processing in the server (i.e. user inputs something, server needs ~10 minutes to process it). On a desktop application, what I would do is throw the user input into a queue protected by a mutex, and have a dedicated background thread running in low priority blocking on the queue using that mutex.
However in the web application everything seems to be oriented towards synchronization with the HTTP requests.
Assuming I will use the database as my queue, what is best practice architecture for running a background process?
There are two schools of thought on this (at least).
1) Throw the work on a queue and have something else outside your web-stack handle it.
2) Throw the work on a queue and have something else in your web-stack handle it.
In either case, you create work units in a queue somewhere (e.g. a database table) and let some process take care of them.
I typically work with number 1 where I have a dedicated windows service that takes care of these things. You could also do this with SQL jobs or something similar.
The advantage to item 2 is that you can more easily keep all your code in one place--in the web tier. You'd still need something that triggers the execution (e.g. loading the web page that processes work units with a sufficiently high timeout), but that could be easily accomplished with various mechanisms.
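Here is a minimal sketch of the "work units in a database table" idea, using an in-memory SQLite database. The table layout and function names are hypothetical, and it assumes a single worker process; claiming jobs safely from several workers would need proper locking.

```python
import sqlite3


def open_queue(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending', result TEXT)"
    )
    return db


def enqueue(db, payload):
    """Called from the web request: record the work unit and return at once."""
    with db:
        return db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,)).lastrowid


def run_worker_once(db, handler):
    """Called from the background process: handle the oldest pending job, if any."""
    with db:
        row = db.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return False
        job_id, payload = row
        db.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
    result = handler(payload)  # the slow part runs outside the transaction
    with db:
        db.execute(
            "UPDATE jobs SET status = 'done', result = ? WHERE id = ?", (result, job_id)
        )
    return True
```

The web tier only ever calls enqueue; whatever runs run_worker_once in a loop (a service, a cron job, a management command) can live inside or outside the web stack.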
Since:
1) This is a common problem,
2) You're new to your platform
-- I suggest that you look in the contributed libraries for your platform to find a solution to handle the task. In addition to queuing and processing the jobs, you'll also want to consider:
1) status communications between the worker and the web-stack. This will enable web pages that show the percentage complete number for the job, assure the human that the job is progressing, etc.
2) How to ensure that the worker process does not die.
3) If a job has an error, will the worker process automatically retry it periodically?
Will you or an operations person be notified if a job fails?
4) As the number of jobs increases, can additional workers be added to gain parallelism?
Or, even better, can workers be added on other servers?
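As an illustration of the retry and notification points above, here is a small sketch of a job runner. The function name, the returned status dictionary and the notify callback are all hypothetical. It retries a failing job a bounded number of times and notifies someone only when the last attempt fails.

```python
import time


def run_with_retries(job, max_attempts=3, delay=0.0, notify=print):
    """Run job(); retry on any exception; call notify() if every attempt fails."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return {"status": "done", "attempts": attempt, "result": job()}
        except Exception as error:
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay)  # wait before the next attempt
    notify(f"job failed after {max_attempts} attempts: {last_error!r}")
    return {"status": "failed", "attempts": max_attempts, "result": None}
```

The returned status is the kind of value a worker could write back to the job record so that the web pages can report progress or failure.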
If you can't find a good solution in Django/Python, you can also consider porting a solution from another platform to yours. I use delayed_job for Ruby on Rails. The worker process is managed by runit.
Regards,
Larry
Speaking generally, I'd look at running background processes on a different server, especially if your web server has any kind of load.
Running long processes in Django: http://iraniweb.com/blog/?p=56