I am using Python 2.7 and python-firebase 1.2.
If I comment out the firebase import, the output is printed only once; otherwise it is printed multiple times.
from firebase import firebase
print "result"
output:
result
result
result
result
That firebase module was written carelessly: it performs tasks you don't explicitly ask for. For that reason I would advise anybody to steer clear of it, because you cannot know what other booby traps lurk in its code. Its authors probably think this behavior is convenient, but never breaking the expectations of programmers is the one rule absolutely every module writer has to follow, and if this behavior were convenient this question wouldn't exist. They do say that the module relies heavily on multiprocessing, but they don't mention that you get no say in it:
The interface heavily depends on the standard multiprocessing library when concurrency comes in. While creating an asynchronous call, an on-demand process pool is created and the async method is executed by one of the idle processes inside the pool. The pool remains alive until the main process dies. So every time you trigger an async call, you always use the same pool. When the method returns, the pool process ships the returning value back to the main process within the callback function provided.
So, all that being said: this happens because the module's main __init__.py imports its async.py, which in turn creates a multiprocessing.Pool (stored as its _process_pool) with 5 fixed slots. Given nothing to work with, that gets you 5 additional processes of your main script - hence it prints out result 6 times (once in the main process and once in each of the 5 spawned sub-processes).
Bottom line - do not use this module. There are other alternatives, but if you absolutely have to - guard your code with a main process check:
if __name__ == "__main__":
print("result")
It will still spawn 5 subprocesses, and wait for all of them to finish (which is rather quick) but at least it won't execute your guarded code.
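Applied to the original script, the guarded version would look like this (the import itself still spawns the pool's worker processes; the guard only keeps them from re-running your top-level code):

from firebase import firebase

if __name__ == "__main__":
    # Only the original process enters this block; the spawned
    # sub-processes that re-import this script skip it.
    print("result")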
I'm interested in implementing a dialogue system similar to what is being done here: http://fungusdocs.snozbot.com/lua_controlling_fungus.html
-- Display text in a SayDialog
say("Hi there")
say "This syntax also works for say commands"
-- Display a list of options in a MenuDialog
-- (Note the curly braces here!)
local choice = choose{ "Go left", "Go right" }
if choice == 1 then
say("You chose left")
elseif choice == 2 then
say("You chose right")
end
My takeaway from this Lua code snippet is that the code is very easy to write and follow, and I'd like to use a similar approach. What I wonder is how this can be implemented without stalling the engine code while waiting for a choice.
The function call choose{ "Go left", "Go right" } returns a value, which makes me want to say that it is a synchronous call. Since we're calling the engine code synchronously we halt the engine, yet this function call should not be the one directly answering the question - I believe it needs to be answered in the regular main loop so as not to interfere with the rest of the program.
To my understanding, the only way to solve this would be to rely on multi-threading: have the script handled on a separate thread that, on the choose call, first adds a prompt, then waits for the prompt to be answered, fetches the result, and then continues executing the Lua script.
What would be a good way to solve this without making the Lua code cumbersome to work with?
Normally you'd run the blocking code in a Lua thread (coroutine).
Your choose{} call would yield internally, and the app would resume that thread periodically on external events (input/render/whatever). That way the main loop keeps running freely and picks up the dialog's result on the nearest iteration after the dialog is ready.
The object serving the choose{} call might trigger some event on completion, which might be monitored by a bigger system in the application - the same system that would wait for the completion of other asynchronous tasks (file loaded, HTTP request served, etc.).
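The pattern is language-agnostic, so here is a minimal sketch of it using Python generators in place of Lua coroutines (the script/main-loop split and every name in it are illustrative, not part of any engine):

def script():
    # The script suspends at each yield; the main loop resumes it
    # with the player's answer once one is available.
    choice = yield ("choose", ["Go left", "Go right"])
    if choice == 1:
        yield ("say", "You chose left")
    elif choice == 2:
        yield ("say", "You chose right")

def get_player_choice(options):
    # Stand-in for the engine's real input handling.
    for i, opt in enumerate(options, 1):
        print("%d) %s" % (i, opt))
    return int(input("> "))

def main_loop():
    co = script()
    try:
        request = next(co)  # run the script up to its first yield
        while True:
            kind, payload = request
            if kind == "say":
                print(payload)
                request = co.send(None)
            else:
                # "choose": a real engine would keep rendering frames here
                # and resume the script only once input actually arrives.
                request = co.send(get_player_choice(payload))
    except StopIteration:
        pass  # the script has finished

main_loop()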
I want to invoke the CefV8Context::Eval function and get the returned value in the browser process's UI thread. But the CEF3 C++ API docs state that V8 handles can only be accessed from the thread on which they were created, and that valid threads for creating a V8 handle include the render process main thread (TID_RENDERER) and WebWorker threads. Does that mean I should use inter-process communication (CefProcessMessage) to invoke that method and get the return value? If so, how do I do this synchronously?
Short answer: CefFrame::ExecuteJavaScript for simple requests will work. For more complex ones, you have to give up one level of synchronousness or use a custom message loop.
What I understand you want to do is to execute some Javascript code as part of your native App's UI Thread. There are two possibilities:
It's generic JS code that doesn't really access any variables or functions in your JS, and as such has no context. This means CEF can just spin up a new V8 context and execute your code - see CefFrame::ExecuteJavaScript(). To quote the examples on CEF's JS Integration page:
CefRefPtr<CefBrowser> browser = ...;
CefRefPtr<CefFrame> frame = browser->GetMainFrame();
frame->ExecuteJavaScript("alert('ExecuteJavaScript works!');",
                         frame->GetURL(), 0);
It's JS code with a context. In this case, read on.
Yes - CEF is designed such that only the render process has access to the V8 engine, so you'll have to use a CefProcessMessage to head over to the renderer and do the evaluation there. You sound like you already know how to do that. I'll link an answer of mine for others who don't and may stumble upon this later: Background process on the native function at Chromium Embedded Framework
The CefProcessMessage from the browser to the render process is one place where the request has to be synchronized.
So after you send your logic over to the render process, you'll need to do the actual execution of the JavaScript code. That, thankfully, is quite easy - the same JS integration page goes on to say:
Native code can execute JS functions by using the ExecuteFunction()
and ExecuteFunctionWithContext() methods
The best part - the execution seems to be synchronous (I say seems to, since I can't find concrete docs on this). The usage in the examples illustrates this:
if (callback_func_->ExecuteFunctionWithContext(callback_context_, NULL, args, retval, exception, false)) {
if (exception.get()) {
// Execution threw an exception.
} else {
// Execution succeeded.
}
}
You'll notice that the second line assumes the first has finished execution and that the results of said execution are available to it. So the CefV8Value::ExecuteFunction() call is by nature synchronous.
So the question boils down to: how do I post a CefProcessMessage from the browser to the renderer process synchronously? Unfortunately, the class itself is not set up to do that. What's more, the IPC wiki page explicitly disallows it:
Some messages should be synchronous from the renderer's perspective.
This happens mostly when there is a WebKit call to us that is supposed
to return something, but that we must do in the browser. Examples of
this type of messages are spell-checking and getting the cookies for
JavaScript. Synchronous browser-to-renderer IPC is disallowed to
prevent blocking the user-interface on a potentially flaky renderer.
Is this such a big deal? Well, I don't really know, since I've not come across this need. To me it's OK, since the browser's message loop will keep spinning away waiting for something to do, and will receive nothing until your renderer sends a process message back with the results of the JS. The only way the browser gets something else to do is when some interaction happens, which can't happen while the renderer is blocking.
If you really, definitely need synchronous behavior, I'd recommend that you use a custom message loop which calls CefDoMessageLoopWork() on every iteration. That way, you can set a flag to suspend loop work until your message is received from the renderer. Note that CefDoMessageLoopWork() and CefRunMessageLoop() are mutually exclusive and cannot work with each other - you either manage the loop yourself, or let CEF do it for you.
That was long, and covers most of what you might want to do - hope it helps!
Well, my problem is the following. I have a piece of code that runs on several virtual machines, and each virtual machine has N interfaces (a thread for each). The problem itself is receiving a message on one interface and redirecting it through another interface in the fastest possible manner.
What I'm doing is: when I receive a message on one interface (unicast), I calculate which interface I want to redirect it through and save all the information about the message (the datagram, plus all the extra info I want) with a function I wrote. Then on the next iteration, the program checks whether there are new messages to redirect and whether it is the correct interface reading them. And so on... But this makes the program exchange information very slowly...
Is there any mechanism that can speed things up?
Somebody has already invented this particular wheel - it's called MPI, the Message Passing Interface.
Take a look at either Open MPI or MPICH.
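For a taste of what that looks like from Python, here is a minimal mpi4py sketch (assuming the mpi4py package; two ranks stand in for two of your interface handlers, and the payload is made up):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Forward a received datagram to the process handling the other interface.
    comm.send({"datagram": "payload bytes", "extra": "info"}, dest=1, tag=0)
elif rank == 1:
    msg = comm.recv(source=0, tag=0)  # blocks until the message arrives
    print("rank 1 received: %s" % msg)

You'd launch it with something like mpiexec -n 2 python forward.py.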
Why don't you use queuing? As the messages come in, put them on a queue and notify each processing module to pick them up from the queue.
For example:
MSG comes in
Module 1 puts it on the queue
Modules 2 and 3 get notified
Module 2 picks it up from the queue and saves it in the database
In parallel, Module 3 picks it up from the queue and processes it
The key is "in parallel". Since these modules are different threads, while Module 2 is saving to the db, Module 3 can massage your message.
You could use JMS or MQ or make your own queue.
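A home-made version is only a few lines in Python; this sketch uses a blocking in-process queue, with module names made up to match the steps above (a real deployment would use JMS/MQ as suggested):

import threading
import queue

msg_queue = queue.Queue()

def module_1(messages):
    # Receiver: puts incoming messages on the queue instead of leaving
    # them for the other modules to poll for.
    for msg in messages:
        msg_queue.put(msg)
    msg_queue.put(None)  # sentinel: no more messages

def module_3():
    # Processor: blocks on the queue and wakes as soon as a message arrives.
    while True:
        msg = msg_queue.get()
        if msg is None:
            break
        print("processing: %s" % msg)

t = threading.Thread(target=module_3)
t.start()
module_1(["datagram 1", "datagram 2"])
t.join()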
It sounds like you're trying to do parallel computing across multiple "machines" (even if virtual). You may want to look at existing protocols such as MPI (the Message Passing Interface) to handle this domain, as they have quite a few features that help in this type of scenario.
Disclaimer: I do know that there are several similar questions on SO. I think I've read most if not all of them, but did not find an answer to my real question (see later).
I also do know that using Celery or another asynchronous queue system is the best way to achieve long-running tasks - or at least to use a cron-managed script. There's also the mod_wsgi documentation about processes and threads, but I'm not sure I got it all right.
The question is:
What are the exact risks/issues involved in using the solutions listed below? Are any of them viable for long-running tasks (OK, even though Celery is better suited)?
My question is really more about understanding the internals of WSGI and Python/Django than finding the best overall solution: issues with blocking threads, unsafe access to variables, zombie processes, etc.
Let's say:
my "long_process" is doing something really safe. even if it fails i don't care.
python >= 2.6
I'm using mod_wsgi with Apache (would anything change with uWSGI or Gunicorn?) in daemon mode
mod_wsgi conf:
WSGIDaemonProcess NAME user=www-data group=www-data threads=25
WSGIScriptAlias / /path/to/wsgi.py
WSGIProcessGroup %{ENV:VHOST}
I figured that these are the options available to launch separate processes (meant in a broad sense) to carry out a long-running task while quickly returning a response to the user:
os.fork
import os

if os.fork() == 0:
    long_process()
    os._exit(0)  # the child must exit here, or it falls through and serves the response too
else:
    return HttpResponse()
subprocess
import subprocess
import sys

p = subprocess.Popen([sys.executable, '/path/to/script.py'],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
(where the script is likely to be a manage.py command)
threads
import threading
t = threading.Thread(target=long_process,
args=args,
kwargs=kwargs)
t.setDaemon(True)
t.start()
return HttpResponse()
NB.
Due to the Global Interpreter Lock, in CPython only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
The main thread will return quickly (the HttpResponse). Will the spawned long-running thread block wsgi from doing something else for another request?!
multiprocessing
from multiprocessing import Process
p = Process(target=_bulk_action,args=(action,objs))
p.start()
return HttpResponse()
This should solve the thread concurrency issue, shouldn't it?
So those are the options I could think of. What would work and what not, and why?
os.fork
A fork will clone the parent process, which in this case is your Django stack. Since you merely want to run a separate Python script, this seems like an unnecessary amount of bloat.
subprocess
Using subprocess is expected to be interactive: while you can use it to effectively spawn off a process, it's expected that at some point you'll terminate it when it is finished. It's possible Python might clean up for you if you leave one running, but my guess would be that this will actually result in a memory leak.
threading
Threads are defined units of logic. They start when their run() method is called and terminate when its execution ends. This makes them well suited to creating a branch of logic that runs outside the current scope. However, as you mentioned, they are subject to the Global Interpreter Lock.
multiprocessing
This module allows you to spawn processes, and it has an API similar to that of threading. You could say it is like threads on steroids. These processes are not subject to the Global Interpreter Lock and can take advantage of multi-core architectures. However, they are more complicated to work with as a result.
So, your choices really come down to threads or processes. If you can get by with a thread and it makes sense for your application, go with a thread. Otherwise, use processes.
I have found that using uWSGI decorators is quite a bit simpler than using Celery if you just need to run some long task in the background.
I think Celery is the best solution for a serious heavy project, but it's overhead for doing something simple.
To start using uWSGI decorators you just need to update your uWSGI config with:
<spooler-processes>1</spooler-processes>
<spooler>/here/the/path/to/dir</spooler>
and write code like:

import uwsgi
from uwsgidecorators import spoolraw
from django.shortcuts import render_to_response

@spoolraw
def long_task(arguments):
    try:
        do_something_with(arguments['myarg'])  # placeholder for the actual work
    except Exception:
        pass  # ...handle/log the error...
    return uwsgi.SPOOL_OK

def myView(request):
    long_task.spool({'myarg': str(someVar)})
    return render_to_response('done.html')
Then when you trigger the view, something like this appears in the uWSGI log:
[spooler] written 208 bytes to file /here/the/path/to/dir/uwsgi_spoolfile_on_hostname_31139_2_0_1359694428_441414
and when the task finishes:
[spooler /here/the/path/to/dir pid: 31138] done with task uwsgi_spoolfile_on_hostname_31139_2_0_1359694428_441414 after 78 seconds
There are some restrictions, which seem strange to me:
- spool can receive only a dictionary of strings as its argument, apparently because the argument is serialized to a file as strings.
- the spooler should be set up at start-up, so the "spooled" code should be contained in a separate file, which should be defined in the uWSGI config as <import>pyFileWithSpooledCode</import>
For the question:
Will the spawned long thread block wsgi from doing something else for
another request?!
the answer is no.
You still have to be careful creating background threads from a request though in case you simply create huge numbers of them and clog up the whole process. You really need a task queueing system even if you are doing stuff in process.
In respect of doing a fork or exec from a web process, especially from Apache, that is generally not a good idea, as Apache may impose odd conditions on the environment of the created sub-process which could technically interfere with its operation.
Using a system like Celery is still probably the best solution.
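For reference, the Celery route is also small; here is a minimal sketch (the broker URL, module names and task name are illustrative - long_process is the question's own function):

# tasks.py
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def long_process_task(arg):
    long_process(arg)  # the same work, now executed by a worker process

# views.py
from django.http import HttpResponse
from tasks import long_process_task

def my_view(request):
    long_process_task.delay('some-arg')  # enqueues and returns immediately
    return HttpResponse()

A worker started with celery -A tasks worker then picks the jobs up, so the web process never blocks on the task.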
I'm trying to work out how to run a process in a background thread in Django. I'm new to both Django and threads, so please bear with me if I'm using the terminology wrong.
Here's the code I have. Basically I'd like start_processing to begin as soon as the success function is triggered. However start_processing is the kind of function that could easily take a few minutes or fail (it's dependent on an external service over which I have no control), and I don't want the user to have to wait for it to complete successfully before the view is rendered. ('Success' as far as they are concerned isn't dependent on the result of start_processing; I'm the only person who needs to worry if it fails.)
def success(request, filepath):
start_processing(filepath)
return render_to_response('success.html', context_instance = RequestContext(request))
From the Googling I've done, most people suggest that background threads aren't used in Django, and instead a cron job is more suitable. But I would quite like start_processing to begin as soon as the user gets to the success function, rather than waiting until the cron job runs. Is there a way to do this?
If you really need a quick hack, simply start a process using subprocess.
But I would not recommend spawning a process (or even a thread), especially if your web site is public: in case of high load (which could be "natural" or the result of a trivial DoS attack), you would be spawning many processes or threads, which would end up using up all your system resources and killing your server.
I would instead recommend using a job server: I use Celery (with Redis as the backend), it's very simple and works just great. You can check out many other job servers, such as RabbitMQ or Gearman. In your case, a job server might be overkill: you could simply run Redis and use it as a light-weight message server. Here is an example of how to do this.
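In that spirit, a sketch of the light-weight Redis variant (assuming the redis-py package; the queue name is made up, and success/start_processing come from the question):

import json
import redis
from django.shortcuts import render_to_response
from django.template import RequestContext

r = redis.Redis()

# In the view: enqueue the job and return at once.
def success(request, filepath):
    r.rpush('processing-queue', json.dumps({'filepath': filepath}))
    return render_to_response('success.html',
                              context_instance=RequestContext(request))

# In a separate worker process, started outside the web server:
def worker():
    while True:
        _key, raw = r.blpop('processing-queue')  # blocks until a job arrives
        start_processing(json.loads(raw)['filepath'])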
Cheers
In case someone really wants to run another thread:
import threading
import time

from django.http import HttpResponse

def background_process():
    print("process started")
    time.sleep(100)
    print("process finished")

def index(request):
    t = threading.Thread(target=background_process, args=(), kwargs={})
    t.setDaemon(True)  # don't let this thread keep the process alive
    t.start()
    return HttpResponse("main thread content")
This will return the response first and print "process finished" to the console later, so the user will not face any delay.
Using Celery is definitely a better solution. However, installing Celery could be unnecessary for a very small project with a limited server etc.
You may also need threads in a big project: running Celery on every one of your servers is not always a good idea, and then there is no way to run a separate process on each server, so threads may be needed to handle that case - file-system operations might be an example. It's not very likely, though, and it is still better to use Celery for long-running processes.
Use wisely.
I'm not sure you need a thread for that. It sounds like you just want to spawn off a process, so look into the subprocess module.
IIUC, the problem here is that the web server process might not like extra long-running threads; it might kill/spawn server processes as demand goes up and down, etc.
You're probably better off communicating with an external service process for this type of processing, instead of embedding it in the web server's wsgi/fastcgi process.
If the only thing you're sending over is the filepath, it ought to be pretty easy to write that service app.