According to the question "Is Django synchronous or asynchronous?", Django is synchronous. However, I tested a blocking view with python manage.py runserver 8100:
import time

from rest_framework.decorators import action
from rest_framework.response import Response

@action(detail=False, methods=['get'])
def test(self, request):
    time.sleep(5)
    return Response({})
and triggered two requests with Postman at 0s and 1s; they returned at 5s and 6s. That does not look blocking/synchronous. Where am I wrong?
Even synchronous implementations usually handle multiple requests 'in parallel'.
They do so by using multiple processes, multiple threads or a mix of both.
Depending on the server, they either have a predefined (fixed) number of processes or threads, or they dynamically allocate threads or processes whenever another request requires one.
An asynchronous server, on the other hand, can handle multiple requests 'in parallel' within only one thread/process.
The simple development server that you start with manage.py runserver uses threading by default.
To visualize this, I suggest changing your code to:
import os
import threading
import time

from rest_framework.decorators import action
from rest_framework.response import Response

@action(detail=False, methods=['get'])
def test(self, request):
    print("START PID", os.getpid(), "TID", threading.get_native_id())
    time.sleep(5)
    print("STOP  PID", os.getpid(), "TID", threading.get_native_id())
    return Response({'pid': os.getpid(), 'tid': threading.get_native_id()})
As @xyres mentioned, there is a command line option to disable threading.
Just run manage.py runserver --nothreading and try again. Now you should be able to see the fully synchronous behavior.
Related
From version 3.1 Django supports async views. I have a Django app running on uvicorn. I'm trying to write an async view which can handle multiple requests to itself concurrently, but with no success.
Common examples I've seen involve making multiple slow I/O operations from inside the view:
import asyncio

from django.http import HttpResponse

async def slow_io(n, result):
    await asyncio.sleep(n)
    return result

async def my_view(request, *args, **kwargs):
    task_1 = asyncio.create_task(slow_io(3, 'Hello'))
    task_2 = asyncio.create_task(slow_io(5, 'World'))
    result = await task_1 + await task_2
    return HttpResponse(result)
This will yield "HelloWorld" after 5 seconds instead of 8, because the two I/O operations run concurrently.
What I want is to concurrently handle multiple requests TO my_view. E.g. I expect this code to handle 2 simultaneous requests in 5 seconds, but it takes 10:
import asyncio

from django.http import HttpResponse

async def slow_io(n, result):
    await asyncio.sleep(n)
    return result

async def my_view(request, *args, **kwargs):
    result = await slow_io(5, 'result')
    return HttpResponse(result)
I run uvicorn with this command:
uvicorn --host 0.0.0.0 --port 8000 main.asgi:application --reload
Django doc says:
The main benefits are the ability to service hundreds of connections without using Python threads.
So it's possible.
What am I missing?
UPD:
It seems my testing setup was wrong. I was opening multiple tabs in the browser and refreshing them all at once. See my answer for details.
Here is a sample project on Django 3.2 with multiple async views and tests. I tested it in multiple ways:
Requests from Django's test client are handled simultaneously, as expected.
Requests to different views from a single client are handled simultaneously, as expected.
Requests to the same view from different clients are handled simultaneously, as expected.
What doesn't work as expected?
Requests to the same view from a single client are handled one at a time, and I didn't expect that.
There is a warning in Django doc:
You will only get the benefits of a fully-asynchronous request stack if you have no synchronous middleware loaded into your site. If there is a piece of synchronous middleware, then Django must use a thread per request to safely emulate a synchronous environment for it.
Middleware can be built to support both sync and async contexts. Some of Django’s middleware is built like this, but not all. To see what middleware Django has to adapt, you can turn on debug logging for the django.request logger and look for log messages about “Synchronous middleware … adapted”.
So it might be some synchronous middleware causing the problem even in bare Django. But the docs also say that Django should fall back to a thread per request in that case, and I didn't use any synchronous middleware, only the standard ones.
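To see which middleware gets adapted, the logging hint from the quote above can be enabled with something like this in settings.py (a minimal sketch; the handler name is arbitrary):
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # DEBUG level makes Django log the "Synchronous middleware ... adapted" messages
        "django.request": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}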
My best guess is that it's related to the client-server connection and not to the sync/async stack. Although Google says that browsers create new connections for each tab for security reasons, I think:
The browser might keep the same connection for multiple tabs with the same URL to save resources.
Django might create async tasks or threads per connection rather than per request, despite what the docs state.
I need to check this out.
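One way to check this is to take the browser out of the picture and fire the two requests from two independent HTTP clients. A rough sketch using asyncio together with the third-party httpx library (an assumption, not part of the sample project; the URL is a placeholder):
import asyncio
import time

import httpx  # third-party async HTTP client, assumed to be installed

URL = "http://localhost:8000/my-view/"  # placeholder URL of the async view under test

async def main():
    started = time.monotonic()
    # Two separate clients mean two separate connections, so any serialization
    # that remains must be happening on the server side.
    async with httpx.AsyncClient() as c1, httpx.AsyncClient() as c2:
        await asyncio.gather(c1.get(URL, timeout=30), c2.get(URL, timeout=30))
    print(f"both requests finished in {time.monotonic() - started:.1f}s")

asyncio.run(main())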
Your problem is that you write the code as if it were synchronous: you await each function's result and only then await the next one.
You simply need to use asyncio helpers like gather to run the tasks concurrently:
import asyncio

from django.http import HttpResponse

async def slow_io(n, result):
    await asyncio.sleep(n)
    return result

async def my_view(request, *args, **kwargs):
    all_tasks_result = await asyncio.gather(slow_io(3, 'Hello '), slow_io(5, "World"))
    result = "".join(all_tasks_result)
    return HttpResponse(result)
So, I have built this nice application (using Camelot) and do not know how to implement a scheduled job on it that can send emails based on some conditions at regular intervals. I am trying to implement it using schedule, but don't know how to make my app call it automatically.
Here is my code:
import time
import schedule

def job():
    print("I'm working...")

schedule.every(10).seconds.do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
When I start my application, nothing happens. How do I make my application aware of this?
If possible, I would like this scheduled job to execute without even starting my application.
If you want to schedule jobs that interact with the user, you have to use Camelot Actions, which have their own mechanism for executing functions in the model thread, posting the result back to the GUI thread, and so on.
But you don't need that: you just need to run jobs (functions) that access the database, create emails and send them, without interacting with the user. This can be done with a completely independent application, without a GUI.
To avoid maintaining two applications, you can change the behavior with a command line parameter. If you start the application without any parameters, it opens the GUI as usual, but if you run it with -background, it only starts the schedule loop.
Then you can hook your application into system start, executing it with -background, and you will have the schedule running without requiring your users to start the application.
If a user later starts the application, you'll have two instances running: the first one running the schedule loop, and the second one with the GUI.
main.py:
if __name__ == '__main__':
    import sys
    if "-background" in sys.argv:
        import background
        background.main()
    else:
        main()  # Camelot's main
background.py:
from camelot.core.orm import Session
import time
import schedule

def job():
    session = Session()
    # Use the session to access and manipulate the model
    print("I'm working...")
    session.flush()  # flush session to DB

def main():
    from camelot.core.conf import settings
    settings.setup_model()
    schedule.every(10).seconds.do(job)
    while True:
        schedule.run_pending()
        time.sleep(1)
You should run the app with python main.py -background.
I read http://bottlepy.org/docs/dev/tutorial_app.html#server-setup
and running Apache + Bottle + Python
and Bottle + Apache + WSGI + Sessions
and I would like to know whether one can make asynchronous REST API calls to Bottle running on a mod_wsgi server, calling a Python function that does not return anything (it is backend logic) and is non-blocking. I looked into gevent, but I haven't found a solution for running mod_wsgi with gevent.
Is there any solution for async calls running on an Apache server using mod_wsgi, or any other alternative?
UPDATE
As per andreans' answer below:
I ran a simple my-IP-address return with Bottle + Celery. So one has to declare a Celery task with @celery.task and then call run(host='localhost', port=8080, debug=True)? Does it require starting a Celery worker in a terminal as well? I have never used Celery before (running locally). Also, running Bottle with the @route('/something') decorator works, but app.route doesn't, where app = Bottle(), possibly due to some .wsgi file error?
Sorry, this can't fit into the comment box. Every request must get a response eventually (or fail/time out). If you really don't need to return any data to the client, send back just an empty response with a status code. If processing the request takes time, it should run asynchronously, and that's where Celery comes in. So a blocking implementation of your request handler would be:
def request_processor_long_running_blocking_func(request_data):
    # process request data, which takes a lot of time
    # result is probably written into db
    pass

def request_handler_func(request):
    request_processor_long_running_blocking_func(request.data)
    return HttpResponse(status=200)
If I understood correctly, this is what you're trying to avoid: by making request_processor_long_running_blocking_func run asynchronously, request_handler_func won't block. With Celery this would look like:
from celery.task import task

@task
def request_processor_long_running_blocking_func(request_data):
    # the task decorator wraps your blocking function with celery's Task class,
    # which has a delay method available for you to call, which will run your function
    # asynchronously on one of your celery background worker processes
    pass

def request_handler_func(request):
    request_processor_long_running_blocking_func.delay(request.data)
    # calling the function with delay won't block, it returns immediately
    # and your response is sent back instantly
    return HttpResponse(status=200)
One more thing: send these task requests with AJAX, so your web interface won't be reloaded and the user can continue using your app after sending the request.
I need a scheduler for my next project, and since I'm coding using Django I went for Celery.
What I am looking for is a way for a task to tell Django when it is done, so I can update the database and use SSE to tell the user. All this can be done fairly simply by putting all the logic into the task. But what do I do when I plan to have several Celery workers?
I found a bunch of info online covering the single-worker case, but not much covering the problem when you have more than one worker.
What I thought about was using HTTP callbacks from the workers to the web server to let it know that the task is done. Looking at celery.task.http looked promising, but it didn't do what I needed.
Is the solution to use signals and hook up manual HTTP calls? Or am I on the wrong path? Isn't this a common problem? How can this be solved more elegantly?
So, what do you mean by "tell Django"? If I understand you correctly, the Django request that initialized the Celery task is still alive when the task finishes? In that case you can poll some storage (database, memcached, etc.) and send your SSE.
Look, there is one way to do that:
1. Your Django view sends the task to Celery, then enters a loop (infinite, or with a 60-second timeout?) and waits for the result in memcached (see the sketch below).
2. Celery picks up the task, executes it, and puts the result into memcached.
3. The Django view sees the new result, exits the loop and sends your SSE.
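A rough sketch of the view side of this variant, assuming the task id is used as the cache key, Django's cache is backed by memcached, and long_task is a hypothetical Celery task that writes its result to the cache when it finishes:
import time

from django.core.cache import cache
from django.http import HttpResponse

from myapp.tasks import long_task  # hypothetical Celery task

def start_and_wait(request):
    async_result = long_task.delay()
    deadline = time.monotonic() + 60          # give up after 60 seconds
    while time.monotonic() < deadline:
        result = cache.get(async_result.id)   # the task stores its result under its own id
        if result is not None:
            return HttpResponse(result)       # or push it to the user over SSE here
        time.sleep(0.5)
    return HttpResponse(status=504)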
The next variant is:
1. The Django view sends the task to Celery and returns immediately.
2. Celery executes the task and, when it is done, makes a simple HTTP request to your Django app (see the sketch below).
3. Django receives the HTTP request from Celery, parses the parameters and sends the SSE to your user.
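A rough sketch of this second variant; the callback URL, do_the_work and the view wiring are illustrative only, and requests is a third-party HTTP client:
# tasks.py
import requests
from celery import shared_task

from myapp.work import do_the_work  # hypothetical function doing the real job

@shared_task
def long_task(job_id):
    result = do_the_work(job_id)
    # Tell the Django app we are done; it can then update the DB and push the SSE.
    requests.post(
        "https://example.com/internal/task-done/",  # placeholder callback URL
        data={"job_id": job_id, "result": result},
        timeout=5,
    )

# views.py
from django.http import HttpResponse

def task_done(request):
    job_id = request.POST["job_id"]
    # update the database and notify the user over SSE here
    return HttpResponse(status=204)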
Here is some code that seems to do what I want:
In django settings:
CELERY_ANNOTATIONS = {
    "*": {
        "on_failure": celery_handlers.on_failure,
        "on_success": celery_handlers.on_success,
    }
}
In celery_handlers.py:
def on_failure(self, exc, task_id, *args, **kwargs):
    # Use urllib or similar to poke e.g. api-int.mysite.com/task_handler/TASK_ID
    pass

def on_success(self, retval, task_id, *args, **kwargs):
    # Use urllib or similar to poke e.g. api-int.mysite.com/task_handler/TASK_ID
    pass
And then on api-int you can just set up something like:
from celery.result import AsyncResult

task_obj = AsyncResult(task_id)
# Logic to handle task_obj.result and related goes here....
Background:
I'm working on a project which uses Django with a Postgres database. We're also using mod_wsgi, in case that matters, since some of my web searches have mentioned it. On web form submit, the Django view kicks off a job that will take a substantial amount of time (more than the user would want to wait), so we kick off the job via a system call in the background. The job that is now running needs to be able to read and write to the database. Because this job takes so long, we use multiprocessing to run parts of it in parallel.
Problem:
The top level script has a database connection, and when it spawns off child processes, it seems that the parent's connection is available to the children. Then there's an exception about how SET TRANSACTION ISOLATION LEVEL must be called before a query. Research has indicated that this is due to trying to use the same database connection in multiple processes. One thread I found suggested calling connection.close() at the start of the child processes so that Django will automatically create a new connection when it needs one, and therefore each child process will have a unique connection - i.e. not shared. This didn't work for me, as calling connection.close() in the child process caused the parent process to complain that the connection was lost.
Other Findings:
Some stuff I read seemed to indicate you can't really do this, and that multiprocessing, mod_wsgi, and Django don't play well together. That just seems hard to believe I guess.
Some suggested using celery, which might be a long term solution, but I am unable to get celery installed at this time, pending some approval processes, so not an option right now.
Found several references on SO and elsewhere about persistent database connections, which I believe to be a different problem.
Also found references to psycopg2.pool and pgpool and something about bouncer. Admittedly, I didn't understand most of what I was reading on those, but it certainly didn't jump out at me as being what I was looking for.
Current "Work-Around":
For now, I've reverted to just running things serially, and it works, but is slower than I'd like.
Any suggestions as to how I can use multiprocessing to run in parallel? Seems like if I could have the parent and two children all have independent connections to the database, things would be ok, but I can't seem to get that behavior.
Thanks, and sorry for the length!
Multiprocessing copies connection objects between processes because it forks processes, and therefore copies all the file descriptors of the parent process. That being said, a connection to the SQL server is just a file; you can see it on Linux under /proc/<pid>/fd/.... Any open file is shared between forked processes. You can find more about forking here.
My solution was to simply close the DB connection just before launching processes; each process recreates the connection itself when it needs one (tested in Django 1.4):
from multiprocessing import Process

from django import db

db.connections.close_all()

def db_worker():
    some_parallel_code()

Process(target=db_worker, args=()).start()  # remember to start the process
Pgbouncer/pgpool is not related to threads in the multiprocessing sense. It is rather a solution for not closing the connection on each request, i.e. speeding up connecting to Postgres under high load.
Update:
To completely remove the database connection problems, simply move all logic connected with the database into db_worker. I originally wanted to pass a QueryDict as an argument... A better idea is to simply pass a list of ids... See QueryDict and values_list('id', flat=True), and do not forget to turn it into a list! list(QueryDict) before passing it to db_worker. Thanks to that we do not copy the model's database connection.
from multiprocessing import Process

from django import db

def db_worker(model_ids):
    obj = PartModelWorkerClass(model_ids)  # here you do Model.objects.filter(id__in=model_ids)
    obj.run()

model_ids = Model.objects.all().values_list('id', flat=True)
model_ids = list(model_ids)  # cast to list

process_count = 5
delta = (len(model_ids) // process_count) + 1

# do all the db stuff here ...

# here you can close the db connection
db.connections.close_all()

for it in range(process_count):
    Process(target=db_worker, args=(model_ids[it * delta:(it + 1) * delta],)).start()
When using multiple databases, you should close all connections.
from django import db
for connection_name in db.connections.databases:
    db.connections[connection_name].close()
EDIT
Please use the same approach as @lechup mentioned to close all connections (not sure since which Django version this method was added):
from django import db
db.connections.close_all()
For Python 3 and Django 1.9 this is what worked for me:
import multiprocessing

import django

django.setup()  # Must call setup

def db_worker():
    for name, info in django.db.connections.databases.items():  # Close the DB connections
        django.db.connections[name].close()
    # Execute parallel code here

if __name__ == '__main__':
    multiprocessing.Process(target=db_worker).start()
Note that without the django.setup() I could not get this to work. I am guessing something needs to be initialized again for multiprocessing.
I had "closed connection" issues when running Django test cases sequentially. In addition to the tests, there is another process that intentionally modifies the database during test execution. This process is started in each test case's setUp().
A simple fix was to have my test classes inherit from TransactionTestCase instead of TestCase. This makes sure the database is actually written to, so the other process has an up-to-date view of the data.
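A minimal sketch of that setup; external_modifier stands in for whatever the separate process does and is illustrative only:
import multiprocessing

from django.test import TransactionTestCase

def external_modifier():
    # placeholder for the separate process that modifies the database during the test
    pass

class JobIntegrationTests(TransactionTestCase):
    def setUp(self):
        # TransactionTestCase commits test data for real instead of wrapping each
        # test in a transaction, so the external process sees an up-to-date view.
        self.proc = multiprocessing.Process(target=external_modifier)
        self.proc.start()

    def tearDown(self):
        self.proc.join()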
Another way around your issue is to initialise a new connection to the database inside the forked process using:
from django.db import connection
connection.connect()
(Not a great solution, but a possible workaround.)
If you can't use Celery, maybe you could implement your own queueing system: basically, add tasks to some task table and have a regular cron job that picks them up and processes them (via a management command).
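A rough sketch of such a command, assuming a hypothetical Task model with status/created/error fields and a process_task function that does the actual work:
from django.core.management.base import BaseCommand

from myapp.models import Task               # hypothetical model
from myapp.processing import process_task   # hypothetical worker function

class Command(BaseCommand):
    help = "Process pending tasks; meant to be run from cron every minute or so."

    def handle(self, *args, **options):
        # Pick tasks up one at a time so a crash only affects the current task.
        for task in Task.objects.filter(status="pending").order_by("created"):
            task.status = "running"
            task.save(update_fields=["status"])
            try:
                process_task(task)
                task.status = "done"
            except Exception as exc:
                task.status = "failed"
                task.error = str(exc)
            task.save()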
Hey, I ran into this issue and was able to resolve it by doing the following (we are implementing a limited task system).
task.py
from django.db import connection

def as_task(fn):
    """ this is a decorator that handles task duties, like setting up loggers, reporting on status...etc """
    connection.close()  # this is where i kill the database connection VERY IMPORTANT
    # This will force django to open a new unique connection, since on linux at least
    # connections do not fare well when forked
    # ...etc
ScheduledJob.py
import multiprocessing

from django.core.management import call_command
from django.db import connection

def run_task(request, job_id):
    """ Just a simple view that, when hit with a specific job id, kicks off said job """
    # your logic goes here
    # ...
    processor = multiprocessing.Queue()
    # job_info and options come from the elided logic above
    multiprocessing.Process(
        target=call_command,  # all of our tasks are set up as management commands in django
        args=[
            job_info.management_command,
        ],
        kwargs={'web_processor': processor, **vars(options)},
    ).start()

    result = processor.get(timeout=10)  # wait to get a response on a successful init
    # Result is a tuple of [TRUE|FALSE, <ErrorMessage>]
    if not result[0]:
        raise Exception(result[1])
    else:
        # THE VERY VERY IMPORTANT PART HERE: notice that up to this point we haven't
        # touched the db again, but now we absolutely have to call connection.close()
        connection.close()
        # we do some database accessing here to get the most recently updated job id in the database
Honestly, to prevent race conditions (with multiple simultaneous users) it would be best to call connection.close() as quickly as possible after you fork the process. There may still be a chance that another user somewhere down the line makes a request to the db before you have a chance to close the connection, though.
In all honesty it would likely be safer and smarter to have your fork not call the command directly, but instead call a script on the operating system so that the spawned task runs in its own Django shell!
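A rough sketch of that idea, assuming a hypothetical run_job management command; the spawned interpreter gets its own Django setup and its own database connections:
import subprocess
import sys

def launch_job_in_own_process(job_id):
    # Start a completely separate interpreter instead of forking, so the child
    # does not inherit the parent's database connections.
    subprocess.Popen(
        [sys.executable, "manage.py", "run_job", str(job_id)],
        close_fds=True,  # do not pass the parent's open file descriptors along
    )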
If all you need is I/O parallelism and not processing parallelism, you can avoid this problem by switching your processes to threads. Replace
from multiprocessing import Process
with
from threading import Thread
The Thread object has the same interface as Process.
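A small sketch of the thread variant, assuming a db_worker function that takes no arguments (some_parallel_code is a placeholder); each thread gets its own thread-local Django DB connection:
from threading import Thread

def db_worker():
    some_parallel_code()  # placeholder for the real work

threads = [Thread(target=db_worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()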
If you're also using connection pooling, the following worked for us, forcibly closing the connections after forking. Closing them beforehand did not seem to help.
from django.db import connections
from django.db.utils import DEFAULT_DB_ALIAS
connections[DEFAULT_DB_ALIAS].dispose()
One possibility is to use the multiprocessing 'spawn' child process creation method, which will not copy Django's DB connection details to the child processes. The child processes need to bootstrap from scratch, but are free to create/close their own Django DB connections.
In calling code:
import multiprocessing
from myworker import work_one_item # <-- Your worker method
...
# Uses connection A
list_of_items = django_db_call_one()

# 'spawn' starts new python processes
with multiprocessing.get_context('spawn').Pool() as pool:
    # work_one_item will create its own DB connection
    parallel_results = pool.map(work_one_item, list_of_items)

# Continues to use connection A
another_db_call(parallel_results)
In myworker.py:
import django    # <-\
django.setup()   # <-- needed if you'll make DB calls in worker

def work_one_item(item):
    try:
        # This will create a new DB connection
        return len(MyDjangoModel.objects.all())
    except Exception as ex:
        return ex
Note that if you're running the calling code inside a TestCase, mocks will not be propagated to the child processes (will need to re-apply them).
You could give more resources to Postgres; in Debian/Ubuntu you can edit:
nano /etc/postgresql/9.4/main/postgresql.conf
replacing 9.4 with your Postgres version.
Here are some useful lines that should be updated with example values to do so; the names speak for themselves:
max_connections=100
shared_buffers = 3000MB
temp_buffers = 800MB
effective_io_concurrency = 300
max_worker_processes = 80
Be careful not to boost these parameters too much, as it might lead to errors with Postgres trying to take more resources than are available. The examples above run fine on a Debian machine with 8GB of RAM and 4 cores.
Overwrite the Thread class and close all DB connections at the end of the thread. The code below works for me:
from threading import Thread

from django.db import connections

class MyThread(Thread):
    def run(self):
        super().run()
        connections.close_all()

def myasync(function):
    def decorator(*args, **kwargs):
        t = MyThread(target=function, args=args, kwargs=kwargs)
        t.daemon = True
        t.start()
    return decorator
When you need to call a function asynchronously:
@myasync
def async_function():
    ...