Django: using redis-py to lock in Django views

I'm trying to use Redis to lock some of the big PostgreSQL management transactions I have in my project.
I haven't been successful so far in my development environment.
A simple version of the code looks like this:
import time
import redis
from django.shortcuts import render_to_response

def test_view(request):
    connec = redis.Redis(unix_socket_path='/tmp/vgbet_redis.sock')
    if not connec.setnx('test', ''):
        print 'Locked'
    else:
        time.sleep(5)  # Slow transaction
        connec.delete('test')
        print 'Unlocked'
    return render_to_response("test.html")
If I open that view in two tabs, the first one prints Unlocked after 5 seconds, then the second one prints Unlocked after 10 seconds. It looks like the requests are handled synchronously, which doesn't make any sense to me.
Edit:
I have tried serving the project with Apache and with gevent and I got the exact same results.
So I guess there is something I really don't understand about Django + Redis, and my code is wrong.
Any help would be great.
Edit2:
I just tried django-redis, using Redis as a cache backend.
CACHES = {
    'default': {
        'BACKEND': 'redis_cache.RedisCache',
        'LOCATION': '/tmp/vgbet_redis.sock',
        'OPTIONS': {
            'DB': 1,
            'PASSWORD': None,
            'PARSER_CLASS': 'redis.connection.HiredisParser'
        },
    },
}
And I still have the same result if I open two tabs in my browser. The second view is blocked for 5 seconds, as if everything is synchronous.
import time

from django.core.cache import cache
from django.shortcuts import render_to_response

def test_view(request):
    if cache.get('test') is not None:
        print 'Locked'
    else:
        cache.set('test', '', 60)
        time.sleep(5)  # Slow transaction
        cache.delete('test')
    return render_to_response("test.html")
From two separate terminals I have no issue reading and writing to Redis, so I really don't understand why I'm not able to use the cache this way in views.

A couple of things to check:
The default Django development server is single-threaded, so it can only handle one request at a time. The simplest way to test this would be to run the development server twice on different ports (e.g. ./manage.py runserver 8080 and ./manage.py runserver 8081).
If you are using an SQL database at all, it might be blocking on a transaction. Are these the exact views you are using? Or are you doing anything with models?
You mentioned using gevent: did you make sure to call from gevent import monkey; monkey.patch_all() to monkey patch everything? Also, how are you running your server with gevent? (A minimal gevent setup is sketched below.)
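For reference, a minimal sketch of serving the project with gevent's WSGI server (myproject.wsgi is a placeholder for your actual WSGI module):
from gevent import monkey
monkey.patch_all()  # must run before anything else imports socket/ssl

from gevent.pywsgi import WSGIServer
from myproject.wsgi import application  # hypothetical project WSGI module

WSGIServer(('127.0.0.1', 8000), application).serve_forever()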

The main reason for my issue was that I was using two tabs of the same browser. If I use two different browsers or two different IPs, my code works asynchronously (with gevent and Apache; not with runserver, but that isn't a surprise).
It seems that when a single session requests the same view multiple times, the requests are served synchronously. I don't know whether that is due to the server or to Django, and I can't find anything about it in the documentation. If anyone knows, I would be really interested to understand that last part.
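Independent of the browser behaviour, note that the setnx/delete pattern can leave the lock key behind forever if the worker dies between the two calls. A safer sketch (not the original code) uses redis-py's set() with nx and an expiry, so the lock always times out:
import redis

connec = redis.Redis(unix_socket_path='/tmp/vgbet_redis.sock')

# Acquire the lock atomically with a 30-second expiry; only the caller
# that actually created the key gets True back.
if connec.set('test', '1', nx=True, ex=30):
    try:
        pass  # slow transaction goes here
    finally:
        connec.delete('test')
else:
    print('Locked')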

Related

What is a better approach to caching data in a Django app running on Gunicorn

I have a Django application running on the Gunicorn app server, with N workers. Each worker is a separate Python process. In one of my application's services I have a database query that takes a long time (10-20 s).
I decided to cache the result: I added django.core.cache and call cache.get(key); if the result is not None the app returns the data from the cache, otherwise it calls the service and stores the data with cache.set(key, data, time). But with N workers, if the first request is handled by worker 0, the result of the long-running service is stored in worker 0's cache (process memory). When a similar request (requests contain a paging option, but I store the whole RawDataSet in memory, so every page returns fast) is handled by worker 1, the cache unsurprisingly doesn't help, because that is a different process. So obviously I have to use a cache that can be shared by all workers.
What approach (an in-memory database or something else) is best for solving this issue?
I would use Redis (an in-memory database) to store the SQL results and share them among the N workers.
I solved this with the django-redis package. The main advantage of this solution is that you don't have to change your code: you keep using the cache.get() and cache.set() functions from django.core.cache and only add Redis-specific cache settings to the settings file, like this:
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://127.0.0.1:6379/12',
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient'
        },
        'KEY_PREFIX': 'text_analyzer'
    }
}
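With that backend every Gunicorn worker talks to the same Redis instance, so the usual read-through pattern works across processes. A minimal sketch (run_slow_query and the key scheme are placeholders, not part of django-redis):
from django.core.cache import cache

def get_report(params):
    key = 'report:%s' % params['report_id']  # hypothetical key scheme
    data = cache.get(key)
    if data is None:
        data = run_slow_query(params)        # the 10-20 s database call
        cache.set(key, data, 60 * 15)        # cache for 15 minutes
    return data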

Shutting down a plotly-dash server

This is a follow-up to this question: How to stop flask application without using ctrl-c. The problem is that I didn't understand some of the terminology in the accepted answer, since I'm totally new to this.
import dash
import dash_core_components as dcc
import dash_html_components as html

app = dash.Dash()

app.layout = html.Div(children=[
    html.H1(children='Dash Tutorials'),
    dcc.Graph()
])

if __name__ == '__main__':
    app.run_server(debug=True)
How do I shut this down? My end goal is to run a plotly dashboard on a remote machine, but I'm testing it out on my local machine first.
I guess I'm supposed to "expose an endpoint" (I have no idea what that means) via:
from flask import request

def shutdown_server():
    func = request.environ.get('werkzeug.server.shutdown')
    if func is None:
        raise RuntimeError('Not running with the Werkzeug Server')
    func()

@app.route('/shutdown', methods=['POST'])
def shutdown():
    shutdown_server()
    return 'Server shutting down...'
Where do I include the above code? Is it supposed to be included in the first block of code that I showed (i.e. the code that contains app.run_server command)? Is it supposed to be separate? And then what are the exact steps I need to take to shut down the server when I want?
Finally, are the steps to shut down the server the same whether I run the server on a local or remote machine?
Would really appreciate help!
The method in the linked answer, werkzeug.server.shutdown, only works with the development server. Creating a view function with an assigned URL ("exposing an endpoint") to trigger this shutdown function is a convenience, and it won't work once the app is deployed behind a WSGI server like Gunicorn.
Maybe that creates more questions than it answers:
I suggest familiarising yourself with Flask's wsgi-standalone deployment docs.
And then probably the gunicorn deployment guide. The monitoring section has a number of different examples of service monitors, which you can use with gunicorn allowing you to run the app in the background, start on reboot, etc.
Ultimately, starting and stopping the WSGI server is the responsibility of the service monitor and logic to do this probably shouldn't be coded into your app.
What works in both cases, app.run_server(debug=True) and app.run_server(debug=False), anywhere in the code is:
os.kill(os.getpid(), signal.SIGTERM)
(don't forget to import os and signal)
SIGTERM should cause a clean exit of the application.
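Putting the two answers together, one possible sketch is to expose a shutdown route on the Flask instance that Dash wraps (app.server) and have it send SIGTERM to the current process; the route path and the use of POST are arbitrary choices here:
import os
import signal

import dash
import dash_html_components as html

app = dash.Dash()
app.layout = html.Div(children=[html.H1(children='Dash Tutorials')])

# app.server is the underlying Flask app; '/shutdown' is an arbitrary path
@app.server.route('/shutdown', methods=['POST'])
def shutdown():
    os.kill(os.getpid(), signal.SIGTERM)  # the process exits, so the response may never be delivered
    return 'Server shutting down...'

if __name__ == '__main__':
    app.run_server(debug=False)
Then something like curl -X POST http://localhost:8050/shutdown stops the server (8050 is Dash's default port); the same approach works on a remote machine as long as you restrict who can reach that URL.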

SQLite database table is locked during tests

I am trying to migrate an application from django 1.11.1 to django 2.0.1
Tests are set up to run with an in-memory SQLite database, but every test is failing with sqlite3.OperationalError: database table is locked, for every table. How can I find out why it is locked? Increasing the timeout setting does not help.
I am using LiveServerTestCase, so I suppose the tests run in a different thread than the in-memory database, which for some reason does not get shared.
I hit this, too. The LiveServerTestCase is multi-threaded since this got merged.
It becomes a problem for me when the app under test issues multiple requests. Then, so my speculation goes, the live server spawns threads to handle those requests, those requests write to the SQLite db, and SQLite in turn does not like multiple writing threads.
Funnily enough, runserver knows about --nothreading. But such an option seems to be missing for the test server.
The following snippet brought me a single-threaded test server:
from django.test import LiveServerTestCase
from django.test.testcases import LiveServerThread, QuietWSGIRequestHandler
from django.core.servers.basehttp import WSGIServer


class LiveServerSingleThread(LiveServerThread):
    """Runs a single-threaded server rather than a multi-threaded one.
    Reverts https://github.com/django/django/pull/7832"""

    def _create_server(self):
        """
        The keep-alive fixes introduced in Django 2.1.4 (934acf1126995f6e6ccba5947ec8f7561633c27f)
        cause problems when serving the static files in a stream.
        We disable the helper handle method that calls handle_one_request multiple times.
        """
        QuietWSGIRequestHandler.handle = QuietWSGIRequestHandler.handle_one_request
        return WSGIServer((self.host, self.port), QuietWSGIRequestHandler, allow_reuse_address=False)


class LiveServerSingleThreadedTestCase(LiveServerTestCase):
    "A thin sub-class which only sets the single-threaded server as a class"
    server_thread_class = LiveServerSingleThread
Then, derive your test class from LiveServerSingleThreadedTestCase instead of LiveServerTestCase.
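For example (a sketch; the request made in the test is arbitrary):
import urllib.request

class HomePageTest(LiveServerSingleThreadedTestCase):
    def test_homepage_loads(self):
        # self.live_server_url is provided by LiveServerTestCase
        with urllib.request.urlopen(self.live_server_url + '/') as response:
            self.assertEqual(response.status, 200)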
It was caused by this django bug.
Using a file-based database during testing fixes the "table is locked" error. To make Django use a file-based database, specify its filename as the test database name:
DATABASES = {
    'default': {
        ...
        'TEST': {
            'NAME': os.path.join(BASE_DIR, 'db.sqlite3.test'),
        },
    }
}
I suppose the timeout setting is ignored in the case of an in-memory database; see this comment for additional info.

Django / Memcached error: The request's session was deleted before the request completed

Here is the full error: The request's session was deleted before the request completed. The user may have logged out in a concurrent request, for example.
I am using python-memcached and store my sessions in my cache. Every few days I get one of these errors. It's thrown as an UpdateError on request.session.save(), from line 60 in sessions/middleware.py. 99% of the time everything works normally. I have seen this error at many different URLs, for both GET and POST requests. Users report that they are not clicking the logout button, and that this happens 5 minutes after logging in, so their sessions are not expiring. I have 0 evictions on my cache, and it has been running for over a month. If I Google this error, it looks like no one has ever gotten it before.
I think the connections to memcached might be closing for some reason. It's running on localhost. The only other time I saw this error was when I pointed my cache config at a server that had memcached running but not listening on that interface; that generated this exact exception on every request. So could memcached be refusing to listen for a second or two, or dropping connections?
Here are my settings:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
        'TIMEOUT': 1209600,  # Two weeks
    },
}

SESSION_SAVE_EVERY_REQUEST = True
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_COOKIE_SECURE = CSRF_COOKIE_SECURE = True
SESSION_COOKIE_AGE = 60 * 90  # In 90 minutes
It seems the sure way to cause this error is to run cache.delete with the session key in a shell while a request is running. So something is deleting cache keys; I don't know if it's Django or memcached. Memcached does report STAT evictions 0.
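For reference, this is roughly how it can be reproduced from ./manage.py shell (a sketch; the session key is made up, and it assumes the cache session backend's usual 'django.contrib.sessions.cache' key prefix):
from django.core.cache import cache

# Deleting the cache entry behind an active session makes the next
# request.session.save() for that session raise UpdateError.
session_key = 'abc123examplesessionkey'  # made-up value
cache.delete('django.contrib.sessions.cache' + session_key)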
I made this middleware to solve the issue. It seems to have taken care of it. Also check your file descriptor limits.
import time

from django.contrib.sessions.middleware import SessionMiddleware


class SLSessionMiddleware(SessionMiddleware):
    """
    Fixes a bug where sessions sometimes fail to be saved.
    Retries up to 10 times, then gives up.
    """

    def process_response(self, request, response):
        last_exception = None
        for i in range(10):
            try:
                return super().process_response(request, response)
            except Exception as e:
                request.session.cycle_key()
                time.sleep(1)
                last_exception = e
        raise last_exception
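To activate it, replace the stock session middleware entry in MIDDLEWARE (or MIDDLEWARE_CLASSES on older Django) with the custom class; myapp.middleware is a placeholder for wherever you put it:
MIDDLEWARE = [
    # ...
    'myapp.middleware.SLSessionMiddleware',  # instead of 'django.contrib.sessions.middleware.SessionMiddleware'
    # ...
]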
I also experienced this error when I enabled Django's DummyCache to prevent the caching of my views whilst developing (please note I use Redis as my caching backend).
Be sure to disable the DummyCache when trying to access your site's admin.
The Django documentation gives a hint as to why you are seeing the error:
Finally, Django comes with a “dummy” cache that doesn’t actually cache – it just implements the cache interface without doing anything.
I alternate between these two lines in my settings.py file depending on what I am working on.
"BACKEND": 'django.core.cache.backends.dummy.DummyCache' if DEBUG else "django_redis.cache.RedisCache",
"BACKEND": "django_redis.cache.RedisCache",
In my case I use SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'.
I see this error on subsequent requests after I restore data from a backup (loaddata or similar).
So together with such an action it is necessary to delete the corresponding entries from the cache:
cache.delete('django.contrib.sessions.cached_dbo27db603b30jabewi7zkwd78b05zq0vf') (or you can delete the key from a Redis client, of course).

Django multiprocessing and database connections

Background:
I'm working a project which uses Django with a Postgres database. We're also using mod_wsgi in case that matters, since some of my web searches have made mention of it. On web form submit, the Django view kicks off a job that will take a substantial amount of time (more than the user would want to wait), so we kick off the job via a system call in the background. The job that is now running needs to be able to read and write to the database. Because this job takes so long, we use multiprocessing to run parts of it in parallel.
Problem:
The top level script has a database connection, and when it spawns off child processes, it seems that the parent's connection is available to the children. Then there's an exception about how SET TRANSACTION ISOLATION LEVEL must be called before a query. Research has indicated that this is due to trying to use the same database connection in multiple processes. One thread I found suggested calling connection.close() at the start of the child processes so that Django will automatically create a new connection when it needs one, and therefore each child process will have a unique connection - i.e. not shared. This didn't work for me, as calling connection.close() in the child process caused the parent process to complain that the connection was lost.
Other Findings:
Some stuff I read seemed to indicate you can't really do this, and that multiprocessing, mod_wsgi, and Django don't play well together. That just seems hard to believe I guess.
Some suggested using celery, which might be a long term solution, but I am unable to get celery installed at this time, pending some approval processes, so not an option right now.
Found several references on SO and elsewhere about persistent database connections, which I believe to be a different problem.
Also found references to psycopg2.pool and pgpool and something about bouncer. Admittedly, I didn't understand most of what I was reading on those, but it certainly didn't jump out at me as being what I was looking for.
Current "Work-Around":
For now, I've reverted to just running things serially, and it works, but is slower than I'd like.
Any suggestions as to how I can use multiprocessing to run in parallel? Seems like if I could have the parent and two children all have independent connections to the database, things would be ok, but I can't seem to get that behavior.
Thanks, and sorry for the length!
Multiprocessing copies connection objects between processes because it forks processes, and therefore copies all the file descriptors of the parent process. A connection to the SQL server is just a file descriptor; you can see it on Linux under /proc/<pid>/fd/, and any open file is shared between forked processes. You can find more about forking here.
My solution was simply to close the db connections just before launching the processes; each process then recreates a connection itself when it needs one (tested in Django 1.4):
from multiprocessing import Process

from django import db

def db_worker():
    some_parallel_code()  # placeholder for the actual parallel work

db.connections.close_all()
Process(target=db_worker, args=()).start()
PgBouncer/pgpool is not about threads in the multiprocessing sense; it is rather a solution for not closing the connection on each request, i.e. speeding up connections to Postgres under high load.
Update:
To completely remove problems with the database connection, simply move all database-related logic into db_worker. I wanted to pass a QuerySet as an argument, but a better idea is to simply pass a list of ids: see values_list('id', flat=True), and do not forget to turn it into a list with list(...) before passing it to db_worker. Thanks to that we do not copy the models' database connection.
from multiprocessing import Process

def db_worker(model_ids):
    obj = PartModelWorkerClass(model_ids)  # here you do Model.objects.filter(id__in=model_ids)
    obj.run()

model_ids = Model.objects.all().values_list('id', flat=True)
model_ids = list(model_ids)  # cast to list
process_count = 5
delta = (len(model_ids) // process_count) + 1  # integer division so the slice bounds stay ints

# do all the db stuff here ...
# here you can close the db connection
from django import db
db.connections.close_all()

for it in range(process_count):
    Process(target=db_worker, args=(model_ids[it * delta:(it + 1) * delta],)).start()
When using multiple databases, you should close all connections.
from django import db

for connection_name in db.connections.databases:
    db.connections[connection_name].close()
EDIT
Please do the same as @lechup mentioned to close all connections (not sure since which Django version this method was added):
from django import db
db.connections.close_all()
For Python 3 and Django 1.9 this is what worked for me:
import multiprocessing

import django
django.setup()  # Must call setup

def db_worker():
    for name in django.db.connections.databases:  # Close the inherited DB connections
        django.db.connections[name].close()
    # Execute parallel code here

if __name__ == '__main__':
    multiprocessing.Process(target=db_worker).start()
Note that without the django.setup() I could not get this to work. I am guessing something needs to be initialized again for multiprocessing.
I had "closed connection" issues when running Django test cases sequentially. In addition to the tests, there is also another process intentionally modifying the database during test execution. This process is started in each test case setUp().
A simple fix was to inherit my test classes from TransactionTestCase instead of TestCase. This makes sure the data is actually written to the database, so the other process has an up-to-date view of the data.
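In other words, the change is just the base class (a sketch; the test bodies are placeholders):
from django.test import TransactionTestCase  # instead of TestCase

class ExternalWriterTest(TransactionTestCase):
    def setUp(self):
        # start the external process that modifies the database here
        ...

    def test_external_updates_are_visible(self):
        # rows written by the other process are actually committed,
        # so queries made here can see them
        ...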
Another way around your issue is to initialise a new connection to the database inside the forked process using:
from django.db import connection
connection.connect()
(not a great solution, but a possible workaround)
If you can't use Celery, maybe you could implement your own queueing system: basically, add tasks to a task table and have a regular cron job that picks them up and processes them via a management command (sketched below).
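A hedged sketch of that idea, with a hypothetical Task model and a process_tasks management command that cron would run every minute:
# models.py (hypothetical)
from django.db import models

class Task(models.Model):
    payload = models.TextField()
    status = models.CharField(max_length=20, default='pending')
    created = models.DateTimeField(auto_now_add=True)

# management/commands/process_tasks.py (hypothetical)
from django.core.management.base import BaseCommand
from myapp.models import Task  # placeholder app path

def run_long_job(payload):
    pass  # placeholder for the slow work kicked off by the view

class Command(BaseCommand):
    help = "Process pending tasks queued by the web views"

    def handle(self, *args, **options):
        for task in Task.objects.filter(status='pending'):
            run_long_job(task.payload)
            task.status = 'done'
            task.save()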
I ran into this issue and was able to resolve it by doing the following (we are implementing a limited task system):
task.py
from django.db import connection

def as_task(fn):
    """ this is a decorator that handles task duties, like setting up loggers, reporting on status...etc """
    connection.close()  # this is where i kill the database connection VERY IMPORTANT
    # This will force django to open a new unique connection, since on linux at least
    # Connections do not fare well when forked
    # ...etc
ScheduledJob.py
import multiprocessing

from django.core.management import call_command
from django.db import connection

def run_task(request, job_id):
    """ Just a simple view that when hit with a specific job id kicks off said job """
    # your logic goes here
    # ...
    # job_info and options come from the surrounding view code (omitted here)
    processor = multiprocessing.Queue()
    multiprocessing.Process(
        target=call_command,  # all of our tasks are set up as management commands in django
        args=[
            job_info.management_command,
        ],
        kwargs={
            'web_processor': processor,
        }.items() + vars(options).items()).start()

    result = processor.get(timeout=10)  # wait to get a response on a successful init
    # Result is a tuple of [TRUE|FALSE, <ErrorMessage>]
    if not result[0]:
        raise Exception(result[1])
    else:
        # THE VERY VERY IMPORTANT PART HERE: notice that up to this point we haven't
        # touched the db again, but now we absolutely have to call connection.close()
        connection.close()
        # we do some database accessing here to get the most recently updated job id in the database
Honestly, to prevent race conditions (with multiple simultaneous users) it would be best to call connection.close() as quickly as possible after you fork the process. There may still be a chance that another request hits the db somewhere down the line before you have a chance to close the connection, though.
In all honesty it would likely be safer and smarter to have your fork not call the command directly, but instead call a script on the operating system so that the spawned task runs in its own Django shell (sketched below)!
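A sketch of that variant, launching the management command as a completely separate OS process via manage.py (the path and the command name are placeholders):
import subprocess
import sys

def kick_off_job(job_id):
    # Runs in its own interpreter with its own Django setup, so no
    # database connection is shared with the web process.
    subprocess.Popen([
        sys.executable, '/path/to/manage.py',  # placeholder path
        'run_long_job', str(job_id),           # hypothetical command name
    ])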
If all you need is I/O parallelism and not processing parallelism, you can avoid this problem by switching your processes to threads. Replace
from multiprocessing import Process
with
from threading import Thread
The Thread object has the same interface as Process.
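For example, a minimal sketch of the swap (fetch_item is a placeholder for the I/O-bound work):
from threading import Thread  # drop-in replacement for multiprocessing.Process

def fetch_item(item_id):
    # I/O-bound work (database query, HTTP call, ...) releases the GIL,
    # so threads give real concurrency here while sharing one Django process.
    pass

threads = [Thread(target=fetch_item, args=(item_id,)) for item_id in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()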
If you're also using connection pooling, the following worked for us, forcibly closing the connections after being forked. Closing them beforehand did not seem to help.
from django.db import connections
from django.db.utils import DEFAULT_DB_ALIAS
connections[DEFAULT_DB_ALIAS].dispose()
One possibility is to use the multiprocessing 'spawn' child-process creation method, which does not copy Django's DB connection details to the child processes. The child processes need to bootstrap from scratch, but are free to create and close their own Django DB connections.
In calling code:
import multiprocessing
from myworker import work_one_item  # <-- Your worker method

...

# Uses connection A
list_of_items = django_db_call_one()

# 'spawn' starts new python processes
with multiprocessing.get_context('spawn').Pool() as pool:
    # work_one_item will create its own DB connection
    parallel_results = pool.map(work_one_item, list_of_items)

# Continues to use connection A
another_db_call(parallel_results)
In myworker.py:
import django  # <--
django.setup()  # <-- needed if you'll make DB calls in worker

def work_one_item(item):
    try:
        # This will create a new DB connection
        return len(MyDjangoModel.objects.all())
    except Exception as ex:
        return ex
Note that if you're running the calling code inside a TestCase, mocks will not be propagated to the child processes (you will need to re-apply them).
You could give more resources to Postgres; on Debian/Ubuntu you can edit:
nano /etc/postgresql/9.4/main/postgresql.conf
replacing 9.4 with your Postgres version.
Here are some useful lines to update with example values; the names speak for themselves:
max_connections = 100
shared_buffers = 3000MB
temp_buffers = 800MB
effective_io_concurrency = 300
max_worker_processes = 80
Be careful not to boost these parameters too much, as it might lead to errors with Postgres trying to take more resources than are available. The examples above run fine on a Debian machine with 8 GB of RAM and 4 cores.
Override the Thread class and close all DB connections at the end of the thread. The code below works for me:
from threading import Thread

from django.db import connections

class MyThread(Thread):
    def run(self):
        super().run()
        connections.close_all()

def myasync(function):
    def decorator(*args, **kwargs):
        t = MyThread(target=function, args=args, kwargs=kwargs)
        t.daemon = True
        t.start()
    return decorator
When you need to call a function asynchronously:
@myasync
def async_function():
    ...