How to use threading with Django and Gunicorn

I am trying to use the threading library inside a Django application that runs under Gunicorn. When I run my code locally everything works, but as soon as I call the view in production I get a context error. I believe this is due to Gunicorn.
Here is the error
RuntimeError: cannot exit context: thread state references a different context object
Here is my code.
import threading

t = threading.Thread(
    target=myFunction, args=[arg1]
)
t.setDaemon(True)
t.start()

I'm posting the solution I found as I could not find any reference to this exact issue and resolution. It turns out the issue was not with Python or Django but rather with Gunicorn itself. In order to use threading I had to add the --threads parameter to the service file.
/usr/bin/gunicorn3 --name=my_app --pythonpath=/home/django/myenv --bind unix:/home/django/myenv/my_app/gunicorn.socket my_app.wsgi:application --workers=4 --threads=2 --worker-class=gthread
I also set the worker class to gthread.
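For reference, the same settings can live in a Gunicorn config file instead of the unit file's command line (a minimal sketch; the gunicorn.conf.py name and the values below are assumptions, adjust them to your deployment):

# gunicorn.conf.py -- hypothetical config-file equivalent of the command above
bind = "unix:/home/django/myenv/my_app/gunicorn.socket"
workers = 4
threads = 2                  # thread pool size per worker
worker_class = "gthread"     # threaded worker class

# start with: gunicorn3 --config gunicorn.conf.py my_app.wsgi:application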

Related

Gunicorn reflect changed code dynamically

I am developing a Django web application where a user can modify the code of certain classes, in the application itself, through the UI using the Ace editor (think of it as GitLab/GitHub, where you can change code online). These classes are run by Django and a Celery worker at some point.
Once code changes are saved, the changes are not picked up by Django because of Gunicorn, but they work fine with Celery because it is a different process. (Running locally with runserver works fine and changes are picked up by both Django and Celery.)
Is there a way to make Gunicorn reflect changes in a certain directory that contains the classes without reloading the whole application? And if reloading is necessary, is there a way to reload Gunicorn's workers one by one without having any downtime?
The gunicorn command:
/usr/local/bin/gunicorn config.wsgi --bind 0.0.0.0:5000 --chdir=/app
The WSGI configuration file:
import os
import sys

from django.core.wsgi import get_wsgi_application

app_path = os.path.abspath(os.path.join(
    os.path.dirname(os.path.abspath(__file__)), os.pardir))
sys.path.append(os.path.join(app_path, 'an_application'))

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings.production")

application = get_wsgi_application()
The reload option is "intended for development". There's no strong wording saying you shouldn't use it in production. The reason you shouldn't use it in production is that people make typos, and a change in one file may require several other changes in others, and so on. So you can make your site inaccessible, and then you don't have a working app to fix it again.
For a dev, that's no problem, as you look at the logs/output in your shell and restart it. This is why @Krzysztof's suggestion is the best one. Push the code changes to your repo, make them go through CI/CD, and switch over the pod. If CI fails, then CD won't happen, so you're good.
Of course, that's a scope far too large for a Q&A site.
Why not save the code in a separate text file or database, and have the relevant method simply load the code dynamically as a string and execute it using exec()?
Let's say you have a function function1 which can be edited by a user. When the user submits the changes, process the input (separate out the functions so that you know which function has what definition) and save them all individually, like function1, function2, etc., in a database or a text file as strings.
Once you need to execute function1, just load the value you saved and use exec to execute the code.
This way, you won't need to reload Gunicorn, since all workers will always fetch the updated function definition at run time!
Something along the lines of:
def function1_original():
    # load the user-defined function definition
    with open("function1.txt", "r") as f:
        source = f.read()
    # execute the string in its own namespace; in Python 3, names created by
    # exec() inside a function are not visible as ordinary local variables
    namespace = {}
    exec(source, namespace)    # this just loads the function definition
    namespace["function1"]()   # this executes the user defined function
So the user will define:
def function1():
    # user defined code
    # blah blah
    ...
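A minimal round trip under that scheme might look like this (a sketch following the snippets above; the sample user code and the function1.txt file name are just placeholders):

# hypothetical example: store the user's submission, then run it
user_source = (
    "def function1():\n"
    "    print('user defined code ran')\n"
)
with open("function1.txt", "w") as f:
    f.write(user_source)

function1_original()   # prints: user defined code ran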
I was able to solve this by changing the extension of the Python scripts to anything but .py.
Then I loaded these files using the following function:
from os import path
from importlib import util
from importlib.machinery import SourceFileLoader

def load_module(module_name):
    # build the path to the renamed (non-.py) source file for this module
    module_path = path.join(path.dirname(__file__), "/path/to/your/files{}.anyextension".format(module_name))
    spec = util.spec_from_loader(module_name,
                                 SourceFileLoader(module_name, module_path))
    module = util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
In this case, they are not loaded by Gunicorn into RAM, and I was able to apply the changes on the fly without needing to use the eval or exec functions.
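Calling it then looks something like this (a sketch based on the function above; the module and attribute names are placeholders):

# hypothetical usage of load_module defined above
mod = load_module("function1")   # loads the renamed source file into a fresh module object
mod.function1()                  # call whatever the user-editable file defines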

Google App Engine deferred.defer task not getting executed

I have a Google App Engine Standard Environment application that has been working fine for a year or more, that, quite suddenly, refuses to enqueue new deferred tasks using deferred.defer.
Here's the Python 2.7 code that is making the deferred call:
# Find any inventory items that reference the product, and change them too.
# because this could take some time, we'll do it as a deferred task, and only
# if needed.
if upd:
    updater = deferredtasks.InvUpdate()
    deferred.defer(updater.run, product_key)
My app.yaml file has the necessary bits to support deferred.defer:
- url: /_ah/queue/deferred
  script: google.appengine.ext.deferred.deferred.application
  login: admin

builtins:
- deferred: on
And my deferred task has logging in it so I should see it running when it does:
#-------------------------------------------------------------------------------
# DEFERRED routine that updates the inventory items for a particular product. Should be called
# when ANY changes are made to the product, because it should trigger a re-download of the
# inventory record for that product to the iPad.
#-------------------------------------------------------------------------------
class InvUpdate(object):
    def __init__(self):
        self.to_put = []
        self.product_key = None
        self.updcount = 0

    def run(self, product_key, batch_size=100):
        updproduct = product_key.get()
        if not updproduct:
            logging.error("DEFERRED ERROR: Product key passed in does not exist")
            return
        logging.info(u"DEFERRED BEGIN: beginning inventory update for: {}".format(updproduct.name))
        self.product_key = product_key
        self._continue(None, batch_size)
...
When I run this in the development environment on my development box, everything works fine. Once I deploy it to the App Engine server, the inventory updates never get done (i.e. the deferred task is not executed), and there are no errors (and in fact no other logging from the deferred task) in the log files on the server. I know that with the sudden push to get everybody onto Python 3 as quickly as possible, the deferred.defer library has been marked as not recommended because it only works with the Python 2.7 environment, and I planned on moving to task queues for this, but I wasn't expecting deferred.defer to suddenly stop working in the existing Python environment.
Any insight would be greatly appreciated!
I'm pretty sure you can't pass the method of an instance to the App Engine task queue, because that instance will not exist when your task runs, since it will be running in a different process. I actually don't understand how your task ever worked when running remotely in the first place (and running locally is not an accurate representation of how things will run remotely).
Try changing your code to this:
if upd:
    deferred.defer(deferredtasks.InvUpdate.run_cls, product_key)
and then InvUpdate is the same but has a new function run_cls:
class InvUpdate(object):
    @classmethod
    def run_cls(cls, product_key):
        cls().run(product_key)
And I'm still in the process of migrating to Cloud Tasks myself, and my deferred tasks still work.
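If you'd rather sidestep the question of what deferred can and can't pickle entirely, a plain module-level wrapper also works (a sketch; the inv_update helper below is an assumption, not part of the original deferredtasks module):

# deferredtasks.py -- hypothetical module-level wrapper; plain functions are the
# safest thing to hand to deferred.defer, since only a reference to them is pickled
def inv_update(product_key, batch_size=100):
    InvUpdate().run(product_key, batch_size=batch_size)

# call site
if upd:
    deferred.defer(deferredtasks.inv_update, product_key)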

Using requests library in on_failure or on_success hook causes the task to retry indefinitely

This is what I have:
import youtube_dl  # in case this matters

class ErrorCatchingTask(Task):
    # Request = CustomRequest
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # If I comment this out, all is well
        r = requests.post(server + "/error_status/")
        ....

@app.task(base=ErrorCatchingTask, bind=True, ignore_result=True, max_retires=1)
def process(self, param_1, param_2, param_3):
    ...
    raise IndexError
    ...
The worker will throw the exception and then seemingly spawn a new task with a different task id: Received task: process[{task_id}]
Here are a couple of things I've tried:
Importing Request from celery.worker.request and overriding the on_failure and on_success functions there instead.
app.conf.broker_transport_options = {'visibility_timeout': 99999999999}
@app.task(base=ErrorCatchingTask, bind=True, ignore_result=True, max_retires=1)
Turn off DEBUG mode
Set logging to info
Set CELERY_IGNORE_RESULT to false (Can I use Python requests with celery?)
import requests as apicall to rule out namespace conflict
Monkey patch requests (Celery + Eventlet + non blocking requests)
Move ErrorCatchingTask into a separate file
If I don't use any of the hook functions, the worker will just throw the exception and stay idle until the next task is scheduled, which is what I expect even when I use the hooks. Is this a bug? I searched through and through on GitHub issues but couldn't find the same problem. How do you debug a problem like this?
Django 1.11.16
celery 4.2.1
My problem was resolved after I used grequests.
In my case, the Celery worker would reschedule as soon as conn.urlopen() was called in requests/adapters.py. Another behavior I observed was that if I had another worker from another project open on the same machine, sometimes the infinite rescheduling would stop. This was probably some locking mechanism, originally intended for another purpose, kicking in.
So this led me to suspect that this is indeed a threading issue, and after researching whether the requests library is thread safe, I found people suggesting different things. In theory, monkey patching should have a similar effect to using grequests, but it is not the same, so just use the grequests or erequests library instead.
The Celery debugging instructions are here.
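For reference, the hook change amounted to something like this (a sketch, not the exact production code; server and the posted payload are assumptions):

import grequests
from celery import Task

class ErrorCatchingTask(Task):
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # build the request with grequests and send it through gevent instead of
        # calling requests.post() directly inside the worker
        req = grequests.post(server + "/error_status/", data={"task_id": task_id})
        grequests.map([req])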

Flask.socket_io blocking calls when database queries are run

I am trying to use SocketIO with my Flask application. The problem is when I run database queries, like in the URL route function below. The first time the page loads properly, but on consecutive calls the process goes into a blocking state. Even KeyboardInterrupt (Ctrl+C) only terminates one of the Python processes; I have to manually kill the other one.
One obvious solution would be to use a cache and run the database queries from another script. Is there any other possible solution that avoids running separate scripts?
@app.route('/status/<urlMap>')
def status(urlMap):
    dictResponse = {}
    data = models.Status.query.filter_by(urlmap=urlMap).first()
    if data.conversion == "DONE":
        dictResponse['conversion'] = 'success'
    if data.published == "DONE":
        dictResponse['publish'] = 'success'
    return render_template('status.html', status=dictResponse)
Also, on removing the flask.ext.socketio import and using app.run(host='0.0.0.0') instead of socketio.run(app, host='0.0.0.0'), the app runs perfectly. So I think it's the async gevent calls that are somehow blocking the process.
As @Miguel correctly pointed out, monkey patching the standard library solved the issue.
monkey.patch_all() solved the problem.
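For completeness, a minimal sketch of what that looks like, assuming the current flask_socketio package name rather than the old flask.ext.socketio import path:

# patch the standard library before anything that creates sockets is imported
from gevent import monkey
monkey.patch_all()

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

if __name__ == "__main__":
    # with the patch applied, database calls inside request handlers
    # no longer block the gevent loop
    socketio.run(app, host='0.0.0.0')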

How do I get SQL logging in Rails 4 WEBrick for a non-development environment?

I'm using Rails 4.1.2. I have some environments which are exact copies of my development environment. In other words, I created them by simply copying config/environments/development.rb to a file with a different name (e.g., destaging.rb). They differ only in the connection information in database.yml.
If I issue RAILS_ENV=destaging rails s or rails s -e destaging at the command line, everything works just as I desire, except that I get no SQL logging to STDOUT, which is a bummer.
Since my destaging environment is absolutely identical to my development environment except for different connection settings in database.yml, I suspect that something is looking for an environment named development and enabling SQL logging to STDOUT only if an environment with that name is active. How can I enable SQL logging to STDOUT for other environments launched through WEBrick?
For posterity, I've discovered how to do this. First, I'm running Ruby 2.1.2 with Rails 4.1.2. If that is not your environment, your mileage may vary, though I suspect the solution will be very similar.
So, first you must modify bin/rails. Open this file and change it as follows. (I have posted the entire file, minus the shebang, for clarity.)
begin
  load File::expand_path("../spring", __FILE__)
rescue LoadError
end

APP_PATH = File.expand_path('../../config/application', __FILE__)
require_relative '../config/boot'

# Here comes the important part
require 'rails/commands/server'

class Rails::Server::Options
  def parse_with_logging!(args)
    options = parse_without_logging!(args)
    options[:log_stdout] = true # Or whatever condition you want
    options
  end

  alias_method_chain :parse!, :logging
end

require 'rails/commands'
Since require 'rails/commands' executes the server immediately, monkey-patching after that line does not work. It is simply ignored. If you try to monkey-patch it before you require the commands, it explodes because the Rails::Server::Options class has not yet been defined. Thus, we have to pre-emptively require rails/commands/server so we can alias its parse! method.
Monkey-patching should almost always be a last resort, IMHO. However, I see no alternative in this case. If anyone has a better idea, I'd love to hear it.
I also encountered this problem with the same versions of Rails and Ruby, using a non-standard environment name (in your case "destaging"). However, I did not want it to affect all environments, nor lose any more time not getting work done, so I simply changed the way I start the server:
(tail -F log/destaging.log &) && rails s
Then afterwards to restart the server, ctrl-c as usual and then rails s again. The tail will keep going in the background and for all intents and purposes the experience will be like it was before this stopped working.