Specifically, I'm running a Flask app with default workers in gunicorn. I'm trying to figure out how to debug / trace what is happening when a worker is killed due to timeout while serving a request. Is there a way to get a stack trace or profile the request to debug this?
Honestly, a very good question. I was not sure whether this was even possible, but once I started digging I found a lot of interesting threads:
Showing the stack trace from a running Python application
Get stacktrace from stuck python process
https://github.com/khamidou/lptrace
https://gist.github.com/reywood/e221c4061bbf2eccea885c9b2e4ef496
So first I created a simple Flask app with the code below:
app.py
from flask import Flask
app = Flask(__name__)
import time

def a():
    b()

def b():
    c ()

def c():
    i = 0
    while i < 900:
        time.sleep(1)  # simulate a request that hangs for about 15 minutes
        i += 1

@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def catch_all(path):
    a()
    return 'You want path: %s' % path

if __name__ == '__main__':
    app.run()
wsgi.py
from app import app

if __name__ == "__main__":
    app.run()
Now run the app as shown below and, from another shell, request curl localhost:8000/abc:
$ gunicorn wsgi:app
[2019-08-01 08:19:06 +0000] [26825] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:19:06 +0000] [26825] [INFO] Listening at: http://127.0.0.1:8000 (26825)
[2019-08-01 08:19:06 +0000] [26825] [INFO] Using worker: sync
[2019-08-01 08:19:06 +0000] [26828] [INFO] Booting worker with pid: 26828
[2019-08-01 08:19:40 +0000] [26825] [CRITICAL] WORKER TIMEOUT (pid:26828)
[2019-08-01 08:19:40 +0000] [26828] [INFO] Worker exiting (pid: 26828)
[2019-08-01 08:19:40 +0000] [26832] [INFO] Booting worker with pid: 26832
What we need now is a hook that is called before the worker is killed. gunicorn supports server hooks in its configuration file, and worker_abort is the one called when a worker is aborted on timeout.
So now we create a config file:
gunicorn_config.py
timeout = 3

def worker_abort(worker):
    pid = worker.pid
    print("worker is being killed - {}".format(pid))
And our output is now
$ gunicorn -c gunicorn_config.py wsgi:app
[2019-08-01 08:22:17 +0000] [26837] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:22:17 +0000] [26837] [INFO] Listening at: http://127.0.0.1:8000 (26837)
[2019-08-01 08:22:17 +0000] [26837] [INFO] Using worker: sync
[2019-08-01 08:22:17 +0000] [26840] [INFO] Booting worker with pid: 26840
[2019-08-01 08:22:22 +0000] [26837] [CRITICAL] WORKER TIMEOUT (pid:26840)
worker is being killed - 26840
[2019-08-01 08:22:22 +0000] [26840] [INFO] Worker exiting (pid: 26840)
[2019-08-01 08:22:22 +0000] [26844] [INFO] Booting worker with pid: 26844
This is good. Now we need to combine this hook with what we learned about pyrasite in the threads above to get the stack. So we update the config file like below:
gunicorn_config.py
timeout = 3

# Code injected into the worker: print the stack of every thread
__code_dump_stack__ = """
import sys, traceback
for thread, frame in sys._current_frames().items():
    print('Thread 0x%x' % thread)
    traceback.print_stack(frame)
    print()
"""

def dump_stack_for_process(pid):
    import pyrasite
    ipc = pyrasite.PyrasiteIPC(pid)
    ipc.connect()
    print(ipc.cmd(__code_dump_stack__))
    ipc.close()

def worker_abort(worker):
    pid = worker.pid
    print("worker is being killed - {}".format(pid))
    dump_stack_for_process(pid)
And now our output is
$ gunicorn -c gunicorn_config.py wsgi:app
[2019-08-01 08:25:29 +0000] [26848] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:25:29 +0000] [26848] [INFO] Listening at: http://127.0.0.1:8000 (26848)
[2019-08-01 08:25:29 +0000] [26848] [INFO] Using worker: sync
[2019-08-01 08:25:29 +0000] [26851] [INFO] Booting worker with pid: 26851
[2019-08-01 08:25:38 +0000] [26848] [CRITICAL] WORKER TIMEOUT (pid:26851)
worker is being killed - 26851
Thread 0x7ff0a7a4b700
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "<string>", line 72, in run
  File "<string>", line 92, in on_command
  File "<string>", line 6, in <module>

Thread 0x7ff0ac512700
  File "/home/vagrant/.local/bin/gunicorn", line 11, in <module>
    sys.exit(run())
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 61, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/base.py", line 223, in run
    super(Application, self).run()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 203, in run
    self.manage_workers()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 545, in manage_workers
    self.spawn_workers()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 616, in spawn_workers
    self.spawn_worker()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.run()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 124, in run
    self.run_for_one(timeout)
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 68, in run_for_one
    self.accept(listener)
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 30, in accept
    self.handle(listener, client, addr)
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, in handle
    self.handle_request(listener, req, client, addr)
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 2463, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/vagrant/remotedebug/app.py", line 20, in catch_all
    a()
  File "/home/vagrant/remotedebug/app.py", line 6, in a
    b()
  File "/home/vagrant/remotedebug/app.py", line 9, in b
    c ()
  File "/home/vagrant/remotedebug/app.py", line 14, in c
    time.sleep(1)
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 195, in handle_abort
    self.cfg.worker_abort(self)
  File "gunicorn_config.py", line 23, in worker_abort
    dump_stack_for_process(pid)
  File "gunicorn_config.py", line 17, in dump_stack_for_process
    print(ipc.cmd(__code_dump_stack__))
  File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 161, in cmd
    return self.recv()
  File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 174, in recv
    header_data = self.recv_bytes(4)
  File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 187, in recv_bytes
    chunk = self.sock.recv(n - len(data))
[2019-08-01 08:25:38 +0000] [26851] [INFO] Worker exiting (pid: 26851)
[2019-08-01 08:25:38 +0000] [26862] [INFO] Booting worker with pid: 26862
The stack trace is large, but it gives us what we need: the app.py frames show the request stuck in c() at time.sleep(1).
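The trace above also shows handle_abort and worker_abort running on the worker's own stack, so the hook appears to execute inside the timed-out worker. Under that assumption, a lighter variant can skip pyrasite and walk sys._current_frames() directly in the hook. This is only a sketch: dump_all_thread_stacks is a hypothetical helper name, and a worker blocked inside C code that never returns to the interpreter may never reach a Python-level hook.

```python
# gunicorn_config.py (sketch, assumes worker_abort runs inside the worker)
import sys
import traceback

timeout = 3


def dump_all_thread_stacks(out=None):
    """Print the current stack of every thread in this process."""
    if out is None:
        out = sys.stderr
    for thread_id, frame in sys._current_frames().items():
        print("Thread 0x%x" % thread_id, file=out)
        traceback.print_stack(frame, file=out)
        print(file=out)


def worker_abort(worker):
    # Same hook as above, but the stacks are read in-process
    # instead of being fetched through pyrasite.
    print("worker is being killed - {}".format(worker.pid), file=sys.stderr)
    dump_all_thread_stacks()
```

The output should look much like the pyrasite version, minus the extra pyrasite thread and the ipc frames.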
To add to the accepted answer: there is another way to get similar results, using the built-in faulthandler module (introduced in Python 3.3): https://docs.python.org/dev/library/faulthandler.html
Add the following two lines to the wsgi.py from the accepted answer:
import faulthandler
faulthandler.enable()
When a timeout occurs, gunicorn aborts the worker with SIGABRT, which is one of the signals faulthandler handles, so the following output is printed to stderr:
[2020-12-24 13:38:32 +0000] [31304] [INFO] Starting gunicorn 20.0.4
[2020-12-24 13:38:32 +0000] [31304] [INFO] Listening at: http://0.0.0.0:8888 (31304)
[2020-12-24 13:38:32 +0000] [31304] [INFO] Using worker: sync
[2020-12-24 13:38:32 +0000] [31307] [INFO] Booting worker with pid: 31307
[2020-12-24 13:38:55 +0000] [31304] [CRITICAL] WORKER TIMEOUT (pid:31307)
Fatal Python error: Aborted
Current thread 0x00007f411d781700 (most recent call first):
  File "/tmp/app.py", line 14 in c
  File "/tmp/app.py", line 9 in b
  File "/tmp/app.py", line 6 in a
  File "/tmp/app.py", line 20 in catch_all
  File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 1936 in dispatch_request
  File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 1950 in full_dispatch_request
  File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 2447 in wsgi_app
  File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 2464 in __call__
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 175 in handle_request
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 134 in handle
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 29 in accept
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 67 in run_for_one
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 123 in run
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/base.py", line 140 in init_process
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 583 in spawn_worker
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 616 in spawn_workers
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 545 in manage_workers
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 202 in run
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/base.py", line 72 in run
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/base.py", line 228 in run
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 58 in run
  File "/tmp/venv/bin/gunicorn", line 8 in <module>
[2020-12-24 13:38:55 +0000] [31307] [INFO] Worker exiting (pid: 31307)
[2020-12-24 13:38:55 +0000] [31503] [INFO] Booting worker with pid: 31503
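A related option, if you would rather get a trace before gunicorn's timeout fires at all: faulthandler can also arm a watchdog with dump_traceback_later. The sketch below wraps an arbitrary block; dump_stack_if_slow is a hypothetical helper name, and the idea of wrapping a view body with a threshold a little below the gunicorn timeout is my assumption, not something from the answer above.

```python
import contextlib
import faulthandler
import sys
import time


@contextlib.contextmanager
def dump_stack_if_slow(seconds, file=None):
    """Dump all thread stacks to `file` if the block runs longer than `seconds`."""
    file = sys.stderr if file is None else file
    # Arms a watchdog thread; exit=False means the process keeps running
    # after the dump. `file` must stay open until the watchdog is cancelled.
    faulthandler.dump_traceback_later(seconds, exit=False, file=file)
    try:
        yield
    finally:
        # Disarm the watchdog once the block has finished.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    with dump_stack_if_slow(0.5):
        time.sleep(1)  # stands in for a slow view function
```

Unlike the SIGABRT route, this does not depend on the worker being killed, so the request can still complete after the dump.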
This is the error I receive when I try to deploy using gcloud app deploy. I have previously deployed the same app successfully. I am able to run the app on my local machine, but I receive the error on deploy.
The traceback:
Updating service [default]...failed.
ERROR: (gcloud.app.deploy) Error Response: [9]
Application startup error:
[2017-08-25 10:50:23 +0000] [1] [INFO] Starting gunicorn 19.7.1
[2017-08-25 10:50:23 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2017-08-25 10:50:23 +0000] [1] [INFO] Using worker: sync
[2017-08-25 10:50:23 +0000] [7] [INFO] Booting worker with pid: 7
[2017-08-25 10:50:23 +0000] [7] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/env/lib/python3.5/site-packages/gunicorn/arbiter.py", line 578, in spawn_worker
    worker.init_process()
  File "/env/lib/python3.5/site-packages/gunicorn/workers/base.py", line 126, in init_process
    self.load_wsgi()
  File "/env/lib/python3.5/site-packages/gunicorn/workers/base.py", line 135, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/env/lib/python3.5/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/env/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 65, in load
    return self.load_wsgiapp()
  File "/env/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/env/lib/python3.5/site-packages/gunicorn/util.py", line 352, in import_app
    __import__(module)
  File "/home/vmagent/app/main.py", line 19, in <module>
    app = bookshelf.create_app(config)
  File "/home/vmagent/app/bookshelf/__init__.py", line 49, in create_app
    model = get_model()
  File "/home/vmagent/app/bookshelf/__init__.py", line 107, in get_model
    from . import model_datastore
  File "/home/vmagent/app/bookshelf/model_datastore.py", line 16, in <module>
    from google.cloud import datastore
  File "/env/lib/python3.5/site-packages/google/cloud/datastore/__init__.py", line 61, in <module>
    from google.cloud.datastore.client import Client
  File "/env/lib/python3.5/site-packages/google/cloud/datastore/client.py", line 33, in <module>
    from google.cloud.datastore.query import Query
  File "/env/lib/python3.5/site-packages/google/cloud/datastore/query.py", line 19, in <module>
    from google.api.core import page_iterator
ImportError: No module named 'google.api.core'
[2017-08-25 10:50:23 +0000] [7] [INFO] Worker exiting (pid: 7)
[2017-08-25 10:50:24 +0000] [1] [INFO] Shutting down: Master
[2017-08-25 10:50:24 +0000] [1] [INFO] Reason: Worker failed to boot.
tl;dr: Upgrade your google-cloud to 0.27, and it should fix things.
I believe this is a bug with the new google-cloud dependencies. In my case, google-cloud==0.25 was pulling in these dependencies in its setup.py:
'google-cloud-core >= 0.24.0, < 0.25dev',
'google-cloud-datastore >= 1.0.0, < 2.0dev',
Just recently on 8/24 (a day before this issue was filed), the google-cloud-datastore package was updated to 1.3.0.
Unfortunately, google-cloud-datastore 1.3.0 depends on a newer version of google-cloud-core:
'google-cloud-core >= 0.27.0, < 0.28dev',
But it seems pip neither resolves nor warns about this version conflict, and keeps the older google-cloud-core. google-cloud-datastore 1.3.0 then tries to run from google.api.core import page_iterator, but google.api.core wasn't added until google-cloud-core 0.27.0, so everything breaks.
I believe the "bug" is the overly broad dependency range in google-cloud==0.25 (or possibly whatever version you are using).
I believe the "fix" for us is to upgrade to the latest version, google-cloud==0.27.
Though the "proper fix" is for google-cloud to tighten its dependency ranges, or else risk breaking backwards compatibility with already-published modules.
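To see this kind of mismatch in your own environment, you can compare the installed version of a package with what the other installed packages ask for. The following is a rough sketch using importlib.metadata (standard library from Python 3.8, so newer than the Python 3.5 in the traceback); who_requires and installed_version are hypothetical helper names, and the name matching is deliberately simple. Running pip check should report the same kind of conflict.

```python
import re
from importlib import metadata  # standard library since Python 3.8


def _normalize(name):
    # Treat "google_cloud_core" and "google-cloud-core" as the same name.
    return re.sub(r"[-_.]+", "-", name).lower()


def who_requires(target):
    """Return (distribution, requirement string) pairs that mention `target`."""
    wanted = _normalize(target)
    hits = []
    for dist in metadata.distributions():
        for req in dist.requires or []:
            # The requirement string starts with the project name.
            match = re.match(r"[A-Za-z0-9._-]+", req)
            if match and _normalize(match.group(0)) == wanted:
                hits.append((dist.metadata["Name"], req))
    return hits


def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("installed:", installed_version("google-cloud-core"))
    for dist_name, requirement in who_requires("google-cloud-core"):
        print(dist_name, "requires", requirement)
```

If the installed version falls outside one of the printed ranges, you are probably looking at the situation described above.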
I've installed Sentry into a virtualenv located at /www/sentry/ and I have a config file /www/sentry/sentry.conf.py. I am able to run the following commands successfully:
$ sentry --config=/www/sentry/sentry.conf.py celery worker -B
$ sentry --config=/www/sentry/sentry.conf.py upgrade
I can even run sentry --config=/www/sentry/sentry.conf.py shell and then once in the Django shell, check that the settings module imported from django.conf has the custom settings I added in my sentry.conf.py file.
However, when I try to spin up the included Gunicorn server I get the following:
$ sentry --config=/www/sentry/sentry.conf.py start
Performing upgrade before service startup...
Loading help page organizations.md
Loading help page sampling.md
Loading help page tagging.md
Loading help page quotas.md
Loading help page teams_and_projects.md
Running service: 'http'
[2015-02-20 19:47:01 +0000] [19199] [INFO] Starting gunicorn 19.2.1
[2015-02-20 19:47:01 +0000] [19199] [INFO] Listening at: http://0.0.0.0:9000 (19199)
[2015-02-20 19:47:01 +0000] [19199] [INFO] Using worker: sync
[2015-02-20 19:47:01 +0000] [19219] [INFO] Booting worker with pid: 19219
[2015-02-20 19:47:01 +0000] [19219] [ERROR] Exception in worker process:
Traceback (most recent call last):
  File "/www/sentry/lib/python2.7/site-packages/gunicorn/arbiter.py", line 503, in spawn_worker
    worker.init_process()
  File "/www/sentry/lib/python2.7/site-packages/gunicorn/workers/base.py", line 116, in init_process
    self.wsgi = self.app.wsgi()
  File "/www/sentry/lib/python2.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/www/sentry/lib/python2.7/site-packages/sentry/services/http.py", line 34, in load
    import sentry.wsgi
  File "/www/sentry/lib/python2.7/site-packages/sentry/wsgi.py", line 20, in <module>
    configure()
  File "/www/sentry/lib/python2.7/site-packages/sentry/utils/runner.py", line 399, in configure
    initializer=initialize_app,
  File "/www/sentry/lib/python2.7/site-packages/logan/runner.py", line 89, in configure_app
    raise ValueError("Configuration file does not exist at %r" % (config_path,))
ValueError: Configuration file does not exist at '/www/.sentry/sentry.conf.py'
...etc...
I tried creating a /www/.sentry/ directory, and then copying my config file into it and then the server loads without a problem:
$ mkdir /www/.sentry/
$ cp /www/sentry/sentry.conf.py /www/.sentry/sentry.conf.py
$ sentry --config=/www/sentry/sentry.conf.py start
Performing upgrade before service startup...
Loading help page organizations.md
Loading help page sampling.md
Loading help page tagging.md
Loading help page quotas.md
Loading help page teams_and_projects.md
Running service: 'http'
[2015-02-20 19:50:12 +0000] [19653] [INFO] Starting gunicorn 19.2.1
[2015-02-20 19:50:12 +0000] [19653] [INFO] Listening at: http://0.0.0.0:9000 (19653)
[2015-02-20 19:50:12 +0000] [19653] [INFO] Using worker: sync
[2015-02-20 19:50:12 +0000] [19673] [INFO] Booting worker with pid: 19673
[2015-02-20 19:50:12 +0000] [19674] [INFO] Booting worker with pid: 19674
[2015-02-20 19:50:12 +0000] [19675] [INFO] Booting worker with pid: 19675
This seems silly and unnecessary however. Can anyone point me in the correct direction?
I had the same issue and found this bug report:
https://github.com/getsentry/sentry/issues/1438
This commit fixed it for me: https://github.com/getsentry/sentry/commit/7629de1102973e4a3930487a3bf126a2f13c6850
I have a locally running Django App which I'm trying to move into Heroku. I'm following and adapting these instructions to do so. However, when I push my code to Heroku, the app is unavailable via Web Browser. When I tail the logs, I see many lines such as these:
2014-09-25T17:02:21.927991+00:00 app[web.1]: 2014-09-25 17:02:21 [2] [CRITICAL] WORKER TIMEOUT (pid:10)
2014-09-25T17:02:23.050632+00:00 app[web.1]: 2014-09-25 17:02:23 [11] [INFO] Booting worker with pid: 11
2014-09-25T17:02:54.021864+00:00 app[web.1]: 2014-09-25 17:02:54 [2] [CRITICAL] WORKER TIMEOUT (pid:11)
2014-09-25T17:02:55.039070+00:00 app[web.1]: 2014-09-25 17:02:55 [12] [INFO] Booting worker with pid: 12
2014-09-25T17:03:26.061225+00:00 app[web.1]: 2014-09-25 17:03:26 [2] [CRITICAL] WORKER TIMEOUT (pid:12)
2014-09-25T17:03:27.081593+00:00 app[web.1]: 2014-09-25 17:03:27 [13] [INFO] Booting worker with pid: 13
Why am I getting this indecipherable error and how can I fix it?
What other things should I look at to troubleshoot this?
Here is the contents of my Procfile:
web: gunicorn MyProject.wsgi --log-file -
EDIT: FYI: When I run this same app locally using foreman start, it works.
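One thing the log itself can tell you is how long each worker lives before it is killed. The sketch below parses four of the lines quoted above and computes the gap between "Booting worker" and "WORKER TIMEOUT" for each pid; worker_lifetimes is a hypothetical helper name. gunicorn's default timeout is 30 seconds, so lifetimes of about that length would suggest the workers are hitting the default timeout rather than crashing on their own, though that is an inference, not a diagnosis.

```python
import re
from datetime import datetime

# Lines copied from the log in the question.
LOG = """\
2014-09-25T17:02:23.050632+00:00 app[web.1]: 2014-09-25 17:02:23 [11] [INFO] Booting worker with pid: 11
2014-09-25T17:02:54.021864+00:00 app[web.1]: 2014-09-25 17:02:54 [2] [CRITICAL] WORKER TIMEOUT (pid:11)
2014-09-25T17:02:55.039070+00:00 app[web.1]: 2014-09-25 17:02:55 [12] [INFO] Booting worker with pid: 12
2014-09-25T17:03:26.061225+00:00 app[web.1]: 2014-09-25 17:03:26 [2] [CRITICAL] WORKER TIMEOUT (pid:12)
"""

# Matches gunicorn's own timestamp (with a space, not the Heroku "T" form).
EVENT = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[\d+\] \[\w+\] "
    r"(?:Booting worker with pid: (?P<boot>\d+)"
    r"|WORKER TIMEOUT \(pid:(?P<dead>\d+)\))"
)


def worker_lifetimes(log_text):
    """Map worker pid -> seconds between its boot line and its timeout line."""
    booted = {}
    lifetimes = {}
    for m in EVENT.finditer(log_text):
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        if m.group("boot"):
            booted[int(m.group("boot"))] = ts
        else:
            pid = int(m.group("dead"))
            if pid in booted:
                lifetimes[pid] = (ts - booted[pid]).total_seconds()
    return lifetimes


if __name__ == "__main__":
    print(worker_lifetimes(LOG))  # {11: 31.0, 12: 31.0}
```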
I'm trying to deploy a django app to heroku and it keeps crashing. Does anyone have any idea what I'm doing wrong?
Here is my Procfile:
web: python app/manage.py collectstatic --noinput; gunicorn --workers=4 --bind=0.0.0.0:$PORT app.settings
And here is a snippet from my heroku logs:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python2.7/site-packages/gunicorn/workers/base.py", line 100, in init_process
File "/app/.heroku/python/lib/python2.7/site-packages/gunicorn/arbiter.py", line 456, in spawn_worker
self.wsgi = self.app.wsgi()
app = eval(obj, mod.__dict__)
File "<string>", line 1, in <module>
File "/app/.heroku/python/lib/python2.7/site-packages/gunicorn/app/base.py", line 101, in wsgi
File "/app/.heroku/python/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 24, in load
self.callable = self.load()
return util.import_app(self.app_uri)
NameError: name 'application' is not defined
2014-07-04 17:58:23 [4] [INFO] Starting gunicorn 0.13.4
2014-07-04 17:58:23 [10] [ERROR] Exception in worker process:
worker.init_process()
File "/app/.heroku/python/lib/python2.7/site-packages/gunicorn/util.py", line 250, in import_app
2014-07-04 17:58:23 [9] [INFO] Booting worker with pid: 9
2014-07-04 17:58:23 [10] [INFO] Worker exiting (pid: 10)
2014-07-04 17:58:23 [4] [INFO] Listening at: http://0.0.0.0:36148 (4)
2014-07-04 17:58:23 [7] [INFO] Booting worker with pid: 7
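The error says that application is not defined. Your Procfile points gunicorn at app.settings, which has no WSGI callable named application. Try pointing gunicorn at your project's wsgi module instead: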
web: gunicorn YourProject.wsgi
I solved my own issue after doing several things, including upgrading my django install and changing my procfile to this:
web: gunicorn app.wsgi --pythonpath app --log-file -
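For anyone wondering why app.settings fails while app.wsgi works: when the target has no :callable suffix, gunicorn looks for a module-level name called application, which a Django wsgi.py normally defines and a settings module does not. Here is a simplified illustration of that lookup. It is not gunicorn's actual code, and resolve_app is a hypothetical helper name.

```python
import importlib


def resolve_app(spec):
    """Resolve a 'module' or 'module:callable' string to an object."""
    module_name, _, attr = spec.partition(":")
    # With no explicit callable, fall back to the name "application".
    attr = attr or "application"
    module = importlib.import_module(module_name)
    try:
        return getattr(module, attr)
    except AttributeError:
        raise LookupError("%r has no attribute %r" % (module_name, attr))
```

With this sketch, resolve_app("app.settings") would raise because the settings module has no application, while resolve_app("app.wsgi") would return the WSGI callable, assuming the project's wsgi.py defines one.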