I'm getting 502s when I try to access any of the views that rely on auth (/admin/login/, posting to my own /login/ page, etc.). It's not occurring on any other views/requests.
Here's the nginx access log:
GET /admin/login/ HTTP/1.1" 502 182 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
Not much that's going to help me there. Here's an extract from the gunicorn log (the first line is a worker booting after the previous one died):
[2018-01-15 19:44:43 +0000] [4775] [INFO] Booting worker with pid: 4775
[2018-01-15 19:46:10 +0000] [4679] [CRITICAL] WORKER TIMEOUT (pid:4775)
[2018-01-15 19:46:10 +0000] [4775] [INFO] Worker exiting (pid: 4775)
What's causing me to lose workers and get 502s?
Edit: I'm using Django 2.0.1 and django-axes 4.0.1. I'm pretty sure this is an axes issue, but I don't know how to diagnose it.
Thanks to kichik I enabled debug logging and discovered that the views were throwing a "'WSGIRequest' object has no attribute 'user'" exception, because my middleware settings were still in the pre-Django-2 format. This answer solved the issue.
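For anyone hitting the same thing, the fix amounts to renaming the old-style setting to the name Django 1.10+ expects. A minimal sketch (your exact middleware list will differ):

# settings.py
# Before (Django 2.0 no longer reads this, so request.user was never set):
# MIDDLEWARE_CLASSES = [ ... ]
# After:
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',  # this sets request.user
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]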
I am running a Django site with Gunicorn inside a docker container.
Requests are forwarded to this container by nginx, which is running non-dockerized as a regular Ubuntu service.
My site sometimes comes under heavy DDoS attacks that cannot be prevented. I have implemented a number of measures, including Cloudflare, nginx rate limits, gunicorn's own rate limits, and fail2ban. Ultimately, the attacks still get through due to the sheer number of IP addresses that appear to be in the botnet.
I'm not running anything super-critical, and I will later be looking into load balancing and other options. However, my main issue is that the DDoS attacks don't just take down my site - the site also fails to restore availability once the attack is over.
Somehow, the sheer number of requests is breaking something, and I cannot figure out what. The only way to bring the site back is to restart the container. The nginx service is running just fine, and shows the following in its error log every time:
2022/08/02 18:03:07 [error] 2115246#2115246: *72 connect() failed (111: Connection refused) while connecting to upstream, client: 172.104.109.161, server: examplesite.com, request: "GET / HTTP/2.0", upstream: "http://127.0.0.1:8000/", host: "examplesite.com"
From this, I thought the DDoS was somehow crashing the Docker container running gunicorn and the Django app. Hence, I implemented a health check in the Dockerfile:
HEALTHCHECK --interval=60s --timeout=5s --start-period=5s --retries=3 \
CMD curl -I --fail http://localhost:8000/ || exit 1
I used Docker Autoheal to monitor the health of the container; however, the container never turns "unhealthy". Manually running the command curl http://localhost:8000/ returns the website's home page, which is why the container never turns unhealthy.
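(Note that a curl from the host and one from inside the container test different paths - a quick sketch of the distinction, with the container name as a placeholder:)

# from the host: goes through Docker's published port, the same path nginx uses
curl -I http://127.0.0.1:8000/
# from inside the container: tests only gunicorn itself, bypassing Docker's networking
docker exec <container-name> curl -I http://localhost:8000/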
Despite this, the container does not appear to be accepting any more requests from nginx, as this is the only output from gunicorn (indicating that it receives the healthcheck curl, but nothing else):
172.17.0.1 - - [02/Aug/2022:15:34:49 +0000] "GET / HTTP/1.0" 403 135 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3599.0 Safari/537.36"
[2022-08-02 15:34:49 +0000] [1344] [INFO] Autorestarting worker after current request.
[2022-08-02 15:34:49 +0000] [1344] [INFO] Worker exiting (pid: 1344)
172.17.0.1 - - [02/Aug/2022:15:34:49 +0000] "GET / HTTP/1.0" 403 135 "-" "Mozilla/5.0 (iPad; CPU OS 8_1_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B435 Safari/600.1.4"
[2022-08-02 15:34:50 +0000] [1447] [INFO] Booting worker with pid: 1447
[2022-08-02 15:34:50 +0000] [1448] [INFO] Booting worker with pid: 1448
[2022-08-02 15:34:51 +0000] [1449] [INFO] Booting worker with pid: 1449
127.0.0.1 - - [02/Aug/2022:15:35:31 +0000] "HEAD / HTTP/1.1" 200 87301 "-" "curl/7.74.0"
127.0.0.1 - - [02/Aug/2022:15:36:31 +0000] "HEAD / HTTP/1.1" 200 87301 "-" "curl/7.74.0"
127.0.0.1 - - [02/Aug/2022:15:37:31 +0000] "HEAD / HTTP/1.1" 200 87301 "-" "curl/7.74.0"
127.0.0.1 - - [02/Aug/2022:15:51:33 +0000] "HEAD / HTTP/1.1" 200 87301 "-" "curl/7.74.0"
[2022-08-02 15:51:54 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:1449)
[2022-08-02 15:51:54 +0000] [1449] [INFO] Worker exiting (pid: 1449)
127.0.0.1 - - [02/Aug/2022:15:52:33 +0000] "HEAD / HTTP/1.1" 200 87301 "-" "curl/7.74.0"
127.0.0.1 - - [02/Aug/2022:15:53:34 +0000] "HEAD / HTTP/1.1" 200 87301 "-" "curl/7.74.0"
As you can see, gunicorn receives no non-curl requests after 15:34:49, while nginx continues to show the upstream connection-refused error. What can I do about this? Manually restarting the Docker container is simply not feasible. The health check should catch this, but for some reason the site keeps working internally while the container stops receiving outside requests from nginx.
I've tried varying the number of gunicorn workers and the requests-per-worker limit, but nothing works. The site works perfectly fine normally; I am just completely stuck on where the DDoS is breaking something. From my observation, nginx is functioning fine and the issue is somewhere in the dockerised gunicorn instance, but I don't see how, given that it responds to internal curl commands perfectly well - if it were broken, the health check wouldn't be able to access the site!
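For reference, this is roughly the shape of the gunicorn config I've been tuning (a sketch; the values are illustrative, and gunicorn.conf.py is just the assumed config file name):

# gunicorn.conf.py
bind = "0.0.0.0:8000"
workers = 3               # varied this up and down
max_requests = 1000       # per-worker cap; produces the "Autorestarting worker" lines above
max_requests_jitter = 50  # stagger restarts so workers don't all recycle at once
timeout = 30              # seconds before the arbiter kills a hung worker
backlog = 2048            # size of the pending-connection queue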
Edit: here's an extract of my nginx config:
server {
    listen 443 ssl http2;
    server_name examplesite.com www.examplesite.com;

    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;
    ssl_client_certificate /etc/ssl/cloudflare.pem;
    ssl_verify_client on;

    client_body_timeout 5s;
    client_header_timeout 5s;

    location / {
        limit_conn limitzone 15;
        limit_req zone=one burst=10 nodelay;

        proxy_pass http://127.0.0.1:8000/;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect http://127.0.0.1:8000/ https://examplesite.com;
    }
}
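(The limitzone and one zones referenced above are defined in the http block; something like this, where the sizes and rate are illustrative assumptions:)

limit_conn_zone $binary_remote_addr zone=limitzone:10m;
limit_req_zone  $binary_remote_addr zone=one:10m rate=10r/s;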
Update: the container shows no unreasonable use of system resources; I'm still unsure where in the pipeline it's breaking.
I have been working on a side project using Django and Django REST framework. There is even an active version running in the cloud without any issue. I wanted to make some minor changes to the code base, but while I'm using the Django admin page the server crashes silently. I haven't changed anything in the code yet.
[2022-08-31 13:39:10 +0200] [8801] [INFO] Starting gunicorn 20.1.0
[2022-08-31 13:39:10 +0200] [8801] [INFO] Listening at: http://127.0.0.1:8000 (8801)
[2022-08-31 13:39:10 +0200] [8801] [INFO] Using worker: sync
[2022-08-31 13:39:10 +0200] [8802] [INFO] Booting worker with pid: 8802
[2022-08-31 13:39:18 +0200] [8801] [WARNING] Worker with pid 8802 was terminated due to signal 11
[2022-08-31 13:39:18 +0200] [8810] [INFO] Booting worker with pid: 8810
[2022-08-31 13:39:23 +0200] [8801] [WARNING] Worker with pid 8810 was terminated due to signal 11
[2022-08-31 13:39:23 +0200] [8814] [INFO] Booting worker with pid: 8814
The same happens with the python manage.py runserver command.
I'm using a virtual environment with Python 3.9.
Signal 11 (SIGSEGV, also known as segmentation violation) means that the program accessed a memory location that was not assigned to it.
That's usually a bug in the program itself, so if you're writing your own code, that's the most likely place to look.
It can also occur with hardware malfunctions.
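Since a segfault in a Python process typically comes from a C extension rather than Python code, one way to see where it happens is to enable faulthandler before reproducing the crash. A sketch (myproject.wsgi is a placeholder for your WSGI module):

export PYTHONFAULTHANDLER=1   # Python prints a traceback when it receives SIGSEGV
python manage.py runserver    # reproduce the crash
# or under gunicorn:
PYTHONFAULTHANDLER=1 gunicorn myproject.wsgi:application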
I have a script that hits a simple API on all my servers every hour to ensure they are functioning properly. My newest server isn't using my normal stack, so I suspect I've configured it improperly. It is currently returning occasional 404 errors to the logging script.
Server Config
Ubuntu, Nginx, PostgreSQL, Supervisor;
Running a Docker container with Django/Wagtail and Gunicorn.
It looks fine when I visit in a web browser, but my script logged four 404s in the last 12 hours.
My supervisor log shows the 404s but doesn't provide any additional useful information:
[2018-07-16 20:22:35 +0000] [9] [INFO] Booting worker with pid: 9
[2018-07-16 20:22:35 +0000] [10] [INFO] Booting worker with pid: 10
[2018-07-16 20:22:35 +0000] [11] [INFO] Booting worker with pid: 11
Not Found: /_server_health/
Not Found: /_server_health/
Not Found: /_server_health/
There is no relevant information captured in the Nginx log.
Can anyone recommend any steps I can take to gather further information? Or does this fit the pattern of any known problematic server configs?
Edit: It looks like Wagtail is sometimes causing the 404: "Raised by: wagtail.core.views.serve"
Maybe a problem in my urls.py? Should this be configured differently for Wagtail?
url(r'^_server_health$', status_api),
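One detail worth noting: the script requests /_server_health/ with a trailing slash (see the "Not Found" lines above), but the pattern has none, so the request falls through to Wagtail's catch-all serve view. A likely fix, sketched under the assumption that the health-check pattern sits before Wagtail's catch-all:

from django.conf.urls import include, url
from wagtail.core import urls as wagtail_urls

urlpatterns = [
    # match the trailing slash the monitoring script actually requests
    url(r'^_server_health/$', status_api),
    # ... other patterns ...
    # Wagtail's catch-all stays last, or it swallows every URL
    url(r'', include(wagtail_urls)),
]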
In my Django settings and on my machine I have UTC+3 configured, so I expected to get all logs in UTC+3, but it turned out that they are actually pretty inconsistent:
[2017-08-08 10:29:22 +0000] [1] [INFO] Starting gunicorn 19.7.1
[2017-08-08 10:29:22 +0000] [1] [DEBUG] Arbiter booted
[2017-08-08 10:29:22 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000
[2017-08-08 10:29:22 +0000] [1] [INFO] Using worker: sync
[2017-08-08 10:29:22 +0000] [7] [INFO] Booting worker with pid: 7
[2017-08-08 10:29:23 +0000] [1] [DEBUG] 1 worker
[2017-08-08 13:29:26 +0300] [7] [INFO] [dashboard.views:9] Displaying menu
Settings:
TIME_ZONE = 'Europe/Moscow'
USE_TZ = True
Maybe you can provide some hints or information on how to configure or debug this?
For a moment I thought this was a gunicorn problem, but it uses the Django settings, so I have no idea what's wrong :/
Gunicorn's log timestamps do not rely on the Django timezone but on the local machine's, so to get the right timezone you should configure your local machine; how to do that depends on which OS it is running on.
For Debian/Ubuntu:
sudo dpkg-reconfigure tzdata
Follow the directions in the terminal.
The timezone info is saved in /etc/timezone, which can be edited directly or used as shown below.
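For example (myapp.wsgi is a placeholder; TZ can also be set per-process instead of machine-wide):

cat /etc/timezone                                  # check the current machine timezone
TZ=Europe/Moscow gunicorn myapp.wsgi:application   # override just for this process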
If you are using CentOS, you can check this article.
For other options, a quick search will turn them up.
Hope that helps.
So, the timestamps were correct, just different because of my company proxy settings. It also turned out that the best way to handle different time zones is to use UTC everywhere except when presenting times to the user.
I am trying to set up nginx + Django + gunicorn to deploy my project, with the help of the following article:
http://tutos.readthedocs.io/en/latest/source/ndg.html. I followed the steps as described. Now I am trying to start gunicorn, and this is what I get on the screen:
$ gunicorn ourcase.wsgi:application
[2016-05-19 19:24:25 +0000] [9290] [INFO] Starting gunicorn 19.5.0
[2016-05-19 19:24:25 +0000] [9290] [INFO] Listening at: http://127.0.0.1:8000 (9290)
[2016-05-19 19:24:25 +0000] [9290] [INFO] Using worker: sync
[2016-05-19 19:24:25 +0000] [9293] [INFO] Booting worker with pid: 9293
Since I am new to nginx and gunicorn, I am not sure whether the above is an error or not. I am getting nothing in the error log:
cat /var/log/nginx/error.log
It prints nothing on the screen. Please help me to solve this.
That output means that the process is running, which is what you want. Try accessing the URL from the browser directly after running the command, without pressing Ctrl+C.
As a side note, you can wrap this in a bash script, which will make it easier to add arguments to the gunicorn command.
I have a gist that does just that. https://gist.github.com/marcusshepp/129c822e2065e20122d8
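A minimal version of such a script might look like this (illustrative, not the gist verbatim):

#!/bin/bash
# launch gunicorn with a few common flags; adjust paths and counts for your project
WSGI_MODULE="ourcase.wsgi:application"   # from the question above

exec gunicorn "$WSGI_MODULE" \
    --workers 3 \
    --bind 127.0.0.1:8000 \
    --log-level info \
    --log-file -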
Let me know what other questions you might have and I'll add a comment.