Why am I getting a 403 error on Plesk when trying to run Django?

I've been following a Udemy tutorial on Python, and at this point I'm supposed to get a Django app deployed.
Since I already had a VPS, I didn't go with the tutorial's solution using Google Cloud, and instead tried to configure the app on my VPS, which is also running Plesk.
I followed the tutorial at https://www.plesk.com/blog/tag/django-plesk/ to the letter as best I could, but I keep getting a 403 error. My directory structure looks like this:
httpdocs
-djangoProject
---djangoProject
------asgi.py
------__init__.py
------settings.py
------urls.py
------wsgi.py
---manage.py
-passenger_wsgi.py
-python-app-venv
-tmp
passenger_wsgi.py:
import sys, os

ApplicationDirectory = 'djangoProject'
ApplicationName = 'djangoProject'
VirtualEnvDirectory = 'python-app-venv'
VirtualEnv = os.path.join(os.getcwd(), VirtualEnvDirectory, 'bin', 'python')

# If we're not already running under the virtualenv's Python, re-exec with it.
if sys.executable != VirtualEnv:
    os.execl(VirtualEnv, VirtualEnv, *sys.argv)

# Make the project directory, the inner settings package, and the venv's bin/ importable.
sys.path.insert(0, os.path.join(os.getcwd(), ApplicationDirectory))
sys.path.insert(0, os.path.join(os.getcwd(), ApplicationDirectory, ApplicationName))
sys.path.insert(0, os.path.join(os.getcwd(), VirtualEnvDirectory, 'bin'))

# Run Django from the project directory and point it at its settings module.
os.chdir(os.path.join(os.getcwd(), ApplicationDirectory))
os.environ.setdefault('DJANGO_SETTINGS_MODULE', ApplicationName + '.settings')

# Expose the WSGI callable that Passenger looks for.
from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()
Passenger is enabled in "Tools & Settings > Apache Web Server".
In "Websites & Domains > Domain > Hosting & DNS > Apache & nginx settings" I have both "Additional directives for HTTP" and "Additional directives for HTTPS" set to:
PassengerEnabled On
PassengerAppType wsgi
PassengerStartupFile passenger_wsgi.py
and nginx proxy mode is checked.
"Reverse Proxy Server (nginx)" is also running.
I have no idea what else I can provide to help find a solution, so if you're willing to assist and need more info, please let me know.
Very thankful in advance.
EDIT:
On a previous attempt, deploying a real app on a subdomain, I was getting:
[Thu Apr 01 22:52:37.928495 2021] [autoindex:error] [pid 23614:tid 140423896925952] [client xx:xx:xx:xx:0] AH01276: Cannot serve directory /var/www/vhosts/baya.pt/leve/leve/: No matching DirectoryIndex (index.html,index.cgi,index.pl,index.php,index.xhtml,index.htm,index.shtml) found, and server-generated directory index forbidden by Options directive
This time, no errors are being logged.
EDIT2:
@Chris:
Not sure what you mean; I find no errors in the log folders (via SSH), but in Plesk I get the following several times:
2021-04-01 23:40:48 Error 94.61.142.214 403 GET / HTTP/1.0 https://baya.pt/ Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36 2.52 K Apache SSL/TLS access

2021-04-01 23:40:48 Error 94.61.142.214 AH01276: Cannot serve directory /var/www/vhosts/baya.pt/httpdocs/djangoProject/: No matching DirectoryIndex (index.html,index.cgi,index.pl,index.php,index.xhtml,index.htm,index.shtml) found, and server-generated directory index forbidden by Options directive, referer: https://baya.pt/ Apache error
EDIT 3:
Removing the Apache directives and adding the following nginx directives instead:
passenger_enabled on;
passenger_app_type wsgi;
passenger_startup_file passenger_wsgi.py;
now gives me a Passenger error page, with the log as follows:
[ N 2021-04-01 23:50:59.1819 908/T9 age/Cor/CoreMain.cpp:671 ]: Signal received. Gracefully shutting down... (send signal 2 more time(s) to force shutdown)
[ N 2021-04-01 23:50:59.1819 908/T1 age/Cor/CoreMain.cpp:1246 ]: Received command to shutdown gracefully. Waiting until all clients have disconnected...
[ N 2021-04-01 23:50:59.1820 908/Tb Ser/Server.h:902 ]: [ApiServer] Freed 0 spare client objects
[ N 2021-04-01 23:50:59.1820 908/Tb Ser/Server.h:558 ]: [ApiServer] Shutdown finished
[ N 2021-04-01 23:50:59.1820 908/T9 Ser/Server.h:902 ]: [ServerThr.1] Freed 0 spare client objects
[ N 2021-04-01 23:50:59.1820 908/T9 Ser/Server.h:558 ]: [ServerThr.1] Shutdown finished
[ N 2021-04-01 23:50:59.2765 30199/T1 age/Wat/WatchdogMain.cpp:1373 ]: Starting Passenger watchdog...
[ N 2021-04-01 23:50:59.2871 908/T1 age/Cor/CoreMain.cpp:1325 ]: Passenger core shutdown finished
[ N 2021-04-01 23:50:59.3329 30209/T1 age/Cor/CoreMain.cpp:1340 ]: Starting Passenger core...
[ N 2021-04-01 23:50:59.3330 30209/T1 age/Cor/CoreMain.cpp:256 ]: Passenger core running in multi-application mode.
[ N 2021-04-01 23:50:59.3472 30209/T1 age/Cor/CoreMain.cpp:1015 ]: Passenger core online, PID 30209
[ N 2021-04-01 23:51:01.4339 30209/T7 age/Cor/SecurityUpdateChecker.h:519 ]: Security update check: no update found (next check in 24 hours)
App 31762 output: Error: Directory '/var/www/vhosts/baya.pt' is inaccessible because of a filesystem permission error.
[ E 2021-04-01 23:51:02.9127 30209/Tc age/Cor/App/Implementation.cpp:221 ]: Could not spawn process for application /var/www/vhosts/baya.pt/httpdocs: Directory '/var/www/vhosts/baya.pt' is inaccessible because of a filesystem permission error.

I think I've got it:
I added the directives to nginx, removed proxy mode, and chowned passenger_wsgi.py to the nginx user.
At least I'm getting a Django page now =)

I spent a few days implementing this process and finally found what needed to be modified, so I'm sharing my findings and hope they help you as well.
After completing the process step by step, I was getting the error:
No matching DirectoryIndex (index.html,index.cgi,index.pl,index.php,index.xhtml,index.htm,index.shtml) found, and server-generated directory index forbidden by Options directive
So I unchecked proxy mode to switch to nginx instead of Apache. This time Phusion Passenger was loading the page, but it only said that something was wrong, without telling me what. So I did the following:
cd /etc/nginx/conf.d/
touch directives.conf
vi /etc/nginx/conf.d/directives.conf
A text editor opened, and I typed:
passenger_app_env development;
This time when I tried to open my website, Passenger showed me the error details, from which I found what was wrong.
In step 6, instead of using the command $ vi ~/httpdocs/passenger_wsgi.py, just create the file on your own computer, copy the content into it, save it locally, and then upload it to the server manually. Don't use the command line!
That's it. Now my Django website is up and running.

Related

Using Ray from Flask - init() fails (with core dump)

I'm trying to use Ray from a Flask web application.
The whole thing runs in a Docker container.
The Ray version is 0.8.6, Flask 1.1.2.
When I start the web application, Ray seems to try to init twice, and then the process crashes. I added the memory limitations later on because there were some warnings about insufficient shared memory size (the docker-compose setting is "shm_size: '4gb'").
If I start Ray in the same container without using Flask, it runs fine.
import os
import flask
import ray
from flask import Flask

def create_app(test_config=None):
    app = Flask(__name__, instance_relative_config=True)
    app.config.from_mapping(
        SECRET_KEY='dev',
        DEBUG=True
    )
    # ensure the instance folder exists
    try:
        os.makedirs(app.instance_path)
    except OSError:
        pass
    if ray.is_initialized() == False:
        ray.init(ignore_reinit_error=True,
                 include_webui=False,
                 object_store_memory=1*1024*1014*1024,
                 redis_max_memory=2*1024*1014*1024)
    ray.worker.global_worker.run_function_on_all_workers(setup_ray_logger)

    @app.route('/api/GetAccountRatings', methods=['GET'])
    def GetAccountRatings():
        return ...

    return app
When I start the flask web app with:
export FLASK_APP="mifad.api:create_app()"
export FLASK_ENV=development
flask run --host=0.0.0.0 --port=8084
I get the following error messages:
* Serving Flask app "mifad.api:create_app()" (lazy loading)
* Environment: development
* Debug mode: on
* Running on http://0.0.0.0:8084/ (Press CTRL+C to quit)
* Restarting with stat
Failed to set SIGTERM handler, processes might not be cleaned up properly on exit.
* Debugger is active!
* Debugger PIN: 331-620-174
Failed to set SIGTERM handler, processes might not be cleaned up properly on exit.
2020-07-06 07:38:10,382 INFO resource_spec.py:212 -- Starting Ray with 59.18 GiB memory available for workers and up to 0.99 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-07-06 07:38:10,610 WARNING services.py:923 -- Redis failed to start, retrying now.
2020-07-06 07:38:10,675 INFO resource_spec.py:212 -- Starting Ray with 59.13 GiB memory available for workers and up to 0.99 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-07-06 07:38:10,781 WARNING services.py:923 -- Redis failed to start, retrying now.
2020-07-06 07:38:11,043 WARNING services.py:923 -- Redis failed to start, retrying now.
2020-07-06 07:38:11,479 ERROR import_thread.py:93 -- ImportThread: Error 111 connecting to 172.29.0.2:44946. Connection refused.
2020-07-06 07:38:11,481 ERROR worker.py:949 -- print_logs: Connection closed by server.
2020-07-06 07:38:11,488 ERROR worker.py:1049 -- listen_error_messages_raylet: Connection closed by server.
2020-07-06 07:38:11,899 ERROR import_thread.py:93 -- ImportThread: Error while reading from socket: (104, 'Connection reset by peer')
2020-07-06 07:38:11,901 ERROR worker.py:1049 -- listen_error_messages_raylet: Connection closed by server.
2020-07-06 07:38:11,908 ERROR worker.py:949 -- print_logs: Connection closed by server.
F0706 07:38:17.390182 4555 4659 service_based_gcs_client.cc:104] Check failed: num_attempts < RayConfig::instance().gcs_service_connect_retries() No entry found for GcsServerAddress
*** Check failure stack trace: ***
# 0x7ff84ae8061d google::LogMessage::Fail()
# 0x7ff84ae81a8c google::LogMessage::SendToLog()
# 0x7ff84ae802f9 google::LogMessage::Flush()
# 0x7ff84ae80511 google::LogMessage::~LogMessage()
# 0x7ff84ae5dde9 ray::RayLog::~RayLog()
# 0x7ff84ac39cea ray::gcs::ServiceBasedGcsClient::GetGcsServerAddressFromRedis()
# 0x7ff84ac39f37 _ZNSt17_Function_handlerIFSt4pairISsiEvEZN3ray3gcs21ServiceBasedGcsClient7ConnectERN5boost4asio10io_contextEEUlvE_E9_M_invokeERKSt9_Any_data
# 0x7ff84ac6ffb7 ray::rpc::GcsRpcClient::Reconnect()
# 0x7ff84ac71da8 _ZNSt17_Function_handlerIFvRKN3ray6StatusERKNS0_3rpc19AddProfileDataReplyEEZNS4_12GcsRpcClient14AddProfileDataERKNS4_21AddProfileDataRequestERKSt8functionIS8_EEUlS3_S7_E_E9_M_invokeERKSt9_Any_dataS3_S7_
# 0x7ff84ac4251d ray::rpc::ClientCallImpl<>::OnReplyReceived()
# 0x7ff84ab96870 _ZN5boost4asio6detail18completion_handlerIZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E11do_completeEPvPNS1_19scheduler_operationERKNS_6system10error_codeEm
# 0x7ff84b0b80df boost::asio::detail::scheduler::do_run_one()
# 0x7ff84b0b8cf1 boost::asio::detail::scheduler::run()
# 0x7ff84b0b9c42 boost::asio::io_context::run()
# 0x7ff84ab7db10 ray::CoreWorker::RunIOService()
# 0x7ff84a7763e7 execute_native_thread_routine_compat
# 0x7ff84deed6db start_thread
# 0x7ff84dc1688f clone
F0706 07:38:17.804720 4553 4703 service_based_gcs_client.cc:104] Check failed: num_attempts < RayConfig::instance().gcs_service_connect_retries() No entry found for GcsServerAddress
*** Check failure stack trace: ***
# 0x7fedd65e261d google::LogMessage::Fail()
# 0x7fedd65e3a8c google::LogMessage::SendToLog()
# 0x7fedd65e22f9 google::LogMessage::Flush()
# 0x7fedd65e2511 google::LogMessage::~LogMessage()
# 0x7fedd65bfde9 ray::RayLog::~RayLog()
# 0x7fedd639bcea ray::gcs::ServiceBasedGcsClient::GetGcsServerAddressFromRedis()
# 0x7fedd639bf37 _ZNSt17_Function_handlerIFSt4pairISsiEvEZN3ray3gcs21ServiceBasedGcsClient7ConnectERN5boost4asio10io_contextEEUlvE_E9_M_invokeERKSt9_Any_data
# 0x7fedd63d1fb7 ray::rpc::GcsRpcClient::Reconnect()
# 0x7fedd63d3da8 _ZNSt17_Function_handlerIFvRKN3ray6StatusERKNS0_3rpc19AddProfileDataReplyEEZNS4_12GcsRpcClient14AddProfileDataERKNS4_21AddProfileDataRequestERKSt8functionIS8_EEUlS3_S7_E_E9_M_invokeERKSt9_Any_dataS3_S7_
# 0x7fedd63a451d ray::rpc::ClientCallImpl<>::OnReplyReceived()
# 0x7fedd62f8870 _ZN5boost4asio6detail18completion_handlerIZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E11do_completeEPvPNS1_19scheduler_operationERKNS_6system10error_codeEm
# 0x7fedd681a0df boost::asio::detail::scheduler::do_run_one()
# 0x7fedd681acf1 boost::asio::detail::scheduler::run()
# 0x7fedd681bc42 boost::asio::io_context::run()
# 0x7fedd62dfb10 ray::CoreWorker::RunIOService()
# 0x7fedd5ed83e7 execute_native_thread_routine_compat
# 0x7fedd968f6db start_thread
# 0x7fedd93b888f clone
Aborted (core dumped)
What am I doing wrong?
Best regards,
Bernd

".sock" does not exist error; connect() error

I'm trying to set up my Mezzanine/Django project on a virtual Ubuntu 14.04 box on Linode, but get a "502 Bad Gateway error" when trying to navigate to my site in my browser.
I'm following the instructions here: https://linode.com/docs/web-servers/nginx/deploy-django-applications-using-uwsgi-and-nginx-on-ubuntu-14-04.
I ran git clone in /home/django/, so everything's in a folder named FOLDER.
/home/django/ has these directories: Env/ and FOLDER/.
This should give some idea of what the FOLDER/ tree looks like:
FOLDER
- product_blog
-- product_blog
--- settings.py
--- wsgi.py
-- manage.py
In my settings.py, I have:
ALLOWED_HOSTS = ["PUBLIC IP OF MY LINODE UBUNTU INSTANCE HERE"]
wsgi.py has this:
"""
WSGI config for product_blog project.
It exposes the WSGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/1.10/howto/deployment/wsgi/
"""
import os
from django.core.wsgi import get_wsgi_application
from mezzanine.utils.conf import real_project_name
os.environ.setdefault("DJANGO_SETTINGS_MODULE",
"%s.settings" % real_project_name("product_blog"))
application = get_wsgi_application()
/etc/uwsgi/sites/product_blog.ini has this:
[uwsgi]
project = product_blog
base = /home/django
chdir = %(base)/%(project)
home = %(base)/Env/%(project)
module = %(project).wsgi:application
master = true
processes = 2
socket = %(base)/%(project)/%(project).sock
chmod-socket = 664
vacuum = true
/etc/init/uwsgi.conf has this:
description "uWSGI"
start on runlevel [2345]
stop on runlevel [06]
respawn
env UWSGI=/usr/local/bin/uwsgi
env LOGTO=/var/log/uwsgi.log
exec $UWSGI --master --emperor /etc/uwsgi/sites --die-on-term --uid django --gid www-data --logto $LOGTO
/etc/nginx/sites-available/product_blog has this:
server {
    listen 80;
    server_name mydomain.com;

    location = /favicon.ico { access_log off; log_not_found off; }

    location /static/ {
        root /home/django/FOLDER;
    }

    location / {
        include uwsgi_params;
        uwsgi_pass unix:/home/django/FOLDER/product_blog/product_blog.sock;
    }
}
When I go to the IP address for the Linode box, I see a 502 Bad Gateway error.
/var/log/uwsgi.log has this...
*** has_emperor mode detected (fd: 6) ***
[uWSGI] getting INI configuration from product_blog.ini
*** Starting uWSGI 2.0.17 (64bit) on [Sun Apr 22 17:17:56 2018] ***
compiled with version: 4.8.4 on 22 April 2018 19:53:45
os: Linux-4.15.12-x86_64-linode105 #1 SMP Thu Mar 22 02:13:40 UTC 2018
nodename: ubuntu-linode
machine: x86_64
clock source: unix
detected number of CPU cores: 1
current working directory: /etc/uwsgi/sites
detected binary path: /usr/local/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
chdir() to /home/django/product_blog
chdir(): No such file or directory [core/uwsgi.c line 2629]
chdir(): No such file or directory [core/uwsgi.c line 1644]
Sun Apr 22 17:17:56 2018 - [emperor] curse the uwsgi instance product_blog.ini (pid: 4238)
Sun Apr 22 17:17:59 2018 - [emperor] removed uwsgi instance product_blog.ini
And /var/log/nginx/error.log has this (identifying info removed):
2018/04/22 17:18:14 [crit] 3953#0: *9 connect() to unix:/home/django/FOLDER/product_blog/product_blog.sock failed (2: No such file or directory) while connecting to upstream, client: 73.49.35.42, server: mydomain.com, request: "GET / HTTP/1.1", upstream: "uwsgi://unix:/home/django/FOLDER/product_blog/product_blog.sock:", host: "ADDRESS TO LINODE BOX"
It seems that product_blog.sock is supposed to be somewhere in my directory, but it is not there. How do I fix the 502 error so that I can navigate my browser to the Linode box's address and see a working website?
The socket file is supposed to be created by uWSGI. But the uWSGI log is telling you that it can't start up because it can't cd to the directory you have specified in product_blog.ini, /home/django/product_blog.
I can't tell if FOLDER is a placeholder or not, but in any case you don't seem to have included it in that path. I suppose it should be:
chdir = %(base)/FOLDER/%(project)
...
socket = %(base)/FOLDER/%(project)/%(project).sock

How to solve 403 error in scrapy

I'm new to Scrapy, and I made a Scrapy project to scrape data.
I'm trying to scrape data from the website, but I'm getting the following error logs:
2016-08-29 14:07:57 [scrapy] INFO: Enabled item pipelines:
[]
2016-08-29 13:55:03 [scrapy] INFO: Spider opened
2016-08-29 13:55:03 [scrapy] INFO: Crawled 0 pages (at 0 pages/min),scraped 0 items (at 0 items/min)
2016-08-29 13:55:04 [scrapy] DEBUG: Crawled (403) <GET http://www.justdial.com/robots.txt> (referer: None)
2016-08-29 13:55:04 [scrapy] DEBUG: Crawled (403) <GET http://www.justdial.com/Mumbai/small-business> (referer: None)
2016-08-29 13:55:04 [scrapy] DEBUG: Ignoring response <403 http://www.justdial.com/Mumbai/small-business>: HTTP status code is not handled or not allowed
2016-08-29 13:55:04 [scrapy] INFO: Closing spider (finished)
When I try the following commands in the browser's web console, I get a response, but when I use the same paths inside the Python script I get the error described above.
Commands on web console:
$x('//div[@class="col-sm-5 col-xs-8 store-details sp-detail paddingR0"]/h4/span/a/text()')
$x('//div[@class="col-sm-5 col-xs-8 store-details sp-detail paddingR0"]/p[@class="contact-info"]/span/a/text()')
Please help me.
Thanks
As Avihoo Mamka mentioned in the comments, you need to provide some extra request headers so the website doesn't reject you.
In this case it seems to just be the User-Agent header. By default Scrapy identifies itself with the user agent "Scrapy/{version} (+http://scrapy.org)". Some websites might reject this for one reason or another.
To avoid this, just set the headers parameter of your Request with a common user-agent string:
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
yield Request(url, headers=headers)
You can find a huge list of user-agents here, though you should stick with popular web-browser ones like Firefox, Chrome etc. for the best results
You can implement it to work with your spider's start_urls too:
class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = (
        'http://scrapy.org',
    )

    def start_requests(self):
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
        for url in self.start_urls:
            yield Request(url, headers=headers)
Add the following to your settings.py file. This works well if you are combining Selenium with Scrapy:
DEFAULT_REQUEST_HEADERS = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
I just needed to get my shell to work and run some quick tests, so Granitosaurus's solution was a bit overkill for me.
I literally just went to settings.py, where you'll find that almost everything is commented out. Around lines 16-17 you'll find something like this...
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'exercise01part01 (+http://www.yourdomain.com)'
You just need to uncomment it and replace the value with any user agent, like 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'.
You can find a list of them here: https://www.useragentstring.com/pages/useragentstring.php
So it'll look something like this...
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'
You'll definitely want to rotate user agents if you want to build a large-scale crawler (see the middleware sketch after this answer). But I just needed to get my Scrapy shell to work and run some quick tests without getting that pesky 403 error, so this one-liner sufficed. It was nice because I did not need to write a fancy function or anything.
Happy scrapy-ing
Note: PLEASE make sure you are in the same directory as settings.py when you run scrapy shell in order to utilize the changes you just made. It does not work if you are in a parent directory.
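For larger crawls, one way to rotate user agents is a small downloader middleware. The following is only a minimal sketch of that idea; the module path myproject.middlewares and the USER_AGENTS list are illustrative assumptions, not part of the answers above:
# myproject/middlewares.py (hypothetical module path)
import random

# Sample pool of user agents to rotate through (extend as needed).
USER_AGENTS = [
    'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36',
]

class RandomUserAgentMiddleware:
    # Set a randomly chosen User-Agent header on every outgoing request.
    def process_request(self, request, spider):
        request.headers['User-Agent'] = random.choice(USER_AGENTS)

# settings.py: enable the middleware (the priority value 400 is a typical choice)
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.RandomUserAgentMiddleware': 400,
}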
Here is what the whole process of solving the error could look like:
You can find a huge list of user-agents at https://www.useragentstring.com/pages/useragentstring.php, though you should stick with popular web-browser ones like Firefox, Chrome etc. for the best results (find more at How to solve 403 error in scrapy).
An example of the steps that worked for me on Windows 10 in the Scrapy shell follows:
Go to https://www.useragentstring.com/pages/useragentstring.php -> choose one link from BROWSERS (but you can also try a link from CRAWLERS, ...) ->
e.g. Chrome = https://www.useragentstring.com/pages/Chrome/ -> choose one of the lines, e.g.:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 -> choose one part (text that belongs together) from that line, e.g. Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) ->
Command Prompt -> go into the project folder -> scrapy shell
from scrapy import Request
req = Request('https://www.whiskyshop.com/scotch-whisky?item_availability=In+Stock', headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6)'})
fetch(req)
Now, the result should be 200.
You see that it works even though I am on Windows 10 and there is "Macintosh" in the Request().
You can also use the previous steps to add the chosen header to your settings.py file.
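A minimal sketch of what that could look like, using Scrapy's standard USER_AGENT or DEFAULT_REQUEST_HEADERS settings (the UA string is simply the one chosen above; use whichever of the two settings you prefer):
# settings.py
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6)'
# or send it as a default header on every request:
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6)',
}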
Note 1: The comments in the following Stack Overflow pages are also more or less related (and I used them for this example):
https://stackoverflow.com/questions/52196040/scrapy-shell-and-scrapyrt-got-403-but-scrapy-crawl-works,
https://stackoverflow.com/questions/16627227/problem-http-error-403-in-python-3-web-scraping,
https://stackoverflow.com/questions/37010524/set-headers-for-scrapy-shell-request
Note 2: I also recommend reading, e.g.:
https://scrapeops.io/web-scraping-playbook/403-forbidden-error-web-scraping/
https://scrapeops.io/python-scrapy-playbook/scrapy-managing-user-agents/
https://www.simplified.guide/scrapy/change-user-agent

Django + uWSGI + nginx requests hang

I'm running a Django web application using Nginx and uWSGI. I'm having problems with the requests hanging for no apparent reason.
I have added a bunch of logging to the application, and this snippet is where it seems to hang. There are two log lines at the start of the try block; the first one gets printed, but not the second one, so it would seem that it hangs in the middle of this code. This code is from a middleware class that I added in the Django configuration.
def process_request(self, request):
    if 'auth' not in request.session:
        try:
            log.info("Auth not found")    # this line is logged
            log.info("another log line")  # this line is never logged
            if request.is_ajax():
                return HttpResponse(status=401)
            ...
I managed to get a backtrace from the uWSGI thread and this is where it's stuck:
*** backtrace of 76 ***
/usr/bin/uwsgi(uwsgi_backtrace+0x2e) [0x45121e]
/usr/bin/uwsgi(what_i_am_doing+0x30) [0x451350]
/lib/x86_64-linux-gnu/libc.so.6(+0x36c30) [0x7f8a4b2b8c30]
/lib/x86_64-linux-gnu/libc.so.6(epoll_wait+0x33) [0x7f8a4b37d653]
/home/vdr/vdr-ui/env/local/lib/python2.7/site-packages/gevent/core.so(+0x27625) [0x7f8a44092625]
/home/vdr/vdr-ui/env/local/lib/python2.7/site-packages/gevent/core.so(ev_run+0x29b) [0x7f8a4409d11b]
/home/vdr/vdr-ui/env/local/lib/python2.7/site-packages/gevent/core.so(+0x32bc0) [0x7f8a4409dbc0]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4bd4) [0x7f8a4a0c30d4]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d) [0x7f8a4a0c517d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(+0x162310) [0x7f8a4a0c5310]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f8a4a08ce23]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(+0x7d30d) [0x7f8a49fe030d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f8a4a08ce23]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x47) [0x7f8a4a04b837]
/home/vdr/vdr-ui/env/local/lib/python2.7/site-packages/greenlet.so(+0x375c) [0x7f8a49b1c75c]
/home/vdr/vdr-ui/env/local/lib/python2.7/site-packages/greenlet.so(+0x30a6) [0x7f8a49b1c0a6]
[0x7f8a42f26f38]
*** end of backtrace ***
SIGUSR2: --- uWSGI worker 3 (pid: 76) is managing request /login?next=/&token=45092ca6-c1a0-4c23-9d44-4d171fc561b8 since Wed Dec 2 09:52:44 2015 ---
The Nginx error log prints out [error] 619#0: *55 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.17.0.1, server: vdr
There are no errors in the printouts from uWSGI, so I'm a bit at a loss. Has anyone seen anything similar? All this is running within a Docker container if that makes any difference.
Nginx conf:
upstream uwsgi {
    server unix:///tmp/vdr.sock;
}

server {
    listen 80;
    charset utf-8;
    client_max_body_size 500M;
    server_name localhost 172.17.0.2;

    location /static {
        alias /home/vdr/vdr-ui/static;
    }

    location / {
        include uwsgi_params;
        uwsgi_pass uwsgi;
        uwsgi_read_timeout 200s;
    }
}
uWSGI conf:
[uwsgi]
chdir = %d
module = alft_ui.wsgi:application
uid=1000
master=true
pidfile=/tmp/vdr.pid
vacuum=true
max-requests=5000
processes=4
env=DJANGO_SETTINGS_MODULE=alft_ui.settings.prod-live
home=/home/vdr/vdr-ui/env
socket=/tmp/vdr.sock
chmod-socket=666
So I finally found the cause of this. It turns out that my setup script had added some logstash settings to the Django configuration. These settings pointed to the IP 10.8.0.1, which wasn't reachable from this environment. That would explain why the app got stuck on a logging line. Removing these settings made everything work again.
Always good to know that it was your own fault all along :)
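For reference, the kind of configuration that produces this failure mode is a logging handler that opens a network connection when a record is emitted. The sketch below is purely illustrative, not the poster's actual settings; it assumes the python-logstash package and reuses the unreachable host mentioned above:
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'logstash': {
            'level': 'INFO',
            'class': 'logstash.TCPLogstashHandler',  # handler from the python-logstash package (assumed)
            'host': '10.8.0.1',  # unreachable host: log.info() can block while the TCP connection is attempted
            'port': 5959,
        },
    },
    'root': {
        'handlers': ['logstash'],
        'level': 'INFO',
    },
}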

Gearman client fails in a web page but succeeds on the command line?

The client.php example from http://gearman.org/getting-started/ can successfully communicate with worker.php when run with the command "php client.php", but when run from a web browser it fails to communicate with worker.php. Does anyone know why, and how to configure gearmand or work around this?
OS:CentOS 6.7
Gearmand version:1.1.8.
Gearmand is started with "gearmand -l stderr --verbose DEBUG".
When a client communicates using the command "gearman -f work < /somedir/somefile", the information is returned as expected, and the terminal displays the following:
DEBUG 2015-10-30 11:56:01.371309 [ 1 ] Received GEARMAN_GRAB_JOB_ALL ::58ca:3fa1:77f:0%4234047483:2705334353 -> libgearman-server/thread.cc:310
DEBUG 2015-10-30 11:56:01.371317 [ 1 ] ::58ca:3fa1:77f:0%4234047483:41704 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
DEBUG 2015-10-30 11:56:01.371334 [ proc ] ::58ca:3fa1:77f:0%4234047483:41704 packet command GEARMAN_CAN_DO -> libgearman-server/server.cc:111
DEBUG 2015-10-30 11:56:01.371344 [ proc ] Registering function: work -> libgearman-server/server.cc:522
DEBUG 2015-10-30 11:56:01.371352 [ proc ] ::58ca:3fa1:77f:0%4234047483:41704 packet command GEARMAN_GRAB_JOB_ALL -> libgearman-server/server.cc:111
DEBUG 2015-10-30 11:56:01.371371 [ 1 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:610
But when the web browser navigates to "http://localhost/client.php", no information is shown in the browser, and the terminal displays nothing either.
The information in nginx's error.log is as follows:
2015/10/30 04:59:10 [error] 2756#0: *2 FastCGI sent in stderr: "PHP message: PHP Warning: GearmanClient::doNormal(): send_packet(GEARMAN_COULD_NOT_CONNECT) Failed to send server-options packet -> libgearman/connection.cc:485 in /usr/share/nginx/html/client.php on line 4" while reading response header from upstream, client: 127.0.0.1, server: localhost, request: "GET /client.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "localhost"
[root@localhost html]# cat client.php
<?php
$client= new GearmanClient();
$client->addServer("127.0.0.1",4730);
print $client->doNormal("reverse", "Hello World!");
?>
[root@localhost html]# cat worker.php
<?php
$worker= new GearmanWorker();
$worker->addServer("127.0.0.1",4730);
$worker->addFunction("reverse", "my_reverse_function");
while ($worker->work());
function my_reverse_function($job)
{
return strrev($job->workload());
}
?>
Maybe the problem is that the web page has limits or permissions on socket operations?
I think the configuration with the --http-port option may not be mature and stable yet, so my preferred solution is for PHP web pages, as clients, to submit jobs directly to gearmand, to be processed by a compiled C++ worker program. The C++ worker program should serve many requests without being called, run, and exited once per request, to save time.
Is this solution possible?
Please help me.
Thanks a lot!
With guidance from tom, Wali Usmani, and Clint, the cause was finally narrowed down to a permission problem in SELinux.
Details can be found at https://groups.google.com/forum/#!topic/gearman/_dW8SRWAonw.
Many thanks to tom, Wali Usmani, and Clint.