Does this mean I am mining? - ether

I am consistantly getting work like this
m 19:26:27|ethminer Got work package: #7ba30d33
m 19:26:28|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:06
m 19:26:28|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:06
m 19:26:29|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:06
m 19:26:30|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:06
m 19:26:30|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:06
m 19:26:
The fact that there are all zeroes on these lines makes me believe that I am not mining. However, that may be that the rates that my computer is working at don't show much progress, regardless of the work being done.
Does this mean I am mining? How does one tell if one is mining or not?
EDIT 1
After installing https://github.com/Atrides/eth-proxy
and connecting to it to localhost:8080, I am getting console output many times faster
m 19:59:52|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:02
m 19:59:52|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:02
m 19:59:52|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:02
m 19:59:52|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:02
m 19:59:53|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:02
m 19:59:53|ethminer Speed 0.00 Mh/s gpu/0 0.00 [A0+0:R0+0:F0] Time: 00:02
but I see far fewer work messages. I am having it send me emails monitoring progress.
Feels like I am mining, but I have no idea. LOL.
EDIT 2
I am seeing output on the proxy console:
2017-11-11 20:10:12,477 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:10:30,483 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:10:53,393 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:11:16,849 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:11:45,243 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:11:55,169 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:12:05,819 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:12:16,162 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:12:28,789 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:12:30,112 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:12:38,792 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:12:43,076 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:12:49,032 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:12:53,443 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:12:58,643 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:13:09,319 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:13:17,884 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:13:53,282 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:14:04,272 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:14:16,478 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:14:27,672 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:14:31,536 INFO proxy # NEW_JOB MAIN_POOL
2017-11-11 20:14:33,789 INFO proxy # NEW_JOB MAIN_POOL

I believe I found my answer. The problem is my card is too weak to mine with. The CL_DEVICE_MAX_MEM_ALLOC_SIZE: 2147483648 is too small.
[idf#localhost build]$ ethminer --list-devices
Listing OpenCL devices.
FORMAT: [platformID] [deviceID] deviceName
[2] [0] Intel(R) HD Graphics 5500 BroadWell U-Processor GT2
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 4294967296
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 2147483648
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
[idf#localhost build]$

Related

Unknown timegap in django + uwsgi profile

I'm trying to understand why my website (django + uwsgi + nginx) response is take about 1.5 seconds.
I'm using jazzband/django-silk for this purpose, and in it's admin panel I see that request takes about 800ms
https://i.stack.imgur.com/SJNaZ.png
But when I inspect nginx / uwsi logs I see that execution time is much more from 1.5 to 2 seconds
uwsgi log:
=> generated 99087 bytes in 1818 msecs (HTTP/2.0 200) 4 headers in 133 bytes (1 switches on core 0)
nginx log (time in the end)
10.136.52.100 - - [27/May/2020:14:51:04 +0000] "GET / HTTP/2.0" 200 12250 "https://localhost:4443/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36" 1.819 1.820 .
So I'm losing about 500ms on every request. Can somebody help me to understand what is wrong, or this is right behavior ?
Config of uwsgi bellow:
[uwsgi]
module = newsletter.wsgi
master = true
processes = 10
socket = /var/run/newsletter/app.sock
chmod-socket = 777
vacuum = true
enable-threads = true
ignore-sigpipe = true
ignore-write-errors = true
disable-write-exception = true

Django REST Framework slow on Gunicorn

I deployed a new application made on Django REST Framework running with Gunicorn.
The application is deployed on 4 different servers and they are listening on a port (8001) which is consumed by an HAProxy load balancer.
Unfortunately I am suffering many performance problems: many requests takes seconds to be served, even 30 or 60 seconds.
Sometimes even the basic entry endpoint (like https://my.api/api/v2 which basically returns the list of the available endpoints) requires 10/20 seconds to respond.
I don't think the problem is the load balancer because I have latencies connecting directly to any backend servers with my client, so bypassing the load balancer.
And I don't think the problem is the database because the call to https://my.api/api/v2 as guest (not logged with any user) shouldn't make any database query.
This is a performance test made with hey (https://github.com/rakyll/hey) on the basic endpoint without authorization:
me#staging2:~$ hey -n 10000 -c 100 https://my.api/api/v2/
Summary:
Total: 38.9165 secs
Slowest: 18.6772 secs
Fastest: 0.0041 secs
Average: 0.3099 secs
Requests/sec: 256.9604
Total data: 20870000 bytes
Size/request: 2087 bytes
Response time histogram:
0.004 [1] |
1.871 [9723] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
3.739 [0] |
5.606 [175] |■
7.473 [0] |
9.341 [35] |
11.208 [29] |
13.075 [0] |
14.943 [0] |
16.810 [0] |
18.677 [37] |
Latency distribution:
10% in 0.0054 secs
25% in 0.0122 secs
50% in 0.0322 secs
75% in 0.2378 secs
90% in 0.3417 secs
95% in 0.3885 secs
99% in 8.5935 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0021 secs, 0.0041 secs, 18.6772 secs
DNS-lookup: 0.0001 secs, 0.0000 secs, 0.0123 secs
req write: 0.0001 secs, 0.0000 secs, 0.0153 secs
resp wait: 0.3075 secs, 0.0039 secs, 18.6770 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0150 secs
Status code distribution:
[200] 10000 responses
This is my Gunicorn configuration:
bind = '0.0.0.0:8001'
backlog = 2048
workers = 1
worker_class = 'sync'
worker_connections = 1000
timeout = 120
keepalive = 5
spew = False
daemon = False
pidfile = None
umask = 0
user = None
group = None
tmp_upload_dir = None
errorlog = '-'
loglevel = 'debug'
accesslog = '-'
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"'
And Gunicorn is currently running with the following command:
/path/to/application/bin/python3 /path/to/application/bin/gunicorn --env DJANGO_SETTINGS_MODULE=settings.production -c /path/to/application/settings/gunicorn_conf.py --user=deployer --log-file=/path/to/application/django-application-gunicorn.log --chdir /path/to/application/django-application --daemon wsgi:application
What tests I can do to find out what is causing my performance problems?

2 RabbitMQ workers and 2 Scrapyd daemons running on 2 local Ubuntu instances, in which one of the rabbitmq worker is not working

I am currently working on building "Scrapy spiders control panel" in which I am testing this existing solution available on [Distributed Multi-user Scrapy Spiders Control Panel] https://github.com/aaldaber/Distributed-Multi-User-Scrapy-System-with-a-Web-UI.
I am trying to run this on my local Ubuntu Dev Machine but having issues with scrapd daemon.
One of the Workers, linkgenerator is working but scraper as worker1 is not working.
I can not figure out why scrapyd won't run on another local instance.
Background Information about the configuration.
The application comes bundled with Django, Scrapy, Pipeline for MongoDB (for saving the scraped items) and Scrapy scheduler for RabbitMQ (for distributing the links among workers). I have 2 local Ubuntu instances in which Django, MongoDB, Scrapyd daemon and RabbitMQ server running on Instance1.
On another Scrapyd daemon is running on Instance2.
RabbitMQ Workers:
linkgenerator
worker1
IP Configurations for Instances:
IP For local Ubuntu Instance1: 192.168.0.101
IP for local Ubuntu Instance2: 192.168.0.106
List of tools used:
MongoDB server
RabbitMQ server
Scrapy Scrapyd API
One RabbitMQ linkgenerator worker (WorkerName: linkgenerator) server with Scrapy installed and running scrapyd daemon on local Ubuntu Instance1: 192.168.0.101
Another one RabbitMQ scraper worker (WorkerName: worker1) server with Scrapy installed and running scrapyd daemon on local Ubuntu Instance2: 192.168.0.106
Instance1: 192.168.0.101
"Instance1" on which Django, RabbitMQ, scrapyd daemon servers running -- IP : 192.168.0.101
Instance2: 192.168.0.106
Scrapy installed on instance2 and running scrapyd daemon
Scrapy Control Panel UI Snapshot:
from snapshot, control panel outlook can be been seen, there are two workers, linkgenerator worked successfully but worker1 did not, the logs given in the end of the post
RabbitMQ status info
linkgenerator worker can successfully push the message to RabbitMQ queue, linkgenerator spider generates start_urls for "scraper spider* are consumed by scraper (worker1), which is not working, please see the logs for worker1 in end of the post
RabbitMQ settings
The below file contains the settings for MongoDB and RabbitMQ:
SCHEDULER = ".rabbitmq.scheduler.Scheduler"
SCHEDULER_PERSIST = True
RABBITMQ_HOST = 'ScrapyDevU79'
RABBITMQ_PORT = 5672
RABBITMQ_USERNAME = 'guest'
RABBITMQ_PASSWORD = 'guest'
MONGODB_PUBLIC_ADDRESS = 'OneScience:27017' # This will be shown on the web interface, but won't be used for connecting to DB
MONGODB_URI = 'localhost:27017' # Actual uri to connect to DB
MONGODB_USER = 'tariq'
MONGODB_PASSWORD = 'toor'
MONGODB_SHARDED = True
MONGODB_BUFFER_DATA = 100
# Set your link generator worker address here
LINK_GENERATOR = 'http://192.168.0.101:6800'
SCRAPERS = ['http://192.168.0.106:6800']
LINUX_USER_CREATION_ENABLED = False # Set this to True if you want a linux user account
linkgenerator scrapy.cfg settings:
[settings]
default = tester2_fda_trial20.settings
[deploy:linkgenerator]
url = http://192.168.0.101:6800
project = tester2_fda_trial20
scraper scrapy.cfg settings:
[settings]
default = tester2_fda_trial20.settings
[deploy:worker1]
url = http://192.168.0.101:6800
project = tester2_fda_trial20
scrapyd.conf file settings for Instance1 (192.168.0.101)
cat /etc/scrapyd/scrapyd.conf
[scrapyd]
eggs_dir = /var/lib/scrapyd/eggs
dbs_dir = /var/lib/scrapyd/dbs
items_dir = /var/lib/scrapyd/items
logs_dir = /var/log/scrapyd
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
#bind_address = 127.0.0.1
http_port = 6800
debug = on
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
scrapyd.conf file settings for Instance2 (192.168.0.106)
cat /etc/scrapyd/scrapyd.conf
[scrapyd]
eggs_dir = /var/lib/scrapyd/eggs
dbs_dir = /var/lib/scrapyd/dbs
items_dir = /var/lib/scrapyd/items
logs_dir = /var/log/scrapyd
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
#bind_address = 127.0.0.1
http_port = 6800
debug = on
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
RabbitMQ Status
sudo service rabbitmq-server status
[sudo] password for mtaziz:
Status of node rabbit#ScrapyDevU79
[{pid,53715},
{running_applications,
[{rabbitmq_shovel_management,
"Management extension for the Shovel plugin","3.6.11"},
{rabbitmq_shovel,"Data Shovel for RabbitMQ","3.6.11"},
{rabbitmq_management,"RabbitMQ Management Console","3.6.11"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.11"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.11"},
{rabbit,"RabbitMQ","3.6.11"},
{os_mon,"CPO CXC 138 46","2.2.14"},
{cowboy,"Small, fast, modular HTTP server.","1.0.4"},
{ranch,"Socket acceptor pool for TCP protocols.","1.3.0"},
{ssl,"Erlang/OTP SSL application","5.3.2"},
{public_key,"Public key infrastructure","0.21"},
{cowlib,"Support library for manipulating Web protocols.","1.0.2"},
{crypto,"CRYPTO version 2","3.2"},
{amqp_client,"RabbitMQ AMQP Client","3.6.11"},
{rabbit_common,
"Modules shared by rabbitmq-server and rabbitmq-erlang-client",
"3.6.11"},
{inets,"INETS CXC 138 49","5.9.7"},
{mnesia,"MNESIA CXC 138 12","4.11"},
{compiler,"ERTS CXC 138 10","4.9.4"},
{xmerl,"XML parser","1.3.5"},
{syntax_tools,"Syntax tools","1.6.12"},
{asn1,"The Erlang ASN1 compiler version 2.0.4","2.0.4"},
{sasl,"SASL CXC 138 11","2.3.4"},
{stdlib,"ERTS CXC 138 10","1.19.4"},
{kernel,"ERTS CXC 138 10","2.16.4"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true]\n"},
{memory,
[{connection_readers,0},
{connection_writers,0},
{connection_channels,0},
{connection_other,6856},
{queue_procs,145160},
{queue_slave_procs,0},
{plugins,1959248},
{other_proc,22328920},
{metrics,160112},
{mgmt_db,655320},
{mnesia,83952},
{other_ets,2355800},
{binary,96920},
{msg_index,47352},
{code,27101161},
{atom,992409},
{other_system,31074022},
{total,87007232}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{http,15672,"::"}]},
{vm_memory_calculation_strategy,rss},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,3343646720},
{disk_free_limit,50000000},
{disk_free,56257699840},
{file_descriptors,
[{total_limit,924},{total_used,2},{sockets_limit,829},{sockets_used,0}]},
{processes,[{limit,1048576},{used,351}]},
{run_queue,0},
{uptime,34537},
{kernel,{net_ticktime,60}}]
scrapyd daemon on Instance1 ( 192.168.0.101 ) running status:
scrapyd
2017-09-11T06:16:07+0600 [-] Loading /home/mtaziz/.virtualenvs/onescience_dist_env/local/lib/python2.7/site-packages/scrapyd/txapp.py...
2017-09-11T06:16:07+0600 [-] Scrapyd web console available at http://0.0.0.0:6800/
2017-09-11T06:16:07+0600 [-] Loaded.
2017-09-11T06:16:07+0600 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 17.5.0 (/home/mtaziz/.virtualenvs/onescience_dist_env/bin/python 2.7.6) starting up.
2017-09-11T06:16:07+0600 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
2017-09-11T06:16:07+0600 [-] Site starting on 6800
2017-09-11T06:16:07+0600 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site instance at 0x7f5e265c77a0>
2017-09-11T06:16:07+0600 [Launcher] Scrapyd 1.2.0 started: max_proc=16, runner='scrapyd.runner'
2017-09-11T06:16:07+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:16:07 +0000] "GET /listprojects.json HTTP/1.1" 200 98 "-" "python-requests/2.18.4"
2017-09-11T06:16:07+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:16:07 +0000] "GET /listversions.json?project=tester2_fda_trial20 HTTP/1.1" 200 80 "-" "python-requests/2.18.4"
2017-09-11T06:16:07+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:16:07 +0000] "GET /listjobs.json?project=tester2_fda_trial20 HTTP/1.1" 200 92 "-" "python-requests/2.18.4"
scrapyd daemon on instance2 (192.168.0.106) running status:
scrapyd
2017-09-11T06:09:28+0600 [-] Loading /home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/scrapyd/txapp.py...
2017-09-11T06:09:28+0600 [-] Scrapyd web console available at http://0.0.0.0:6800/
2017-09-11T06:09:28+0600 [-] Loaded.
2017-09-11T06:09:28+0600 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 17.5.0 (/home/mtaziz/.virtualenvs/scrapydevenv/bin/python 2.7.6) starting up.
2017-09-11T06:09:28+0600 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
2017-09-11T06:09:28+0600 [-] Site starting on 6800
2017-09-11T06:09:28+0600 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site instance at 0x7fbe6eaeac20>
2017-09-11T06:09:28+0600 [Launcher] Scrapyd 1.2.0 started: max_proc=16, runner='scrapyd.runner'
2017-09-11T06:09:32+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:09:32 +0000] "GET /listprojects.json HTTP/1.1" 200 98 "-" "python-requests/2.18.4"
2017-09-11T06:09:32+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:09:32 +0000] "GET /listversions.json?project=tester2_fda_trial20 HTTP/1.1" 200 80 "-" "python-requests/2.18.4"
2017-09-11T06:09:32+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:09:32 +0000] "GET /listjobs.json?project=tester2_fda_trial20 HTTP/1.1" 200 92 "-" "python-requests/2.18.4"
2017-09-11T06:09:37+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:09:37 +0000] "GET /listprojects.json HTTP/1.1" 200 98 "-" "python-requests/2.18.4"
2017-09-11T06:09:37+0600 [twisted.python.log#info] "192.168.0.101" - - [11/Sep/2017:00:09:37 +0000] "GET /listversions.json?project=tester2_fda_trial20 HTTP/1.1" 200 80 "-" "python-requests/2.18.4"
worker1 logs
After updating the code for RabbitMQ server settings followed by the suggestions made by #Tarun Lalwani
The suggestion was to use RabbitMQ Server IP - 192.168.0.101:5672 instead of
127.0.0.1:5672. After I updated as suggested by Tarun Lalwani got the new problems as below............
2017-09-11 15:49:18 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: tester2_fda_trial20)
2017-09-11 15:49:18 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tester2_fda_trial20.spiders', 'ROBOTSTXT_OBEY': True, 'LOG_LEVEL': 'INFO', 'SPIDER_MODULES': ['tester2_fda_trial20.spiders'], 'BOT_NAME': 'tester2_fda_trial20', 'FEED_URI': 'file:///var/lib/scrapyd/items/tester2_fda_trial20/tester2_fda_trial20/79b1123a96d611e79276000c29bad697.jl', 'SCHEDULER': 'tester2_fda_trial20.rabbitmq.scheduler.Scheduler', 'TELNETCONSOLE_ENABLED': False, 'LOG_FILE': '/var/log/scrapyd/tester2_fda_trial20/tester2_fda_trial20/79b1123a96d611e79276000c29bad697.log'}
2017-09-11 15:49:18 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.corestats.CoreStats']
2017-09-11 15:49:18 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-09-11 15:49:18 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-09-11 15:49:18 [scrapy.middleware] INFO: Enabled item pipelines:
['tester2_fda_trial20.pipelines.FdaTrial20Pipeline',
'tester2_fda_trial20.mongodb.scrapy_mongodb.MongoDBPipeline']
2017-09-11 15:49:18 [scrapy.core.engine] INFO: Spider opened
2017-09-11 15:49:18 [pika.adapters.base_connection] INFO: Connecting to 192.168.0.101:5672
2017-09-11 15:49:18 [pika.adapters.blocking_connection] INFO: Created channel=1
2017-09-11 15:49:18 [scrapy.core.engine] INFO: Closing spider (shutdown)
2017-09-11 15:49:18 [pika.adapters.blocking_connection] INFO: Channel.close(0, Normal Shutdown)
2017-09-11 15:49:18 [pika.channel] INFO: Channel.close(0, Normal Shutdown)
2017-09-11 15:49:18 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ?.close_spider of <scrapy.extensions.feedexport.FeedExporter object at 0x7f94878b8c50>>
Traceback (most recent call last):
File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
result = f(*args, **kw)
File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/scrapy/extensions/feedexport.py", line 201, in close_spider
slot = self.slot
AttributeError: 'FeedExporter' object has no attribute 'slot'
2017-09-11 15:49:18 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ?.spider_closed of <Tester2Fda_Trial20Spider 'tester2_fda_trial20' at 0x7f9484f897d0>>
Traceback (most recent call last):
File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
result = f(*args, **kw)
File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "/tmp/user/1000/tester2_fda_trial20-10-d4Req9.egg/tester2_fda_trial20/spiders/tester2_fda_trial20.py", line 28, in spider_closed
AttributeError: 'Tester2Fda_Trial20Spider' object has no attribute 'statstask'
2017-09-11 15:49:18 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'shutdown',
'finish_time': datetime.datetime(2017, 9, 11, 9, 49, 18, 159896),
'log_count/ERROR': 2,
'log_count/INFO': 10}
2017-09-11 15:49:18 [scrapy.core.engine] INFO: Spider closed (shutdown)
2017-09-11 15:49:18 [twisted] CRITICAL: Unhandled error in Deferred:
2017-09-11 15:49:18 [twisted] CRITICAL:
Traceback (most recent call last):
File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl
six.reraise(*exc_info)
File "/home/mtaziz/.virtualenvs/scrapydevenv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
yield self.engine.open_spider(self.spider, start_requests)
OperationFailure: command SON([('saslStart', 1), ('mechanism', 'SCRAM-SHA-1'), ('payload', Binary('n,,n=tariq,r=MjY5OTQ0OTYwMjA4', 0)), ('autoAuthorize', 1)]) on namespace admin.$cmd failed: Authentication failed.
MongoDBPipeline
# coding:utf-8
import datetime
from pymongo import errors
from pymongo.mongo_client import MongoClient
from pymongo.mongo_replica_set_client import MongoReplicaSetClient
from pymongo.read_preferences import ReadPreference
from scrapy.exporters import BaseItemExporter
try:
from urllib.parse import quote
except:
from urllib import quote
def not_set(string):
""" Check if a string is None or ''
:returns: bool - True if the string is empty
"""
if string is None:
return True
elif string == '':
return True
return False
class MongoDBPipeline(BaseItemExporter):
""" MongoDB pipeline class """
# Default options
config = {
'uri': 'mongodb://localhost:27017',
'fsync': False,
'write_concern': 0,
'database': 'scrapy-mongodb',
'collection': 'items',
'replica_set': None,
'buffer': None,
'append_timestamp': False,
'sharded': False
}
# Needed for sending acknowledgement signals to RabbitMQ for all persisted items
queue = None
acked_signals = []
# Item buffer
item_buffer = dict()
def load_spider(self, spider):
self.crawler = spider.crawler
self.settings = spider.settings
self.queue = self.crawler.engine.slot.scheduler.queue
def open_spider(self, spider):
self.load_spider(spider)
# Configure the connection
self.configure()
self.spidername = spider.name
self.config['uri'] = 'mongodb://' + self.config['username'] + ':' + quote(self.config['password']) + '#' + self.config['uri'] + '/admin'
self.shardedcolls = []
if self.config['replica_set'] is not None:
self.connection = MongoReplicaSetClient(
self.config['uri'],
replicaSet=self.config['replica_set'],
w=self.config['write_concern'],
fsync=self.config['fsync'],
read_preference=ReadPreference.PRIMARY_PREFERRED)
else:
# Connecting to a stand alone MongoDB
self.connection = MongoClient(
self.config['uri'],
fsync=self.config['fsync'],
read_preference=ReadPreference.PRIMARY)
# Set up the collection
self.database = self.connection[spider.name]
# Autoshard the DB
if self.config['sharded']:
db_statuses = self.connection['config']['databases'].find({})
partitioned = []
notpartitioned = []
for status in db_statuses:
if status['partitioned']:
partitioned.append(status['_id'])
else:
notpartitioned.append(status['_id'])
if spider.name in notpartitioned or spider.name not in partitioned:
try:
self.connection.admin.command('enableSharding', spider.name)
except errors.OperationFailure:
pass
else:
collections = self.connection['config']['collections'].find({})
for coll in collections:
if (spider.name + '.') in coll['_id']:
if coll['dropped'] is not True:
if coll['_id'].index(spider.name + '.') == 0:
self.shardedcolls.append(coll['_id'][coll['_id'].index('.') + 1:])
def configure(self):
""" Configure the MongoDB connection """
# Set all regular options
options = [
('uri', 'MONGODB_URI'),
('fsync', 'MONGODB_FSYNC'),
('write_concern', 'MONGODB_REPLICA_SET_W'),
('database', 'MONGODB_DATABASE'),
('collection', 'MONGODB_COLLECTION'),
('replica_set', 'MONGODB_REPLICA_SET'),
('buffer', 'MONGODB_BUFFER_DATA'),
('append_timestamp', 'MONGODB_ADD_TIMESTAMP'),
('sharded', 'MONGODB_SHARDED'),
('username', 'MONGODB_USER'),
('password', 'MONGODB_PASSWORD')
]
for key, setting in options:
if not not_set(self.settings[setting]):
self.config[key] = self.settings[setting]
def process_item(self, item, spider):
""" Process the item and add it to MongoDB
:type item: Item object
:param item: The item to put into MongoDB
:type spider: BaseSpider object
:param spider: The spider running the queries
:returns: Item object
"""
item_name = item.__class__.__name__
# If we are working with a sharded DB, the collection will also be sharded
if self.config['sharded']:
if item_name not in self.shardedcolls:
try:
self.connection.admin.command('shardCollection', '%s.%s' % (self.spidername, item_name), key={'_id': "hashed"})
self.shardedcolls.append(item_name)
except errors.OperationFailure:
self.shardedcolls.append(item_name)
itemtoinsert = dict(self._get_serialized_fields(item))
if self.config['buffer']:
if item_name not in self.item_buffer:
self.item_buffer[item_name] = []
self.item_buffer[item_name].append([])
self.item_buffer[item_name].append(0)
self.item_buffer[item_name][1] += 1
if self.config['append_timestamp']:
itemtoinsert['scrapy-mongodb'] = {'ts': datetime.datetime.utcnow()}
self.item_buffer[item_name][0].append(itemtoinsert)
if self.item_buffer[item_name][1] == self.config['buffer']:
self.item_buffer[item_name][1] = 0
self.insert_item(self.item_buffer[item_name][0], spider, item_name)
return item
self.insert_item(itemtoinsert, spider, item_name)
return item
def close_spider(self, spider):
""" Method called when the spider is closed
:type spider: BaseSpider object
:param spider: The spider running the queries
:returns: None
"""
for key in self.item_buffer:
if self.item_buffer[key][0]:
self.insert_item(self.item_buffer[key][0], spider, key)
def insert_item(self, item, spider, item_name):
""" Process the item and add it to MongoDB
:type item: (Item object) or [(Item object)]
:param item: The item(s) to put into MongoDB
:type spider: BaseSpider object
:param spider: The spider running the queries
:returns: Item object
"""
self.collection = self.database[item_name]
if not isinstance(item, list):
if self.config['append_timestamp']:
item['scrapy-mongodb'] = {'ts': datetime.datetime.utcnow()}
ack_signal = item['ack_signal']
item.pop('ack_signal', None)
self.collection.insert(item, continue_on_error=True)
if ack_signal not in self.acked_signals:
self.queue.acknowledge(ack_signal)
self.acked_signals.append(ack_signal)
else:
signals = []
for eachitem in item:
signals.append(eachitem['ack_signal'])
eachitem.pop('ack_signal', None)
self.collection.insert(item, continue_on_error=True)
del item[:]
for ack_signal in signals:
if ack_signal not in self.acked_signals:
self.queue.acknowledge(ack_signal)
self.acked_signals.append(ack_signal)
To sum up, I believe the problem lies in scrapyd daemons running on both instances but somehow scraper or worker1 can not access it, I could not figure it out, I did not find any use cases on stackoverflow.
Any help is highly appreciated in this regard. Thank you in advance!

Tuning gunicorn (with Django): Optimize for more concurrent connections and faster connections

I use Django 1.5.3 with gunicorn 18.0 and lighttpd. I serve my static and dynamic content like that using lighttpd:
$HTTP["host"] == "www.mydomain.com" {
$HTTP["url"] !~ "^/media/|^/static/|^/apple-touch-icon(.*)$|^/favicon(.*)$|^/robots\.txt$" {
proxy.balance = "hash"
proxy.server = ( "" => ("myserver" =>
( "host" => "127.0.0.1", "port" => 8013 )
))
}
$HTTP["url"] =~ "^/media|^/static|^/apple-touch-icon(.*)$|^/favicon(.*)$|^/robots\.txt$" {
alias.url = (
"/media/admin/" => "/var/www/virtualenvs/mydomain/lib/python2.7/site-packages/django/contrib/admin/static/admin/",
"/media" => "/var/www/mydomain/mydomain/media",
"/static" => "/var/www/mydomain/mydomain/static"
)
}
url.rewrite-once = (
"^/apple-touch-icon(.*)$" => "/media/img/apple-touch-icon$1",
"^/favicon(.*)$" => "/media/img/favicon$1",
"^/robots\.txt$" => "/media/robots.txt"
)
}
I already tried to run gunicorn (via supervisord) in many different ways, but I cant get it better optimized than it can handle about 1100 concurrent connections. In my project I need about 10000-15000 connections
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 9 -k gevent --preload --settings=myproject.settings
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 10 -k eventlet --worker_connections=1000 --settings=myproject.settings --max-requests=10000
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 20 -k gevent --settings=myproject.settings --max-requests=1000
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 40 --settings=myproject.settings
On the same server there live about 10 other projects, but CPU and RAM is fine, so this shouldnt be a problem, right?
I ran a load test and these are the results:
At about 1100 connections my lighttpd errorlog says something like that, thats where the load test shows the drop of connections:
2013-10-31 14:06:51: (mod_proxy.c.853) write failed: Connection timed out 110
2013-10-31 14:06:51: (mod_proxy.c.939) proxy-server disabled: 127.0.0.1 8013 83
2013-10-31 14:06:51: (mod_proxy.c.1316) no proxy-handler found for: /
... after about one minute
2013-10-31 14:07:02: (mod_proxy.c.1361) proxy - re-enabled: 127.0.0.1 8013
These things also appear ever now and then:
2013-10-31 14:06:55: (network_linux_sendfile.c.94) writev failed: Connection timed out 600
2013-10-31 14:06:55: (mod_proxy.c.853) write failed: Connection timed out 110
...
2013-10-31 14:06:57: (mod_proxy.c.828) establishing connection failed: Connection timed out
2013-10-31 14:06:57: (mod_proxy.c.939) proxy-server disabled: 127.0.0.1 8013 45
So how can I tune gunicorn/lighttpd to serve more connections faster? What can I optimize? Do you know any other/better setup?
Thanks alot in advance for your help!
Update: Some more server info
root#django ~ # top
top - 15:28:38 up 100 days, 9:56, 1 user, load average: 0.11, 0.37, 0.76
Tasks: 352 total, 1 running, 351 sleeping, 0 stopped, 0 zombie
Cpu(s): 33.0%us, 1.6%sy, 0.0%ni, 64.2%id, 0.4%wa, 0.0%hi, 0.7%si, 0.0%st
Mem: 32926156k total, 17815984k used, 15110172k free, 342096k buffers
Swap: 23067560k total, 0k used, 23067560k free, 4868036k cached
root#django ~ # iostat
Linux 2.6.32-5-amd64 (django.myserver.com) 10/31/2013 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
33.00 0.00 2.36 0.40 0.00 64.24
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 137.76 980.27 2109.21 119567783 257268738
sdb 24.23 983.53 2112.25 119965731 257639874
sdc 24.25 985.79 2110.14 120241256 257382998
md0 0.00 0.00 0.00 400 0
md1 0.00 0.00 0.00 284 6
md2 1051.93 38.93 4203.96 4748629 512773952
root#django ~ # netstat -an |grep :80 |wc -l
7129
Kernel Settings:
echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240

WSGI using more daemon processes than it should?

So i set up a WSGI server running python/django code, and stuck the following in my httpd.conf file:
WSGIDaemonProcess mysite.com processes=2 threads=15 user=django group=django
However, when i go to the page and hit "refresh" really really quickly, it seems that i am getting way more than two processes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21042 django 20 0 975m 36m 4440 S 98.9 6.2 0:15.63 httpd
1017 root 20 0 67688 2352 740 S 0.3 0.4 0:10.50 sendmail
21041 django 20 0 974m 40m 4412 S 0.3 6.7 0:16.36 httpd
21255 django 20 0 267m 8536 2036 S 0.3 1.4 0:01.02 httpd
21256 django 20 0 267m 8536 2036 S 0.3 1.4 0:00.01 httpd
I thought setting processes=2 would limit it to two processes. Is there something i'm missing?