Scrapy cannot handle bad headers properly [ScrapyHTTPPageGetter,client] Unhandled Error - python-2.7

Environment:
Scrapy 0.16.2
Twisted-12.2.0
python 2.7
macosx-10.6
Okey here is my problem:
I try to run
scrapy shell http://aaa.17domn.com/bt9/file.php/MERH77V.html
Error:
[ScrapyHTTPPageGetter,client] Unhandled Error
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/selectreactor.py", line 150, in _doReadOrWrite
why = getattr(selectable, method)()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/tcp.py", line 202, in doRead
return self._dataReceived(data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/tcp.py", line 208, in _dataReceived
rval = self.protocol.dataReceived(data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/protocols/basic.py", line 564, in dataReceived
why = self.lineReceived(line)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.2-py2.7.egg/scrapy/core/downloader/webclient.py", line 50, in lineReceived
return HTTPClient.lineReceived(self, line.rstrip())
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/web/http.py", line 450, in lineReceived
self.extractHeader(self._header)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/web/http.py", line 406, in extractHeader
key, val = header.split(':',1)
exceptions.ValueError: need more than 1 value to unpack
I found the solution from https://groups.google.com/forum/#!msg/scrapy-users/xFKo8ggzPxs/VXDl3CZ4V4cJ
They describe this is caused by twisted. Then I patched function extractHeader in /twisted/web/http.py from http://twistedmatrix.com/trac/ticket/2842. Its WORKS
BUT BUT, Hold on NOt yet!!!
I run another web
scrapy shell http://www1.wkdown.info/fs3/file.php/M994ATR.html
Error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.2.0-py2.7-macosx-10.6-intel.egg/twisted/internet/defer.py", line 551, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Scrapy-0.16.2-py2.7.egg/scrapy/core/downloader/webclient.py", line 122, in _build_response
status = int(self.status)
ValueError: invalid literal for int() with base 10: 'html'
I think something happen on response headers. Scrapy cannot handle it well.
Any idea?
Thank you!

Related

How to install `distro-info===0.18ubuntu0.18.04.1`?

Trying to modernize an old Django project (2.2), and its requirements.txt (generated via pip freeze) has some lines that make pip install throw fits:
distro-info===0.18ubuntu0.18.04.1
I interpreted the errors I got for the first one (see the error output in its entirety at the bottom) as the version string not conforming to PEP-518, but it doesn't even mention the === operator. This SO thread, What are triple equal signs and ubuntu2 in Python pip freeze?, has a similar issue, but:
The errors they got is different (ValueError as opposed to my ParseError).
The solution was to upgrade pip, but I'm already using the latest one.
Now, pip install distro-info works so should I just go with that?
update: The project I'm trying to update has been conceived around 2020, and according to the PyPI history of distro-info, it had a 0.10 release in 2013 and a 1.0 in 2021. Could this anything have to do with the weird pip freeze output? (From this PyPI support issue.)
The error:
ERROR: Exception:
Traceback (most recent call last):
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3021, in _dep_map
return self.__dep_map
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2815, in __getattr__
raise AttributeError(attr)
AttributeError: _DistInfoDistribution__dep_map
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/packaging/requirements.py", line 102, in __init__
req = REQUIREMENT.parseString(requirement_string)
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/pyparsing/core.py", line 1141, in parse_string
raise exc.with_traceback(None)
pip._vendor.pyparsing.exceptions.ParseException: Expected string_end, found '(' (at char 12), (line:1, col:13)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3101, in __init__
super(Requirement, self).__init__(requirement_string)
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/packaging/requirements.py", line 104, in __init__
raise InvalidRequirement(
pip._vendor.packaging.requirements.InvalidRequirement: Parse error at "'(===0.18'": Expected string_end
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_internal/cli/base_command.py", line 160, in exc_logging_wrapper
status = run_func(*args)
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_internal/cli/req_command.py", line 247, in wrapper
return func(self, options, args)
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_internal/commands/install.py", line 400, in run
requirement_set = resolver.resolve(
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
result = self._result = resolver.resolve(
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/resolvelib/resolvers.py", line 481, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/resolvelib/resolvers.py", line 373, in resolve
failure_causes = self._attempt_to_pin_criterion(name)
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/resolvelib/resolvers.py", line 213, in _attempt_to_pin_criterion
criteria = self._get_updated_criteria(candidate)
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/resolvelib/resolvers.py", line 203, in _get_updated_criteria
for requirement in self._p.get_dependencies(candidate=candidate):
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_internal/resolution/resolvelib/provider.py", line 237, in get_dependencies
return [r for r in candidate.iter_dependencies(with_requires) if r is not None]
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_internal/resolution/resolvelib/provider.py", line 237, in <listcomp>
return [r for r in candidate.iter_dependencies(with_requires) if r is not None]
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 247, in iter_dependencies
requires = self.dist.iter_dependencies() if with_requires else ()
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_internal/metadata/pkg_resources.py", line 216, in iter_dependencies
return self._dist.requires(extras)
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2736, in requires
dm = self._dep_map
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3023, in _dep_map
self.__dep_map = self._compute_dependencies()
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3033, in _compute_dependencies
reqs.extend(parse_requirements(req))
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3094, in parse_requirements
yield Requirement(line)
File "/home/old-django-project/.venv/lib/python3.10/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3103, in __init__
raise RequirementParseError(str(e))
pip._vendor.pkg_resources.RequirementParseError: Parse error at "'(===0.18'": Expected string_end
Looks like your library was discontinued. In PyPi, infact, I can see there are only 1.0 and 0.10. If you need that specific version, then you need to setup a manual installation, downloading the source here. Either, you can upgrade your version and try to refactor any possible problem coming after!
In case, if you need to dockerize your app, setting up a script for the manual installation of a library is simple.

Q: Sonos Python Self Test error: No handlers could be found for logger "smapi"

I am trying to run the SONOS self test for a music service on Sonos.
After getting the dependencies, and filling out the config file, I try to run the python Sonos selftest, however it runs into an error and I have no clue what the underlying issue might be to get it running:
No handlers could be found for logger "smapi"
Traceback (most recent call last):
File "suite_selftest.py", line 226, in <module>
nightly_mode(parser.config_file)
File "suite_selftest.py", line 51, in nightly_mode
development_mode(config_file)
File "suite_selftest.py", line 186, in development_mode
fixtures.append(getlastupdate.PollingIntervalTest(suite.client, suite.smapiservice))
File "/Users/thomas/Desktop/PythonSelfTest/smapi/content_workflow/getlastupdate.py", line 20, in __init__
self.poll_interval = self.smapiservice.get_polling_interval()
File "../../sonos-1.1.0.dev_r300235-py2.7.egg/sonos/smapi/smapiservice.py", line 465, in get_polling_interval
File "/usr/local/Cellar/python#2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 362, in getfloat
return self._get(section, float, option)
File "/usr/local/Cellar/python#2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 356, in _get
return conv(self.get(section, option))
ValueError: could not convert string to float:
Found the fix already, forgot to add the Polling Interval in the config file...

happybase hbase table.put command error?

I am trying to connect to hbase-1.2.6 from python code, is as:
import happybase
connection = happybase.Connection(host='localhost',port=16010)
table = connection.table('blogpost')
table.put('1', {'post:title': 'hello world1'})
I manually created the table-"blogpost" in hbase. I am using python-2.7 and happybase-1.1.0.
error log are as:
/usr/bin/python2.7 /home/spark/PycharmProjects/PySpark/hbase.py
Traceback (most recent call last):
File "/home/spark/PycharmProjects/PySpark/hbase.py", line 5, in <module>
table.put('1', {'post:title': 'hello world1'})
File "/usr/local/lib/python2.7/dist-packages/happybase/table.py", line 464, in put
batch.put(row, data)
File "/usr/local/lib/python2.7/dist-packages/happybase/batch.py", line 137, in __exit__
self.send()
File "/usr/local/lib/python2.7/dist-packages/happybase/batch.py", line 60, in send
self._table.connection.client.mutateRows(self._table.name, bms, {})
File "/usr/local/lib/python2.7/dist-packages/thriftpy/thrift.py", line 198, in _req
return self._recv(_api)
File "/usr/local/lib/python2.7/dist-packages/thriftpy/thrift.py", line 210, in _recv
fname, mtype, rseqid = self._iprot.read_message_begin()
File "thriftpy/protocol/cybin/cybin.pyx", line 439, in cybin.TCyBinaryProtocol.read_message_begin (thriftpy/protocol/cybin/cybin.c:6470)
cybin.ProtocolError: No protocol version header
thanks.
Process finished with exit code 1

ParallelPython created in thread gives

Creating a job server gives the error:
Python int too large to convert to C long
when created in thread (win32), using parallel python version 1.6.5 & Python27.
Here is the traceback:
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\ParrallelPython.py", line 91, in __init__
self.job_server = pp.Server(ppservers=self.ppservers, socket_timeout=3600)
File "C:\Python27\lib\site-packages\pp.py", line 341, in __init__
self.set_ncpus(ncpus)
File "C:\Python27\lib\site-packages\pp.py", line 505, in set_ncpus
range(ncpus - len(self.__workers))])
File "C:\Python27\lib\site-packages\pp.py", line 138, in __init__
self.start()
File "C:\Python27\lib\site-packages\pp.py", line 150, in start
self.pid = int(self.t.receive())
File "C:\Python27\lib\site-packages\pptransport.py", line 148, in receive
msg = self.r.read(e_size-r_size)
OverflowError: Python int too large to convert to C long
Similar issue to link below:
ParallelPython Forum Question

TypeError when using botocore to read from AWS SQS queue

I'm using a Tornado server with tornado-botocore to connect to Amazon SQS services.
When running stress tests we sometimes get the following exception:
Traceback (most recent call last):
File "/home/app/handlers/WebSocketsHandler.py", line 95, in listen_outgoing_queue
message = yield tornado.gen.Task(self.outgoing_queue.read)
File "/home/local/lib/python2.7/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/home/local/lib/python2.7/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "/home/local/lib/python2.7/site-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/home/local/lib/python2.7/site-packages/tornado_botocore/base.py", line 70, in prepare_response
response_dict, operation_model.output_shape)
File "/home/local/lib/python2.7/site-packages/botocore/parsers.py", line 155, in parse
return self._do_error_parse(response, shape)
File "/home/.env/local/lib/python2.7/site-packages/botocore/parsers.py", line 314, in _do_error_parse
root = self._parse_xml_string_to_dom(xml_contents)
File "/home/local/lib/python2.7/site-packages/botocore/parsers.py", line 274, in _parse_xml_string_to_dom
parser.feed(xml_string)
TypeError: must be string or read-only buffer, not None
could it be caused by the concurrency?
has anyone encountered such behavior?
We are using tornado 4.2.1, botocore 0.65.0 and tonado-botocore 0.1.6
problem solved once i removed the #tornado.gen.engine decorator from the method