Using Python to Manipulate HTTP Headers - python-2.7

So I'm trying to use Python to automate Section 508 compliance checking. There are a few hundred pages on our site, and at the moment a person goes through the site every week, entering all the URLs by hand. The UIUC service linked below reads the Referer header of the request and returns an evaluation of that site. I can't get the request to actually work. I've looked all through SO and can't find anything that helps. The offending code is below, followed by the error message.
import urllib2

def fae(urltofae):
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    #[('Referer': urltofae)]
    r = opener.open('http://www.fae.cita.uiuc.edu/evaluate/link/')
    print r

fae("http://www.example.com/")
And the Error:
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in fae
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/urllib2.py", line 1177, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
And when I try to set the Referer header instead of the User-agent, I get formatting errors before the request is even made, even though the format looks identical to the User-agent line that Python accepted.
I'm still very much a new programmer, so if I'm missing something blatant I'm terribly sorry, but I have tried everything I can think of.
Thanks in advance, cheers.
OK, so I switched strategies and it worked. Unfortunately, I have no idea why the code below works while the code above kept failing, but I have seen a couple of similar-ish questions (with no specific answers) around Google, so I figured I should post it.
vlz, appreciate the help, cheers.
import urllib2

def faeRequest2(urltofae):
    r = urllib2.Request('http://fae.cita.illinois.edu/evaluate/link/',
                        headers={'User-agent': 'Mozilla/5.0', 'Referer': urltofae})
    c = urllib2.urlopen(r)
    print c.read()
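In hindsight, the commented-out Referer line may explain the "formatting errors" above: addheaders takes a list of (name, value) tuples, so the colon (dict syntax) should have been a comma. A minimal sketch of the opener-based approach with that one change (fae_opener is just an illustrative name; untested against the live service):

import urllib2

def fae_opener(urltofae):
    opener = urllib2.build_opener()
    # Each header is a (name, value) tuple; the pairs use commas,
    # not the 'name': value dict syntax.
    opener.addheaders = [('User-agent', 'Mozilla/5.0'),
                         ('Referer', urltofae)]
    r = opener.open('http://fae.cita.illinois.edu/evaluate/link/')
    print r.read()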

I do not see any error there. Is the URL correct? Try using
'http://fae.cita.uiuc.edu/evaluate/link/'
instead of
'http://www.fae.cita.uiuc.edu/evaluate/link/'
The latter seems not to lead anywhere.
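You can also confirm that this is a DNS problem rather than a header problem: [Errno 8] "nodename nor servname provided, or not known" is a name-resolution failure. A quick check using only the two hostnames from the question:

import socket

# If the www-prefixed name fails to resolve while the bare name
# succeeds, the URL (not the headers) is the culprit.
for host in ('www.fae.cita.uiuc.edu', 'fae.cita.uiuc.edu'):
    try:
        print host, '->', socket.gethostbyname(host)
    except socket.gaierror as e:
        print host, '->', e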

Related

Failed to query Elasticsearch: TransportError(400, 'parsing_exception')

I have been trying to get Elasticsearch to work in a Django application. It has been a problem because of the mess of compatibility considerations this apparently involves. I followed the recommendations, but I still get an error when I actually perform a search.
Here is what I have
Django==2.1.7
Django-Haystack==2.5.1
elasticsearch (Python client)==1.7.0
Elasticsearch (server, on Linux)==5.0.1
There are also DjangoCMS==3.7 and aldryn-search==1.0.1, but I am not sure how relevant those are.
Here is the error I get when I submit a search query via the basic text form.
GET /videos/modelresult/_search?_source=true [status:400 request:0.001s]
Failed to query Elasticsearch using '(video)': TransportError(400, 'parsing_exception')
Traceback (most recent call last):
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/haystack/backends/elasticsearch_backend.py", line 524, in search
_source=True)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(*args, params=params, **kwargs)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 527, in search
doc_type, '_search'), params=params, body=body)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
self._raise_error(response.status, raw_data)
File "/home/user-name/miniconda3/envs/project-web/lib/python3.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'parsing_exception')
Could someone tell me if this is an issue with compatibility or is there something else going on? How can I fix it?
The combination that appears to have worked for my setup is as follows. I believe the key was to drastically downgrade the Elasticsearch server.
Elasticsearch (server)==1.7.6 (with Java 8)
Django==2.1.7
Django-Haystack==2.8.1
elasticsearch (Python client)==1.7.0
The two items below may or may not be relevant. I have not changed them.
DjangoCMS==3.7.0
aldryn-search==1.0.1
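For anyone checking their own setup: the major version of the elasticsearch Python client generally has to match the major version of the Elasticsearch server, which was the mismatch above (a 1.7.0 client against a 5.0.1 server). A quick sanity check, assuming the server runs on the default localhost:9200:

import elasticsearch
from elasticsearch import Elasticsearch

# The client version (a tuple) and the server version (a string)
# should share the same major version, e.g. a 1.x client with a 1.x server.
print(elasticsearch.__version__)
es = Elasticsearch(['localhost:9200'])
print(es.info()['version']['number'])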

Flask-JWT generates error when Debug=False

I am playing around with Flask. I have created an API using Flask-RESTful and Flask-JWT. When debug=True in Flask and I do not send the Authorization header, I get the expected "Authorization Required" error response from Flask-JWT. However, when debug=False, the response returned is an Internal Server Error with this stack trace:
[2017-01-19 19:43:10,753] ERROR in app: Exception on /api_0_1/deals [GET]
Traceback (most recent call last):
File "C:\Users\ARFATS~1\Desktop\Dealflow\venv\lib\site-packages\flask\app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\ARFATS~1\Desktop\Dealflow\venv\lib\site-packages\flask\app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "C:\Users\ARFATS~1\Desktop\Dealflow\venv\lib\site-packages\flask_restful\__init__.py", line 477, in wrapper
resp = resource(*args, **kwargs)
File "C:\Users\ARFATS~1\Desktop\Dealflow\venv\lib\site-packages\flask_jwt\__init__.py", line 176, in decorator
_jwt_required(realm or current_app.config['JWT_DEFAULT_REALM'])
File "C:\Users\ARFATS~1\Desktop\Dealflow\venv\lib\site-packages\flask_jwt\__init__.py", line 155, in _jwt_required
headers={'WWW-Authenticate': 'JWT realm="%s"' % realm})
JWTError: Authorization Required. Request does not contain an access token
I would like Flask-JWT to return the same response it does when debug=True, but I cannot run with debug enabled on production servers. One way is to use my own jwt_required decorator. Is there any other way? Also, I would be happy to know what I am missing, if anything. Thanks.
You will need to add this setting to your flask app:
app.config['PROPAGATE_EXCEPTIONS'] = True
When debug is true, PROPAGATE_EXCEPTIONS is also set to true by default.
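A minimal sketch of where the setting goes, for a stock Flask app:

from flask import Flask

app = Flask(__name__)
# With debug off, Flask swallows unhandled exceptions and returns a
# generic 500 page. Propagating them lets Flask-JWT's error handling
# produce its usual JSON response, just as it does in debug mode.
app.config['PROPAGATE_EXCEPTIONS'] = True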
Perhaps consider checking out flask-jwt-extended instead (https://github.com/vimalloc/flask-jwt-extended); it takes care of PROPAGATE_EXCEPTIONS for you. It aims to replace the abandoned flask-jwt library and adds some conveniences when working with JWTs (such as refresh tokens, easily adding custom data to the JWTs, fresh vs. non-fresh tokens, and more). Full disclosure: I'm the author of that extension.
Cheers.

Py2neo searching error

I'm still trying to make a social network with py2neo + Flask + Neo4j.
I've got a problem while searching my database with py2neo. I want to find all users whose username contains a given substring, for example all users whose username contains "dav". I wrote the code below and I don't know why I get this error...
from py2neo import Graph

graph = Graph("http://neo4j:123@localhost:7474/")

def search(name):
    users = graph.merge("Person")
    for N in users:
        print N['username']
and this is my error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ali/Desktop/flask/search.py", line 10, in search
users=graph.cypher.execute('match (p:Person) return p'
File "/usr/local/lib/python2.7/dist-packages/py2neo/core.py", line 659, in cypher
metadata = self.resource.metadata
File "/usr/local/lib/python2.7/dist-packages/py2neo/core.py", line 213, in metadata
self.get()
File "/usr/local/lib/python2.7/dist-packages/py2neo/core.py", line 267, in get
raise_from(self.error_class(message, **content), error)
File "/usr/local/lib/python2.7/dist-packages/py2neo/util.py", line 235, in raise_from
raise exception
py2neo.error.GraphError: HTTP GET returned response 404
Your URL is wrong; you should change it to this:
Graph("http://neo4j:123@localhost:7474/db/data")
Also, you can't execute Cypher through the merge function. Instead you should do this:
users = graph.cypher.execute('match (p:Person) return p')
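And for the original goal, finding every Person whose username contains a substring, here is a sketch assuming the py2neo 2.x API (where cypher.execute accepts keyword parameters; the regex match also works on older Neo4j releases that lack CONTAINS):

from py2neo import Graph

graph = Graph("http://neo4j:123@localhost:7474/db/data")

def search(name):
    # Pass the pattern as a parameter instead of splicing user input
    # directly into the query string.
    results = graph.cypher.execute(
        "MATCH (p:Person) WHERE p.username =~ {pattern} RETURN p",
        pattern=".*" + name + ".*")
    for record in results:
        print record[0]['username']  # first column of each row is the node

search("dav")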

python-requests hanging when downloading a mass amount of files

I am trying to use the python-requests package to download a massive number of files (10k+) from the web, each ranging in size from several KB up to 100 MB.
My script runs through fine for maybe 3,000 files, but then it suddenly hangs.
When I Ctrl-C it, I see it is stuck at:
r = requests.get(url, headers=headers, stream=True)
File "/Library/Python/2.7/site-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/Library/Python/2.7/site-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 456, in request
resp = self.send(prep, **send_kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 559, in send
r = adapter.send(request, **kwargs)
File "/Library/Python/2.7/site-packages/requests/adapters.py", line 327, in send
timeout=timeout
File "/Library/Python/2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 493, in urlopen
body=body, headers=headers)
File "/Library/Python/2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 319, in _make_request
httplib_response = conn.getresponse(buffering=True)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1045, in getresponse
response.begin()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
Here is my Python code that does the download:
import os
import requests

# filepath, url and headers come from the surrounding download loop.
basedir = os.path.dirname(filepath)
if not os.path.exists(basedir):
    os.makedirs(basedir)
r = requests.get(url, headers=headers, stream=True)
with open(filepath, 'wb') as f:  # 'wb': the downloads are binary files
    for chunk in r.iter_content(1024):
        if chunk:
            f.write(chunk)
            f.flush()
I am not sure what went wrong; if anyone has a clue, please share some insights.
Thanks.
This is not a duplicate of the question that @alfasin linked in their comment. Judging by the (limited) traceback you posted, the request itself is hanging (the first line shows it was executing r = requests.get(url, headers=headers, stream=True)).
What you should do is set a timeout and catch the exception that is raised when the request times out, as in the sketch below. Once you have found the misbehaving URL, try it in a browser or with curl to check whether it responds properly; if it doesn't, remove it from your list of URLs to request, and please update your question with it.
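A sketch of that, with url and headers standing in for the values from your download loop:

import requests

url = 'http://example.com/somefile'  # stand-in for one URL from your list
headers = {}                         # whatever headers you already send

try:
    # Without a timeout, a dead or stalled server can block the socket
    # read forever, which is exactly where your traceback shows the hang.
    r = requests.get(url, headers=headers, stream=True, timeout=30)
    r.raise_for_status()
except requests.exceptions.Timeout:
    print 'timed out, skipping: %s' % url
except requests.exceptions.RequestException as e:
    print 'request failed: %s (%s)' % (url, e)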
I faced a similar situation, and it seems a bug in the requests package was causing this issue. Upgrading to requests 2.10.0 fixed it for me.
For reference, the changelog for requests 2.10.0 shows that the bundled urllib3 was updated to version 1.15.1, and the urllib3 release history shows that version 1.15.1 included fixes for:
Chunked transfer encoding when requesting with chunked=True. (Issue #790)
Fixed AppEngine handling of transfer-encoding header and bug in Timeout defaults checking. (Issue #763)
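To confirm which versions you are actually running before and after the upgrade (these attribute paths apply to requests releases of that era, which vendored urllib3):

import requests

# requests 2.10.0 bundles urllib3 1.15.1, which carries the fixes above.
print requests.__version__
print requests.packages.urllib3.__version__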

Django - UnreadablePostError from time to time?

We've got a Django-based web application that is used for receiving POST data from iOS devices (push notification tokens).
All in all, the application seems to be working fine, and we're receiving 1,000-2,000 POSTs with valid data every hour. However, I'm occasionally receiving error logs from Django with the following data:
Traceback (most recent call last):
File "/opt/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/opt/local/lib/python2.7/site-packages/django/views/decorators/vary.py", line 19, in inner_func
response = func(*args, **kwargs)
File "/opt/local/lib/python2.7/site-packages/django_piston-0.2.3-py2.7.egg/piston/resource.py", line 160, in __call__
request.data = request.POST
File "/opt/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 180, in _get_post
self._load_post_and_files()
File "/opt/local/lib/python2.7/site-packages/django/http/__init__.py", line 372, in _load_post_and_files
self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
File "/opt/local/lib/python2.7/site-packages/django/http/__init__.py", line 328, in body
self._body = self.read()
File "/opt/local/lib/python2.7/site-packages/django/http/__init__.py", line 384, in read
return self._stream.read(*args, **kwargs)
File "/opt/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 98, in read
result = self.buffer + self._read_limited()
File "/opt/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 92, in _read_limited
result = self.stream.read(size)
UnreadablePostError: request data read error
And the WSGIRequest dump says POST: <could not parse>
I've been trying to find more information on this error, and much of what I'm seeing suggests it is caused by a user cancelling a POST request before it completes. Is this something I should be concerned about, or should I just set up the server to filter out these error messages? I get maybe 8-10 automated emails per day about this.
I'm not sure whether you're still waiting for an answer, but the comments contain most of the information, so, just to close the question up...
These errors mean that a malformed request arrived at the server. Someone might have cancelled their request, or it might have been mangled on the way (e.g. by a bad internet connection), etc.
You can't solve these errors directly. However, you can take a look at the page (if it's always the same one) and check whether, for example, it takes too long to load. You can also try to trigger the error manually to see when exactly it happens and why. Unless you find something relevant, I don't think you need to worry about it much; if the automated emails are the main nuisance, you can filter them out, as sketched below.
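One option is to drop just these reports with a logging filter on the mail_admins handler. A sketch for settings.py, assuming a Django version that exposes UnreadablePostError from django.http and ships django.utils.log.CallbackFilter:

from django.http import UnreadablePostError

def skip_unreadable_post(record):
    # Drop log records whose exception is an UnreadablePostError;
    # let everything else through to the admin emails.
    if record.exc_info:
        exc_type = record.exc_info[0]
        if exc_type is not None and issubclass(exc_type, UnreadablePostError):
            return False
    return True

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'skip_unreadable_posts': {
            '()': 'django.utils.log.CallbackFilter',
            'callback': skip_unreadable_post,
        },
    },
    'handlers': {
        'mail_admins': {
            'level': 'ERROR',
            'class': 'django.utils.log.AdminEmailHandler',
            'filters': ['skip_unreadable_posts'],
        },
    },
    'loggers': {
        'django.request': {
            'handlers': ['mail_admins'],
            'level': 'ERROR',
        },
    },
}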