python SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')") - python-2.7
I was scraping this ASPX website: https://gra206.aca.ntu.edu.tw/Temp/W2.aspx?Type=2
As required, I have to pass __VIEWSTATE and __EVENTVALIDATION along when sending a POST request, so I am trying to send a GET request first to obtain those two values and then parse them out afterward.
However, no matter how many times I send the GET request, it always fails with this error message:
requests.exceptions.SSLError: HTTPSConnectionPool(host='gra206.aca.ntu.edu.tw', port=443): Max retries exceeded with url: /Temp/W2.aspx?Type=2 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
I have tried:
upgrading OpenSSL
installing requests[security]
However, neither of them works.
I am currently using:
env:
python 2.7
bs4 4.6.0
requests 2.18.4
openssl 1.0.2n
Here is my code:
import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    s.auth = ('user', 'pass')
    s.headers.update({'x-test': 'true'})
    url = 'https://gra206.aca.ntu.edu.tw/Temp/W2.aspx?Type=2'
    r = s.get(url, headers={'x-test2': 'true'})
    soup = BeautifulSoup(r.content, 'lxml')
    # grab the hidden ASP.NET form fields needed for the follow-up POST
    viewstate = soup.find('input', {'id': '__VIEWSTATE'})['value']
    generator = soup.find('input', {'id': '__VIEWSTATEGENERATOR'})['value']
    validation = soup.find('input', {'id': '__EVENTVALIDATION'})['value']
    print viewstate, generator, validation
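Once the GET succeeds, the follow-up POST would look roughly like the sketch below, continuing with the session s and the values parsed above. The postback target and any extra form fields are assumptions about this particular ASPX page, not something taken from the question:

payload = {
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': generator,
    '__EVENTVALIDATION': validation,
    # plus whatever visible form fields the page expects (assumed)
}
# ASP.NET pages usually post back to the same URL
r2 = s.post(url, data=payload, headers={'x-test2': 'true'})
soup2 = BeautifulSoup(r2.content, 'lxml')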
I am also looking for a solution to this. Some sites have deprecated TLSv1.0, and Requests + OpenSSL (on Windows 7) has trouble completing the handshake with such a host. A Wireshark log showed the client issuing a TLSv1 Client Hello which the host did not answer correctly; that failure propagates up as the error message Requests shows. Even with the most up-to-date OpenSSL/pyOpenSSL/Requests, tried on both Py3.6 and 2.7.12, no luck. Interestingly, when I replace the URL with another one such as "google.com", the log shows a TLSv1.2 Hello being issued and answered by the host (see the tlsv1 and tlsv1.2 Wireshark captures).
Clearly the client has TLSv1.2 capability, so why does it use a v1.0 Hello in the former case?
[EDIT]
I was wrong in the previous statement. Wireshark misinterpreted an unfinished TLSv1.2 HELLO exchange as TLSv1. After more digging, I found that these hosts expect pure TLSv1, not a TLSv1 fallback from TLSv1.2, apparently because OpenSSL's Hello lacks some extension fields (maybe Supported Versions) that appear in the log from Chrome. I found a workaround: 1. Force TLSv1 negotiation. 2. Change the default cipher suite to the py3.4 style to re-enable 3DES.
import ssl
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.ssl_ import create_urllib3_context

# py3.4 default cipher string (re-enables 3DES)
CIPHERS = (
    'ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+HIGH:'
    'DH+HIGH:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+HIGH:RSA+3DES:!aNULL:'
    '!eNULL:!MD5'
)

class DESAdapter(HTTPAdapter):
    """
    A TransportAdapter that forces TLSv1 and re-enables 3DES support in Requests.
    """
    def create_ssl_context(self):
        #ctx = create_urllib3_context(ciphers=CIPHERS)
        ctx = ssl.create_default_context()
        # force TLSv1.0: the default context already rejects SSLv2/SSLv3,
        # so disabling TLSv1.1 and TLSv1.2 leaves only TLSv1.0
        ctx.options |= ssl.OP_NO_TLSv1_2
        ctx.options |= ssl.OP_NO_TLSv1_1
        ctx.set_ciphers(CIPHERS)
        return ctx

    def init_poolmanager(self, *args, **kwargs):
        kwargs['ssl_context'] = self.create_ssl_context()
        return super(DESAdapter, self).init_poolmanager(*args, **kwargs)

    def proxy_manager_for(self, *args, **kwargs):
        kwargs['ssl_context'] = self.create_ssl_context()
        return super(DESAdapter, self).proxy_manager_for(*args, **kwargs)

url = 'https://gra206.aca.ntu.edu.tw/Temp/W2.aspx?Type=2'  # the target from the question
tmoval = 10
proxies = {}
hdr = {
    'Accept-Language': 'zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Proxy-Connection': 'keep-alive',
    #'Cache-Control': 'no-cache', 'Connection': 'close',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept': '*/*',
}

ses = requests.session()
ses.mount(url, DESAdapter())
response = ses.get(url, timeout=tmoval, headers=hdr, proxies=proxies)
[EDIT2]
When your HTTPS URL contains any uppercase letter, the patch fails to work; you need to convert the URL to lowercase. Something unknown in the requests/urllib3/OpenSSL stack causes the patched logic to fall back to its default TLSv1.2 behaviour.
[EDIT3]
from http://docs.python-requests.org/en/master/user/advanced/
The mount call registers a specific instance of a Transport Adapter to a prefix. Once mounted, any HTTP request made using that session whose URL starts with the given prefix will use the given Transport Adapter.
So, to make all HTTPS requests (including the ones the server later redirects to) use the new adapter, this line must be changed to:
ses.mount('https://', DESAdapter())
Somehow it fixed the uppercase problem mentioned above.
Related
Urllib2 through proxy and trust untrusted SSL certificates
I've read the various posts such as: urllib2 won't use my proxy https://stackoverflow.com/a/11130306/413180, python ignore certificate validation urllib2 https://stackoverflow.com/a/1450154/413180, etc., however, nothing will work for me. I want to proxy everything through my intercepting proxy (Burp Suite Pro), so I can see and edit the requests my Python 2 script makes, but I don't want it to error on the Burp CA cert being invalid. My code:

proxy = {'http': '127.0.0.1:8081', 'https': '127.0.0.1:8081'}
proxy_handler = urllib2.ProxyHandler(proxy)
opener = urllib2.build_opener(proxy_handler)
context = ssl._create_unverified_context()
opener.context = context
urllib2.install_opener(opener)
url_request = urllib2.Request("https://example.com")
response = opener.open(url_request)

Also tried:

import ssl
context = ssl.create_default_context()
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE

I've also tried copying the Burp cacert.der file to /etc/pki/ca-trust/source/anchors/ and running update-ca-trust. All give the error:

urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:727)>
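For what it's worth, a minimal sketch of the usual Python 2.7.9+ approach, building the opener with an HTTPSHandler that carries the unverified context instead of assigning opener.context after the fact. The proxy address is the one from the question; whether this plays well with Burp's CA is an assumption:

import ssl
import urllib2

# route everything through the intercepting proxy and hand the opener an
# HTTPSHandler built with an unverified SSL context
proxy = {'http': '127.0.0.1:8081', 'https': '127.0.0.1:8081'}
context = ssl._create_unverified_context()
opener = urllib2.build_opener(
    urllib2.ProxyHandler(proxy),
    urllib2.HTTPSHandler(context=context),
)
urllib2.install_opener(opener)
response = opener.open(urllib2.Request("https://example.com"))
print(response.getcode())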
Getting [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590) error when trying to read a URL through FancyURLopener
I am trying to execute the code below (reading content from an HTML page) using FancyURLopener. The code was working fine for the last 2 months or so, but now it has started to throw the error:

IOError: [Errno socket error] [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

When I try to run it locally, it works like a charm.

from urllib import urlopen
from urllib import FancyURLopener
from bs4 import BeautifulSoup
import requests

doc_name = "XYZ"

class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

mopen = MyOpener()

def extract_count_from_url(url, tag_name, tag_type, html_tag):
    html = mopen.open(url).read()
    soup = BeautifulSoup(html, "html.parser")

I have searched stackoverflow and google. The answers I am getting mostly say to use the urllib2 / urllib libraries with a user agent and to set the context to ssl.CERT_NONE (How do I disable the ssl check in python 3.x?). But I guess the same is not applicable when I use FancyURLopener, as when I pass a context to the open() method along with the url, it throws an invalid arguments error.

python version = Python 2.7.12

Any leads would be helpful. Thanks in advance.
I was able to figure out a workaround. Adding the part below to the code bypasses the certificate verification:

import ssl
ssl._create_default_https_context = ssl._create_unverified_context
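In context, a minimal sketch of where the patch goes relative to the opener from the question (the URL here is a placeholder):

import ssl
from urllib import FancyURLopener

# disable default HTTPS certificate verification globally, before any
# opener is used (this weakens security and is only a workaround)
ssl._create_default_https_context = ssl._create_unverified_context

class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

mopen = MyOpener()
html = mopen.open('https://example.com').read()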
web scraper with HTTP Error 503: Service Unavailable
I am trying to build a scraper, but I keep getting the 503 blocking error. I can still access the website manually, so my IP address hasn't been blocked. I keep switching user agents and still can't get my code to run all the way through. Sometimes I get up to 15, sometimes I don't get any, but it always fails eventually. I have no doubt that I'm doing something wrong in my code. I did shave it down to fit, though, so please keep that in mind. How do I fix this without using third parties?

import requests
import urllib2
from urllib2 import urlopen
import random
from contextlib import closing
from bs4 import BeautifulSoup
import ssl
import parser
import time
from time import sleep

def Parser(urls):
    randomint = random.randint(0, 2)
    randomtime = random.randint(5, 30)
    url = "https://www.website.com"
    user_agents = [
        "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)",
        "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)",
        "Opera/9.80 (Windows NT 6.1; U; cs) Presto/2.2.15 Version/10.00"
    ]
    index = 0
    opener = urllib2.build_opener()
    req = opener.addheaders = [('User-agent', user_agents[randomint])]

def ReadUPC():
    UPCList = [
        'upc',
        'upc2',
        'upc3',
        'upc4',
        'etc.'
    ]
    extracted_data = []
    for i in UPCList:
        urls = "https://www.website.com" + i
        randomtime = random.randint(5, 30)
        Soup = BeautifulSoup(urlopen(urls), "lxml")
        price = Soup.find("span", {"class": "a-size-base a-color-price s-price a-text-bold"})
        sleep(randomtime)
        randomt = random.randint(5, 15)
        print "ref url:", urls
        sleep(randomt)
        print "Our price:", price
        sleep(randomtime)

if __name__ == "__main__":
    ReadUPC()
    index = index + 1
    sleep(10)

554 class HTTPDefaultErrorHandler(BaseHandler):
555     def http_error_default(self, req, fp, code, msg, hdrs):
556         raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
557
558 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 503: Service Unavailable
What website are you scraping? Most websites use cookies to recognize the user as well, so please enable cookies in your code. Also open that link in a browser with Firebug and look at the headers your browser sends to the server while making the request, then try to fake all those headers. PS: In my view, sending random user-agent strings from the SAME IP won't make any difference, unless you are rotating IPs.
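A rough sketch of that advice with the urllib2 setup from the question; the header values below are placeholders and should be replaced with whatever Firebug shows your browser actually sending:

import cookielib
import urllib2

# keep cookies across requests, the way a browser would
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# headers copied from a real browser session (placeholder values)
opener.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Accept-Language', 'en-US,en;q=0.5'),
]

html = opener.open('https://www.website.com' + 'upc').read()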
Behave like a normal human being using a browser. That website seems to be designed to analyze your behaviour, detect that you're a scraper, and block you; in the easiest case, a minimal bit of JavaScript that changes link URLs on the fly would be enough to defeat "dumb" scrapers. There are elegant ways to solve this dilemma, for example by instrumenting a browser, but that won't happen without external tools.
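If external tools were on the table, the browser-instrumentation route would look something like this Selenium sketch (Selenium plus a browser driver is an assumption, and the URL and CSS class are the placeholders from the question):

from selenium import webdriver
from bs4 import BeautifulSoup

# drive a real browser so JavaScript runs and the site sees normal behaviour
driver = webdriver.Firefox()
driver.get('https://www.website.com' + 'upc')
soup = BeautifulSoup(driver.page_source, 'lxml')
price = soup.find('span', {'class': 'a-size-base a-color-price s-price a-text-bold'})
print(price)
driver.quit()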
How to ignore SSL certificate validation in pysimplesoap
I'm trying to access a web service that uses a self-generated certificate, using pysimplesoap and python 2.7.9:

from pysimplesoap.client import SoapClient
import base64

username = 'webuser'
password = 'webpassword'
base64string = base64.encodestring('%s:%s' % (username, password)).replace('\n', '')

# real address / login removed
client = SoapClient(wsdl='https://url:port/webservice.asmx?WSDL',
                    http_headers={'Authorization': 'Basic %s' % base64string},
                    sessions=True, cacert=None)
response = client.StatusInfo(... removed ...)
print(response)

Trying this throws the error message:

urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>

There are tips on how to bypass the problem by fixing urllib2, but is there a simpler way to tell pysimplesoap to ignore all client-side SSL certificate errors? I'm using Windows 7 and plan to port the code to Raspbian/Debian Linux, so the solution should not depend on the operating system.
Answering my own question here: adding the 1st and 3rd lines below disables certificate verification.

import ssl
from pysimplesoap.client import SoapClient
ssl._create_default_https_context = ssl._create_unverified_context

There's a longer discussion about this here, where you can learn why this is not a good idea...
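For completeness, this is roughly how it slots into the code from the question (the WSDL URL and credentials are the question's placeholders), with the patch applied before the client is created:

import ssl
import base64
from pysimplesoap.client import SoapClient

# disable certificate verification before the client opens any HTTPS connection
ssl._create_default_https_context = ssl._create_unverified_context

base64string = base64.encodestring('webuser:webpassword').replace('\n', '')
client = SoapClient(wsdl='https://url:port/webservice.asmx?WSDL',
                    http_headers={'Authorization': 'Basic %s' % base64string},
                    sessions=True, cacert=None)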
Connect to secure web service with pfx certificate using python
Hi, I need help getting info from a web service on a secure site using a password-protected pfx certificate. I have tried more than one example. Code example:

import requests

wsdl_url = 'blabla'
requests.get(wsdl_url, cert='cert.pfx', verify=True)

Other example:

import urllib3
import certifi

wsdl_url = 'blabla'
http = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',   # Force certificate check.
    ca_certs=certifi.where()     # Path to the Certifi bundle.
)

# You're ready to make verified HTTPS requests.
try:
    r = http.request('GET', wsdl_url)
    print r
except urllib3.exceptions.SSLError as e:
    print "wrong"  # Handle incorrect certificate error.

Error type: connection aborted, an existing connection was forcibly closed by the remote host.

Help please.
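One common route, sketched under the assumption that the .pfx can first be exported to a PEM certificate/key pair; the file names and openssl commands below are illustrative, not from the question:

# one-time conversion of the .pfx, e.g. with the openssl command line tool:
#   openssl pkcs12 -in cert.pfx -clcerts -nokeys -out client_cert.pem
#   openssl pkcs12 -in cert.pfx -nocerts -nodes  -out client_key.pem
import requests

wsdl_url = 'blabla'  # placeholder URL from the question
response = requests.get(wsdl_url,
                        cert=('client_cert.pem', 'client_key.pem'),
                        verify=True)
print(response.status_code)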