Unable to open a URL with urllib2 that opens fine in Chrome - python-2.7

I'm able to open the webpage by simply entering the URL in my Chrome browser, but when I pass the same URL to the code below, it raises an error:
CODE:
import urllib2
url = 'http://www.klse.info/companies/listed-companies/alphabet/A'
page = urllib2.urlopen(url).read()
ERROR:
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 410, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 448, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
Anyone have an idea about this?
I tried changing the URL to other addresses and they work fine.
Has the website set a restriction, or is there anything I should take care of?

To get rid of HTTP Error 403: Forbidden, send the request with a custom User-Agent header; the server appears to reject urllib2's default one (Python-urllib/2.x). Please refer to the code below:
url = 'http://www.klse.info/companies/listed-companies/alphabet/A'
req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"})
page = urllib2.urlopen(req).read()
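If you want the failure to be explicit rather than an unhandled traceback, you can catch the error; a minimal sketch (the timeout value is an arbitrary choice):
import urllib2

url = 'http://www.klse.info/companies/listed-companies/alphabet/A'
req = urllib2.Request(url, headers={'User-Agent': 'Magic Browser'})
try:
    page = urllib2.urlopen(req, timeout=10).read()
except urllib2.HTTPError as e:
    # e.code is the HTTP status (e.g. 403); e.msg is the reason phrase
    print 'Request failed: %d %s' % (e.code, e.msg)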

Related

How to connect to HTTPS through proxy using urllib2 (in Python)

If the website I'm trying to connect to via a proxy is unsecured (HTTP), I'm able to connect; however, if it's secured (HTTPS), I can't.
The following code works:
import urllib2
proxy_support = urllib2.ProxyHandler({'http':'xxx.xxx.xxx.xx'})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
html = urllib2.urlopen('http://www.example.com').read()
However, the code below does not work:
proxy_support = urllib2.ProxyHandler({'https':'xxx.xxx.xxx.xx'})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
html = urllib2.urlopen('https://www.example.com').read()
Instead I get the following traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1240, in https_open
context=self._context)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1197, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 61] Connection refused>
According to https://docs.python.org/2/library/urllib2.html:
Changed in version 2.7.9: cafile, capath, cadefault, and context were added.
The context argument is what allowed me to connect to my local HTTPS site that uses a self-signed SSL certificate:
import ssl
# ssl._create_unverified_context() (2.7.9+) returns a context that skips
# certificate verification; use it only for hosts you trust
html = urllib2.urlopen('https://www.example.com',
                       context=ssl._create_unverified_context()).read()
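If you need both the proxy and the relaxed certificate check, note that passing context to urlopen() makes urllib2 build a fresh opener and ignore whatever was registered with install_opener(), so the proxy would be skipped. A minimal sketch for Python 2.7.9+ that wires both into one opener, keeping the placeholder proxy address from the question:
import ssl
import urllib2

proxy_support = urllib2.ProxyHandler({'https': 'xxx.xxx.xxx.xx'})
# HTTPSHandler accepts a context in 2.7.9+; an unverified context skips
# certificate checks, so use it only for hosts you trust
https_support = urllib2.HTTPSHandler(context=ssl._create_unverified_context())
opener = urllib2.build_opener(proxy_support, https_support)
urllib2.install_opener(opener)
html = urllib2.urlopen('https://www.example.com').read()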
I noticed in your traceback the similarities with mine. The code, just as you posted it, works on Ubuntu 14.04 (Python 2.7.6) but not on 16.04 (Python 2.7.13), except that the last line of my traceback differs:
File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 429, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 447, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1241, in https_open
context=self._context)
File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>
I'm not sure if this works on your end.

raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

I'm trying to read this URL - http://malc0de.com/bl/BOOT - in Python:
import urllib2
threats = urllib2.urlopen("http://malc0de.com/bl/BOOT")
But I got this error:
Traceback (most recent call last):
File "C:\Android\android_workspace\pro2\test.py", line 2, in <module>
threats = urllib2.urlopen("http://malc0de.com/bl/BOOT")
File "C:\Python27\lib\urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 437, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 475, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
What can I do to fix it?
This is an HTTP error unrelated to Python or urllib. It says that, for some reason, you are not allowed to view this particular page.
It seems to me that the site owner filters out bots/crawlers, because I can open the page in Firefox but not via urllib. The filter is probably based on the user agent, which can be changed - see Changing user agent on urllib2.urlopen - although doing so might be bad etiquette.
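A minimal sketch of that workaround, assuming the site filters only on the user agent (the browser-like string below is arbitrary):
import urllib2

# Any browser-like string avoids urllib2's default "Python-urllib/2.x",
# which some sites block outright
req = urllib2.Request('http://malc0de.com/bl/BOOT',
                      headers={'User-Agent': 'Mozilla/5.0'})
threats = urllib2.urlopen(req).read()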

Python Traceback Error

I am unable to get a response from this API when calling locu_search('new york'); I get the error shown below. I am using Komodo as my IDE, and this started when I created a new Python shell.
import urllib2
import json

local_api = '0d5897aae41eeafbd62ad0815af15cc42b2ed7c0'

def locu_search(query):
    api_key = local_api
    url = 'https://api.locu.com/v1_0/venue/search/?api_key=' + api_key
    locality = query.replace('','%20')
    final_url = url + "&locality=" + locality + "&category=restaurant"
    json_obj = urllib2.urlopen(final_url)
    data = json.load(json_obj)
    for item in data['objects']:
        print item['name'], item['phone']

locu_search('new york')
The error is listed below:
Traceback (most recent call last):
File "<console>", line 0, in <module>
File "<console>", line 0, in locu_search
File "c:\python27\lib\urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "c:\python27\lib\urllib2.py", line 437, in open
response = meth(req, response)
File "c:\python27\lib\urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "c:\python27\lib\urllib2.py", line 475, in error
return self._call_chain(*args)
File "c:\python27\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "c:\python27\lib\urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 400: BAD_REQUEST
400 Bad Request should give you a heads-up about the problem: the server considered the request malformed. One likely culprit is the line locality = query.replace('', '%20') - replacing the empty string inserts %20 between every character and mangles the locality parameter; it should presumably be query.replace(' ', '%20'). It is also worth checking that the api_key token is still valid, since a rejected key can produce the same error.
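Rather than hand-escaping, the query string can be built with urllib.urlencode, which percent-encodes spaces and other special characters for you; a sketch reusing the endpoint and parameters from the question:
import urllib
import urllib2
import json

def locu_search(query, api_key):
    # urlencode escapes the values, so no manual replace() is needed
    params = urllib.urlencode({
        'api_key': api_key,
        'locality': query,
        'category': 'restaurant',
    })
    final_url = 'https://api.locu.com/v1_0/venue/search/?' + params
    data = json.load(urllib2.urlopen(final_url))
    for item in data['objects']:
        print item['name'], item['phone']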

403 error during webpage request in python

I'm trying to pull the source from a URL in the terminal and I'm getting the traceback below. As far as I can tell, everything is in the right place.
response = urllib2.urlopen(url)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 410, in open
response = meth(req, response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 448, in error
return self._call_chain(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

Problem with Django's URL Field Test

Could someone please clarify why the URL http://www.nacolmeia.com.br/do/Home/oferta/EnER is not being accepted by a form generated from Django's URLField?
:)
Thanks
Are you hosting the site on the same server you are trying to validate it from? From the docs:
Note that when you're using the single-threaded development server, validating a URL being served by the same server will hang. This should not be a problem for multithreaded servers.
It doesn't look like it's failing validation at the form level:
>>> from django import forms
>>> f = forms.URLField()
>>> f.clean('http://www.nacolmeia.com.br/do/Home/oferta/EnER')
u'http://www.nacolmeia.com.br/do/Home/oferta/EnER'
>>> f.clean('sadfas')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/dev/.virtualenvs/thepidb/lib/python2.7/site-packages/django/forms/fields.py", line 171, in clean
self.run_validators(value)
File "/home/dev/.virtualenvs/thepidb/lib/python2.7/site-packages/django/forms/fields.py", line 160, in run_validators
raise ValidationError(errors)
ValidationError: [u'Enter a valid URL.']
>>>
If you don't need to verify that the URL actually resolves (i.e. doesn't return a 404), set this in your models.py:
url = models.URLField(verify_exists=False)
Edit: after some digging around in the Django source code here and some messing around with the shell, I'm still not sure why the URL with caps causes a redirect loop.
>>> from django.core.validators import URLValidator
>>> u = URLValidator(verify_exists=True)
>>> u.__call__('http://www.nacolmeia.com.br/do/Home/oferta/EnER')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/dev/.virtualenvs/thepidb/lib/python2.7/site-packages/django/core/validators.py", line 105, in __call__
raise broken_error
ValidationError: [u'This URL appears to be a broken link.']
>>> u.__call__('http://www.nacolmeia.com.br/do/home/oferta/ener')
>>>
The actual exception being raised is an HTTPError:
File "/usr/lib/python2.7/urllib2.py", line 606, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python2.7/urllib2.py", line 398, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 511, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 430, in error
result = self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 370, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 606, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python2.7/urllib2.py", line 398, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 511, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 430, in error
result = self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 370, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 596, in http_error_302
self.inf_msg + msg, headers, fp)
HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
>>>
Here are some posts discussing the HTTPError: here and here.
It seems to have something to do with cookies, but I'm not able to offer a good explanation; I'll leave that to someone else.
A workaround, if you don't want to turn off validation but don't care about the capitalization of your URLs, is to override the clean_<field> method of your form:
def clean_your_url_field(self):
    return self.cleaned_data['your_url_field'].lower()
I think I found the issue. When you open this URL:
http://www.nacolmeia.com.br/do/Home/oferta/EnER
...it re-directs to this URL:
http://www.nacolmeia.com.br/do/Home/oferta/EnER/piracicaba/a-pascoa-chegou-na-planet-chokolate!-50-off-para-1-caixa-com-16-bombons-recheados--1-pao-de-mel-recheado-ou-1-caixa-com-16-trufas-recheadas--1-pao-de-mel-recheado-de-rs-47.10-por-rs-23.55.
The first URL is fine, but the redirected one is 247 characters long. This "shouldn't" be a problem, except that models.fields.URLField has a max_length that defaults to 200 characters, so it fails validation because the URL is too long.
Instead, increase max_length and it should work: models.URLField(max_length=255). For info on the longest URL possible, see this SO question; it's definitely longer than 200 characters.
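A minimal model sketch (the model name is hypothetical):
from django.db import models

class Offer(models.Model):
    # 255 comfortably covers the ~247-character redirect target above
    url = models.URLField(max_length=255)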
EDIT: It only re-directs to the second URL when setting a cookie! If you re-visit the same page with an existing cookie, it just displays the shorter URL.
But what about the lowercase URL? It appears your web-server is case-sensitive regarding URLs, and the lowercase version:
http://www.nacolmeia.com.br/do/home/oferta/ener
...displays a generic error page. It doesn't re-direct to the 247-character URL, so it passes validation: the only thing models.URLField cares about is whether the URL loads a webpage or not.