urllib module errors launching script with python 2.7.13 - python-2.7

I want to launch a script that fetches a file by URL with urllib, but I always get a sequence of errors.
I checked the documentation and it says that something is deprecated, but I can't find the right syntax.
import urllib
fhand = urllib.urlopen('http://www-dr-chuck.com/page1.htm')
for line in fhand:
    print line.strip()

Your URL is wrong.
http://www-dr-chuck.com/page1.htm
I think it should be
http://www.dr-chuck.com/page1.htm
There should be a dot (.) after www, not a dash (-).
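On the deprecation note mentioned in the question: the Python 2.7 documentation marks urllib.urlopen as deprecated in favor of urllib2.urlopen. Below is a minimal sketch, assuming you also want a readable message rather than a traceback when the host in the URL cannot be reached. The helper name fetch_lines and the timeout value are illustrative and not from the original post.

```python
try:
    from urllib.request import urlopen   # Python 3
    from urllib.error import URLError
except ImportError:
    from urllib2 import urlopen, URLError  # Python 2.7


def fetch_lines(url, timeout=10):
    """Return the stripped lines found at url, or None if it cannot be opened."""
    try:
        response = urlopen(url, timeout=timeout)
    except (URLError, ValueError) as exc:
        # URLError covers unreachable hosts (such as a mistyped domain);
        # ValueError is raised for strings that are not URLs at all.
        print("Could not open %s: %s" % (url, exc))
        return None
    try:
        text = response.read().decode("utf-8", "replace")
    finally:
        response.close()
    return [line.strip() for line in text.splitlines()]


# Example call with the corrected URL from the answer:
# for line in fetch_lines('http://www.dr-chuck.com/page1.htm') or []:
#     print(line)
```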

Related

Regex dealing with Kanji characters in Python

For this web-scraping project I'm working on, I've been trying to separate failed results from the rest.
Basically, if the title contains 指定されたページが見つかりません ("The specified page could not be found"), I want to copy the URL and write it to a fail.csv file. For anything else, I want to copy the URL and write it to sucess.csv.
html = 'www.abc.com'
url = BeautifulSoup(html,'html.parser').title.string
pattern = re.compile(r' 指定されたページが見つかりません')
if pattern.finditer(url):
    with open('fail.csv','w') as f:
        cw=csv.writer
        cw.writerow([url])
else:
    # move on, run some other code and write to sucess.csv
    pass
However, it seems that the regex isn't recognising 指定されたページが見つかりません.
Am I doing something wrong or missing something here?
Thanks
Try installing the dependencies (re is part of the standard library and does not need to be installed with pip):
sudo pip3 install requests
sudo pip3 install beautifulsoup4
and then, under python3:
import requests
import re
from bs4 import BeautifulSoup
r = requests.get('https://corp.rakuten.co.jp/careers/life/')
r.encoding='utf-8'
pattern = re.compile(r' 指定されたページが見つかりません')
url = BeautifulSoup(r.text,'html.parser').title.string
pattern.findall(url)
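One likely reason the check in the question always takes the first branch is that pattern.finditer() returns an iterator object, which is truthy even when there are no matches, whereas pattern.search() returns None when nothing matches. The pattern in the question also starts with a space, so the title would need to contain that space for it to match. The following is a minimal sketch under those assumptions; the function name is_not_found_title and the sample titles are made up for illustration.

```python
# -*- coding: utf-8 -*-
import re

# No leading space here, unlike the pattern in the question.
NOT_FOUND = re.compile(u"指定されたページが見つかりません")


def is_not_found_title(title):
    """Return True if the page title contains the 'page not found' message."""
    # title can be None when the page has no <title> element.
    return title is not None and NOT_FOUND.search(title) is not None


# finditer() returns an iterator, which is truthy even with zero matches.
print(bool(NOT_FOUND.finditer(u"Careers")))                              # True
print(is_not_found_title(u"Careers"))                                    # False
print(is_not_found_title(u"指定されたページが見つかりません | Example"))  # True
```

When writing the result, note that csv.writer has to be called with the open file object, as in csv.writer(f), before writerow can be used.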

Shebang command to call script from existing script - Python

I am running a python script on my raspberry pi, at the end of which I want to call a second python script in the same directory. I call it using the os.system() command as shown in the code snippet below but get import errors. I understand this is because the system interprets the script name as a shell command and needs to be told to run it using python, using the shebang line at the beginning of my second script.
#!/usr/bin/env python
However, doing so does not solve the errors.
Here is the ending snippet from the first script:
# Time to Predict E
end3 = time.time()
prediction_time = end3-start3
print ("\nPrediction time: ", prediction_time, "seconds")
i = i+1
print (i)
script = '/home/pi/piNN/exampleScript.py'
os.system('"' + script + '"')
and here is the beginning of my second script:
'#!usr/bin/env python'
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
#from picamera import PiCamera
import argparse
import sys
import time
import numpy as np
import tensorflow as tf
import PIL.Image as Image
Any help is greatly appreciated :)
Since you have not posted the actual errors that you get when you run your code, this is my best guess. First, ensure that exampleScript.py is executable:
chmod +x /home/pi/piNN/exampleScript.py
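Second, fix the shebang in exampleScript.py. It needs a leading slash, and it must be the very first line of the file without any quotes around it (if the quotes shown in your post are really in the file, Python sees a plain string rather than a shebang). That is, change
'#!usr/bin/env python'
to
#!/usr/bin/env python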
The setup that you have here is not ideal.
Consider simply importing your other script instead (make sure both scripts are in the same directory). Importing it executes all top-level Python code in the script that is not wrapped in if __name__ == "__main__":. While on the topic, should you need to keep some code from running on import, place it inside that block.
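The import behaviour described above can be sketched as follows. This is only an illustration: the module name guard_demo and its contents are hypothetical, and the module is written to a temporary directory so that the example is self-contained.

```python
import importlib
import os
import sys
import tempfile

# Source of a throwaway module: one statement at top level, one behind the guard.
MODULE_SOURCE = '''
executed = ["top-level"]            # runs whenever the module is imported

if __name__ == "__main__":
    executed.append("main-guard")   # runs only when launched as a script
'''


def import_demo_module():
    """Write guard_demo.py to a temporary folder and import it."""
    folder = tempfile.mkdtemp()
    with open(os.path.join(folder, "guard_demo.py"), "w") as handle:
        handle.write(MODULE_SOURCE)
    sys.path.insert(0, folder)
    try:
        return importlib.import_module("guard_demo")
    finally:
        sys.path.remove(folder)


demo = import_demo_module()
print(demo.executed)  # ['top-level'] -- the guarded line did not run on import
```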
I have two Python files, a.py and b.py, and I set execute permission for b.py with:
chmod a+x b.py
Below is my sample:
a.py
#!/usr/bin/python
print 'Script a'
import os
script = './b.py'
os.system('"' + script + '"')
b.py
#!/usr/bin/python
print 'Script b'
Execute "python a.py", the result is:
Script a
Script b
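An alternative that does not depend on the shebang line or the executable bit is to start the second script with the interpreter that is already running. This is a minimal sketch, assuming that approach is acceptable for your setup; run_script is a hypothetical helper name.

```python
import subprocess
import sys


def run_script(path, *args):
    """Run another Python script with the current interpreter; return its exit code."""
    # sys.executable is the absolute path of the interpreter running this code,
    # so the child process uses the same Python and the same installed packages.
    return subprocess.call([sys.executable, path] + list(args))


# Example (path taken from the question):
# run_script('/home/pi/piNN/exampleScript.py')
```

Passing the command as a list also avoids the manual quoting needed with os.system.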

Problems using Colorama on Python 2.7

I'm learning to use colorama in Python, so I installed it and I'm able to import the module with no problems from the Primary Prompt.
>>> import colorama
>>> from colorama import *
>>> print(Fore.BLUE + 'BLUE TEXT')
BLUE TEXT
Now, if I create a small piece of code like this:
#!/usr/bin/env python2.7
from colorama import *
print(Fore.BLUE + 'BLUE TEXT')
I get the following message:
File "colorama_Test.py", line 3, in <module>
from colorama import *
File "/home/olg32/Python/colorama_Test.py", line 5, in <module>
print(Fore.BLUE + 'BLUE TEXT')
NameError: name 'Fore' is not defined
Which tells me that the module is not being found. But as mentioned it was installed and tested successfully from the Primary Prompt. Could it be a path definition issue or something like that? This is the current directory where the module is installed:
usr/local/lib/python2.7/dist-packages/colorama-0.3.7-py2.7.egg
Does this path need to be defined somewhere? Sorry, I'm new to Python.
Any help would be appreciated.
Thank you.
Hopefully you have worked out the answer by now but have you tried specifying Fore?
When I use the colorama module I start with this:
import os, colorama
from colorama import Fore, Style, Back  # import all three names explicitly
# Sometimes colorama doesn't work when double-clicking a Python app, so I use
# the call below to "prompt" the command line, and then it works fine.
# colorama.init() should work too.
os.system("mode con: cols=120 lines=30")
Example code:
import os, colorama
from colorama import Fore,Style,Back
os.system("mode con: cols=120 lines=30")
print(Fore.RED + 'some red text')
print(Back.GREEN + 'and with a green background')
print(Style.DIM + 'and in dim text')
print(Style.RESET_ALL)
print('back to normal now')
If this doesn't work for you, let me know :)
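The traceback in the question shows the colorama import leading into a file under /home/olg32/Python, which may mean that a local file is being picked up instead of the installed package. That is only a guess, since the thread does not confirm it. One way to check is to print where a module was actually loaded from; module_origin below is a hypothetical helper name.

```python
import importlib


def module_origin(name):
    """Import a module by name and return the file it was loaded from."""
    module = importlib.import_module(name)
    # Built-in modules such as sys have no __file__ attribute.
    return getattr(module, "__file__", None)


# Demonstrated with a standard-library module so the sketch runs anywhere;
# in the question's situation you would call module_origin("colorama").
print(module_origin("json"))
```

If the printed path points into your own project directory rather than into dist-packages, a local file is shadowing the installed package.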

How do I import files from other directory in python 2.7

I have been experimenting with Python by creating some programs. The thing is, I have no idea how to import something from OUTSIDE the default Python directory.
OK
So I did some heavy research, and the conclusion is:
if you want to access a file saved at a different location,
use
f = open('E:/somedir/somefile.txt', 'r')
r = f.read()
NOTE: Don't use '\' in the path; that is where I went wrong. Windows paths use '\', so be careful.
If you need to just read in a file and not import a module the documentation covers this extensively.
https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
Specifically for Windows file systems you will need to do one of the following:
1.) Use forward slashes instead of backslashes. This should work with most OSes.
f = open("c:/somedir/somefile.txt", "r")
2.) Use a raw string.
f = open(r"c:\somedir\somefile.txt", "r")
3.) Escape the backslashes.
f = open("c:\\somedir\\somefile.txt", "r")
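The reason a plain backslash path can fail is that Python interprets sequences such as \t and \n inside ordinary string literals. The short sketch below illustrates this with a hypothetical path, c:\temp\new.txt, chosen because it contains both sequences.

```python
# In an ordinary literal, \t becomes a tab and \n becomes a newline.
plain = "c:\temp\new.txt"
raw = r"c:\temp\new.txt"        # raw string: backslashes are kept as typed
escaped = "c:\\temp\\new.txt"   # escaped backslashes: same text as the raw string
forward = "c:/temp/new.txt"     # forward slashes need no escaping at all

print(raw == escaped)   # True
print(plain == raw)     # False
print("\t" in plain)    # True: the path now contains a tab character
print(repr(plain))      # shows the tab and newline that ended up in the string
```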
If you need to import a module from outside your program's directory, you can use the information below.
Python looks through the directories listed in sys.path to see if the module exists there and, if so, does the import. If the path where your files/modules are located is not in sys.path, Python will raise an ImportError. You can update the path programmatically by using the sys module.
import sys
module_dir = "path to mymodule"  # named module_dir to avoid shadowing the built-in dir()
if module_dir not in sys.path:
    sys.path.append(module_dir)
import mymodule
You can check the current sys.path by using:
print(sys.path)
Example:
>>> print(sys.path)
['', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages']
>>> sys.path.append("/Users/ddrummond/pymodules")
>>> print(sys.path)
['', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages', '/Users/ddrummond/pymodules']
>>>
You can see that sys.path now contains '/Users/ddrummond/pymodules'.

simple web crawler

I wrote the program below in Python as a very simple web crawler, but when I run it, it returns
"'NoneType' object is not callable". Could you please help me?
import BeautifulSoup
import urllib2

def union(p,q):
    for e in q:
        if e not in p:
            p.append(e)

def crawler(SeedUrl):
    tocrawl=[SeedUrl]
    crawled=[]
    while tocrawl:
        page=tocrawl.pop()
        pagesource=urllib2.urlopen(page)
        s=pagesource.read()
        soup=BeautifulSoup.BeautifulSoup(s)
        links=soup('a')
        if page not in crawled:
            union(tocrawl,links)
            crawled.append(page)
    return crawled

crawler('http://www.princeton.edu/main/')
[UPDATE] Here is the complete project code
https://bitbucket.org/deshan/simple-web-crawler
[ANSWER]
soup('a') returns the complete HTML tag, for example:
<a ...>Buy Music Now</a>
so urlopen gives the error
"'NoneType' object is not callable". You need to extract only the URL (the href attribute):
links=soup.findAll('a',href=True)
for l in links:
    print(l['href'])
You need to validate the URL too. Refer to the following answers:
How do you validate a URL with a regular expression in Python?
Python - How to validate a url in python ? (Malformed or not)
I would also suggest using Python sets instead of lists, so that duplicate URLs are omitted automatically when you add them.
http://docs.python.org/2/library/sets.html
Try the following code:
import re
import httplib
import urllib2
from urlparse import urlparse
import BeautifulSoup

regex = re.compile(
    r'^(?:http|ftp)s?://' # http:// or https://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' #domain...
    r'localhost|' #localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip
    r'(?::\d+)?' # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)

def isValidUrl(url):
    # True only if the whole string looks like an absolute URL
    return regex.match(url) is not None

def crawler(SeedUrl):
    tocrawl=[SeedUrl]
    crawled=[]
    while tocrawl:
        page=tocrawl.pop()
        print 'Crawled:'+page
        pagesource=urllib2.urlopen(page)
        s=pagesource.read()
        soup=BeautifulSoup.BeautifulSoup(s)
        # keep only <a> tags that actually carry an href attribute
        links=soup.findAll('a',href=True)
        if page not in crawled:
            for l in links:
                if isValidUrl(l['href']):
                    tocrawl.append(l['href'])
            crawled.append(page)
    return crawled

crawler('http://www.princeton.edu/main/')
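The suggestion to use sets can be sketched as below. This is not the answerer's code: it swaps the regular expression for a simpler urlparse check and takes the link extraction as a parameter (get_links) so that the loop can be tried without network access. The names crawl, is_http_url, get_links, FAKE_SITE and the example.* URLs are all hypothetical.

```python
try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2.7


def is_http_url(url):
    """Accept only absolute http(s) URLs that include a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def crawl(seed_url, get_links, limit=100):
    """Visit pages reachable from seed_url, each at most once."""
    to_crawl = [seed_url]
    crawled = []
    seen = {seed_url}                    # set membership makes the duplicate check cheap
    while to_crawl and len(crawled) < limit:
        page = to_crawl.pop()
        crawled.append(page)
        for link in get_links(page):
            if is_http_url(link) and link not in seen:
                seen.add(link)
                to_crawl.append(link)
    return crawled


# Stand-in for real pages: maps a URL to the href values found on it.
FAKE_SITE = {
    "http://a.example/": ["http://b.example/", "/relative", "http://a.example/"],
    "http://b.example/": ["http://a.example/", "mailto:someone@example.com"],
}

print(crawl("http://a.example/", lambda url: FAKE_SITE.get(url, [])))
# ['http://a.example/', 'http://b.example/']
```

In a real crawler, get_links would fetch the page and return the href values, for instance from soup.findAll('a', href=True) as in the answer above.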