I am trying to write Python code to download some data. The code works for one structure but not for another, and it gives me an error I don't understand. I wrote the code in Sublime Text 3 and am running it from the DOS prompt. The Python version is 2.7.11.
from bs4 import BeautifulSoup
import urllib
import re
url= raw_input("http://physics.iitd.ac.in/content/list-faculty-members")
html=urllib.urlopen(url).read()
soup=BeautifulSoup(html)
table = soup.find("table", attrs={"border":"0","width":"100%","cellpadding":"10"})
head=soup.find("h2",attrs={"class":"title style3"})
ready= table.find_all("tr")
header=head.find("big").find("strong")
datasets=[]
quest=[]
s=[]
test=header.get_text()
quest.append(test)
for b in ready:
    x = [td.get_text() for td in b.find_all("td")]
    dataset = [strong.get_text() for strong in b.find("td").find("a").find_all("strong")]
    datasets.append(dataset)
    quest.append(x)
print quest
The fact that it states "cannot find the file specified: ''" means that you're trying to open a file specified by an empty string!
It's a little hard to help much further since we don't have the actual code. The code you have included cannot be the code that generated that screenshot, because the screenshot would have shown the argument to the raw_input() call as a prompt.
To clarify that point: if the string you appear to have entered had actually been entered, there would be no problem.
Calling urlopen() will in turn call FancyURLopener.open(), and since FancyURLopener is a descendant of URLopener, it is URLopener.open() that actually receives control.
That function will intelligently select the function to use based on the scheme given.
The fact that it's choosing the file scheme rather than the HTTP one, and the fact that the exception complains about the file being an empty string, means that you are not passing in what you think you are.
The error message you're seeing, and the stack trace, can only occur if the following line fails (see open_local_file() here):
stats = os.stat(localname)
The stat call is only for local files, not URLs.
So that is where you should be concentrating your effort: why is the string empty?
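If you want to confirm this diagnosis yourself, here is a minimal sketch to reproduce it from a Python 2 prompt (the exact wording of the error message is OS-dependent):
import urllib

try:
    urllib.urlopen("")   # empty string -> no scheme -> urllib falls back to "file"
except IOError as e:
    print e              # complains that the file '' cannot be found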
The most likely theory is that the code behind the screenshot had that URL as the raw_input() prompt, and so that is what we're seeing as the prompt in the screenshot.
That would mean you simply pressed ENTER, perhaps thinking it had helpfully provided that URL as a default. That ENTER would then be taken as an empty string, which would explain both the scheme selection and the empty string being used as a file name.
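If that is what happened, the fix is to remember that the argument to raw_input() is only the prompt text; the return value is whatever you type. A minimal sketch (the fallback URL below is just the one from your prompt, used here for illustration):
import urllib

url = raw_input("Enter the URL to fetch: ").strip()
if not url:
    # fall back to the URL that was being shown as the prompt
    url = "http://physics.iitd.ac.in/content/list-faculty-members"
html = urllib.urlopen(url).read()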
I copied your URL to test it:
url = "http://www.che.iitb.ac.in/online/people/viewstudents?filter0=**ALL**&op1=&filter1=2010"
urllib.urlopen(url)
In fact, that URL is correctly parsed as "http", but your error message tells us that your URL was parsed as "file", so you need to show us your real URL or code.
My python version is 2.7.5.
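As a quick diagnostic, you can check how the standard library parses a candidate string before opening it; this is just a sketch using the urlparse module from Python 2.7:
import urlparse

print urlparse.urlparse("http://www.che.iitb.ac.in/online/people/viewstudents").scheme  # 'http'
print repr(urlparse.urlparse("").scheme)  # '' -- urllib then falls back to the "file" scheme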
I use opencart version 2.1.0.1
Every time I click admin > sales > order, a popup appears saying "error undefined." After closing that popup window I can still edit orders, but I cannot delete orders (no response).
In my log, there is:
PHP Notice: Undefined variable: order_id in
/var/www/html/opencart2101/system/storage/modification/admin/view/template/sale/order_list.tpl on line 821
The line 821 is:
url: 'index.php?route=extension/openbay/addorderinfo&token=<?php echo $token; ?>&order_id=<?php echo $order_id; ?>&status_id=' + status_id,
However, I haven't installed any OpenBay-related module. Also, line 821 is inside an <!-- --> comment, so it should have no effect.
Help!
Although this is now an older version of opencart, I still see this being reported a lot around and about.
The problem occurs because the store front adds the http URL rather than the https URL to the order, so firstly you need to fix that. If you don't want to read all of my explanation, you can just hit the bold points :)
Either way, BACK UP EVERYTHING. Actually, not quite everything: back up the file you are going to edit and back up your whole database.
open:
catalog/controller/checkout/confirm.php at around line 100
Find:
$order_data['store_url'] = HTTP_SERVER;
Change to:
$order_data['store_url'] = HTTPS_SERVER;
Now you will want to fix your database because, for reasons I cannot fathom, the domain name is placed in the order along with the store's id, and when editing orders it is the use of that URL directly within your admin order page that throws up the undefined notice. Basically the browser blocks the request because it is trying to make an insecure request from a secure page.
Crack open phpmyadmin or whatever database tool you have on hand.
locate the table, default is oc_orders
Browsing the table, look for the column that contains your store URL (I can't remember the name offhand; I think it's just store_url, but it will be obvious anyway). If you run a multi-store setup you will need to run the query for each store.
I am sure somebody can come up with a clever way to automatically convert just the http into https with a single-use SQL query on the one column, but this works for me.
Run SQL: adjust as appropriate
UPDATE `oc_orders` SET `store_url` = 'https://example.com' WHERE store_id = 0;
I'm trying to write a program which does a series of equations using two different sets of numbers. I have both sets of numbers saved as separate dictionaries. I would like to be able to choose which two dictionaries I use by inputting their names into the terminal using raw_input. Here's what I have written:
def open():
    print "an opening text, description on what the program is actually doing"
    mathstart()

def mathstart():
    print "What is the first directory you wish to import?"
    'directory1' = raw_input("> ")
    import 'directory1'
    print "your first directory is" + 'directory1'[name]
All of the directories are formatted with a name, so I can confirm I'm using the correct one, and then a bunch of different data.
When I run the program from a terminal, I get the following error:
$ python engine.py
File "engine.py", line 11
import 'directory11'
SyntaxError: invalid syntax
This is unsurprising, since I completely guessed as to how I would be able to call the directory using raw_input.
My real question is: would I be able to do this, or is it something that just doesn't work? I really don't want to have to go in and add the directories to the code each time I use it. I have 20+ different directories that I need to interchange, and that's just a pain.
If I can't select a directory with raw_input, is there a way to select one without having to change the code each time?
A good discussion of dynamic module importing can be found here.
What you are trying to do could be accomplished using the __import__ function, which takes a string as an argument. For example:
def dynamic_import():
    print "What is the first directory you wish to import?"
    directory = raw_input("> ")
    directory = __import__(directory)
    return directory

my_module = dynamic_import()
Although it would likely be cleaner to pass the module as an argument to the function instead of using raw_input.
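For completeness, here is an equivalent sketch using importlib, which is available in Python 2.7. The name attribute read at the end is only an assumption based on your description that each data module stores a name:
import importlib

def load_data_module():
    module_name = raw_input("module to import> ").strip()
    return importlib.import_module(module_name)

my_module = load_data_module()
print "your first module is", my_module.name  # assumes each data module defines `name`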
I've successfully managed to use win32 COM to grab details about the page numbers of a Word document. However, when I try to use mydoc.ActiveWindow.Selection.Information(wdActiveEndPageNumber) I get a "wdActiveEndPageNumber is not defined" error.
I know that the file has been read into memory properly because mydoc.Content.Text prints out all the content.
Anyone know why this is happening and how to fix it? And is there any Python documentation for this, or am I stuck looking at the VB and C# docs on MSDN?
import win32com.client
word = win32com.client.Dispatch("Word.Application")
mydoc = word.Documents.Open("path:\\to\\file")
mydoc.ActiveWindow.Selection.Information(wdActiveEndPageNumber)
That's because wdActiveEndPageNumber is a constant that is not defined by win32com until you generate the COM type library from the application. Try this:
from win32com.client.gencache import EnsureDispatch
from win32com.client import constants
word = EnsureDispatch("Word.Application")
mydoc = word.Documents.Open("path:\\to\\file")
mydoc.ActiveWindow.Selection.Information(constants.wdActiveEndPageNumber)
You could use the enumerated number instead. You can find it using the object browser in Word: just go into the VBA editor, press F2, then enter wdActiveEndPageNumber as the search term. When you select it in the results it will show you its integer value. Then put that in your code.
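For example, assuming the object browser reports 3 for wdActiveEndPageNumber (please verify the value yourself before relying on it), a sketch would be:
import win32com.client

word = win32com.client.Dispatch("Word.Application")
mydoc = word.Documents.Open("path:\\to\\file")
# 3 is assumed to be the integer the object browser shows for wdActiveEndPageNumber
print mydoc.ActiveWindow.Selection.Information(3)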
I'm using Enthought Canopy with PyLab(64-bit). For my report I need to use Latex (XeLaTex) and the plots are done with matplotlib.
To get a first idea I just copied the second example from http://matplotlib.org/users/usetex.html and ran it. It looks fine and I can save it as a normal .png without problems. However, if I try to save it as .eps or .ps it does not work and an error appears:
invalid literal for int() with base 10: "
Additionally, the PyLab shell shows:
'gswin32c' is not recognized as an internal or external command, operable program or batch file.
If I save it as .pdf I have no problems, except that the text is all black instead of being red and blue. This is a problem because my plots have two axes and I need them colorized for better readability.
If I then try to delete some lines from the example (all the text), I still cannot save it as .eps or .ps. I can't figure out the problem, and the other topics related to this have not given me any insight. So I really need your help, because I can't use .png for my report.
Thank you in advance!!!
I finally managed to solve this problem. It might look weird but maybe other people can benefit from it.
The solution might depend upon the software you use. I use Enthought Canopy (Python) and MikTeX 2.9 under W8 64bit.
If you want to output .ps and .eps files with matplotlib using the 'text.usetex': True option then you will encounter the problem posted above.
Solution:
1. Download and install Ghostscript (32-bit) from http://www.ghostscript.com/download/gsdnld.html.
2. Download ps2eps-1.68.zip from http://www.tm.uka.de/~bless/ps2eps. The procedure is given in the manual; however, I would like to point out the part about the environment variables. In this step you need to go to Control Panel --> System --> Advanced system settings, click the 'Advanced' tab, and at the bottom of the window click 'Environment Variables'. Use the 'New' button under the user variables for USERNAME, type 'ps2eps' as the variable name, and as the variable value type the actual path where you have saved the ps2eps.pl file. In my case this is 'C:\Program Files (x86)\ps2eps\bin\'. You can check it by typing 'ps2eps' in the command window.
3. Download xpdfbin-win-3.03.zip from http://www.foolabs.com/xpdf/download.html. You only need the file 'pdftops.exe'. However, I could not assign a path as in step 2, so I solved this by putting 'pdftops.exe' in the MiKTeX 2.9 folder. The exact location for me was 'C:\Program Files\MiKTeX 2.9\miktex\bin\x64'.
I was then able to save figures as .ps and no longer got any error messages. Remember to use the settings proposed on http://matplotlib.org/users/usetex.html under 'postscript options'.
I myself used the following settings:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import matplotlib as mpl

mpl.rc('font', **{'family': 'serif', 'serif': ['Computer Modern Roman'],
                  'monospace': ['Computer Modern Typewriter']})

params = {'backend': 'ps',
          'text.latex.preamble': [r"\usepackage{upgreek}",
                                  r"\usepackage{siunitx}",
                                  r"\usepackage{amsmath}",
                                  r"\usepackage{amstext}"],
          'axes.labelsize': 18,
          #'axes.linewidth': 1,
          #'text.fontsize': 17,
          'legend.fontsize': 10,
          'xtick.labelsize': 13,
          #'xtick.major.width': 0.75,
          'ytick.labelsize': 13,
          'figure.figsize': [8.8, 6.8],
          #'figure.dpi': 120,
          'text.usetex': True,
          'axes.unicode_minus': True,
          'ps.usedistiller': 'xpdf'}

mpl.rcParams.update(params)
mpl.rcParams.update({'figure.autolayout': True})
(many of the params are just for my own purposes later in the plots)
As a beginner I am not well informed about how this depends on the 'backend' used when running a script from your Python console. I used this without any --pylab settings beforehand, and I do not know whether you need to switch the backend manually if you are already working in a console with a specific matplotlib backend.
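In case it helps, here is a minimal sketch of selecting the PostScript backend explicitly from a plain script; my understanding is that this only takes effect if it is done before pyplot is imported:
import matplotlib
matplotlib.use('ps')              # select the backend before importing pyplot
import matplotlib.pyplot as plt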
I had the same problem, and in my case it was caused by a font adjustment in the Python code, namely:
from matplotlib import rc
rc('font',**{'family':'sans-serif','sans-serif':['Helvetica']})
rc('text', usetex=True)
When I removed this it worked fine, and now I can save .eps.
So first make sure that a minimal working example works for you, then check the font and other style edits in your code. This may help.
I have a dataset containing thousands of tweets. Some of those contain urls but most of them are in the classical shortened forms used in Twitter. I need something that gets the full urls so that I can check the presence of some particular websites. I have solved the problem in Python like this:
import urllib2

url_filename = 'C:\Users\Monica\Documents\Pythonfiles\urlstrial.txt'
url_filename2 = 'C:\Users\Monica\Documents\Pythonfiles\output_file.txt'
url_file = open(url_filename, 'r')
out = open(url_filename2, 'w')
for line in url_file:
    tco_url = line.strip('\n')
    req = urllib2.urlopen(tco_url)
    print >>out, req.url
url_file.close()
out.close()
Which works but requires that I export my urls from Stata to a .txt file and then reimport the full urls. Is there some version of my Python script that would allow me to integrate the task in Stata using the shell command? I have quite a lot of different .dta files and I would ideally like to avoid appending them all just to execute this task.
Thanks in advance for any answer!
Sure, this is possible without leaving Stata. I am using a Mac running OS X. The details might differ on your operating system, which I am guessing is Windows.
Python and Stata Method
Say we have the following trivial Python program, called hello.py:
#!/usr/bin/env python
import csv

data = [['name', 'message'], ['Monica', 'Hello World!']]
with open('data.csv', 'w') as wsock:
    wtr = csv.writer(wsock)
    for i in data:
        wtr.writerow(i)
wsock.close()
This "program" just writes some fake data to a file called data.csv in the script's working directory. Now make sure the script is executable: chmod 755 hello.py.
From within Stata, you can do the following:
! ./hello.py
* The above line called the Python program, which created a data.csv file.
insheet using data.csv, comma clear names case
list
+-----------------------+
| name message |
|-----------------------|
1. | Monica Hello World! |
+-----------------------+
This is a simple example. The full process for your case will be:
Write a file to disk with the URLs, using outsheet or some other command
Use ! to call the Python script (a sketch of such a script is given after this list)
Read the output into Stata using insheet or infile or some other command
Cleanup by deleting files with capture erase my_file_on_disk.csv
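Here is a sketch of what the Python side of that process could look like. The file names urls.txt and long_urls.csv are just assumptions and should match whatever you outsheet and insheet in Stata:
#!/usr/bin/env python
# expand_urls.py -- called from Stata with:  ! ./expand_urls.py
import csv
import urllib2

with open('urls.txt') as infile, open('long_urls.csv', 'wb') as outfile:
    wtr = csv.writer(outfile)
    wtr.writerow(['short_url', 'long_url'])
    for line in infile:
        short = line.strip()
        if not short:
            continue
        resp = urllib2.urlopen(short)         # urlopen follows the redirect
        wtr.writerow([short, resp.geturl()])  # geturl() gives the final URL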
Let me know if that is not clear. It works fine on *nix; as I said, Windows might be a little different. If I had a Windows box I would test it.
Pure Stata Solution (kind of a hack)
Also, I think what you want to accomplish can be done completely in Stata, but it's a hack. Here are two programs. The first simply opens a log file and makes a request for the url (which is the first argument). The second reads that log file and uses regular expressions to find the url that Stata was redirected to.
capture program drop geturl
program define geturl
    * pass short url as first argument (e.g. http://bit.ly/162VWRZ)
    capture erase temp_log.txt
    log using temp_log.txt
    copy `1' temp_web_file
end
The above program will not finish because the copy command will fail (intentionally). It also doesn't clean up after itself (intentionally). So I created the next program to read what happened (and get the URL redirect).
capture program drop longurl
program define longurl, rclass
    * find the url in the log file created by geturl
    capture log close
    loc long_url = ""
    file open urlfile using temp_log.txt , read
    file read urlfile line
    while r(eof) == 0 {
        if regexm("`line'", "server says file permanently redirected to (.+)") == 1 {
            loc long_url = regexs(1)
        }
        file read urlfile line
    }
    file close urlfile
    return local url "`long_url'"
end
You can use it like this:
geturl http://bit.ly/162VWRZ
longurl
di "The long url is: `r(url)'"
* The long url is: http://www.ciwati.it/2013/06/10/wdays/?utm_source=twitterfeed&
* > utm_medium=twitter
You should run them one after the other. Things might get ugly using this solution, but it does find the URL you are looking for. May I suggest that another approach is to contact the shortening service and ask nicely for some data?
If someone at Stata is reading this, it would be nice to have copy return HTTP response header information. Doing this entirely in Stata is a little out there. Personally, I would do this sort of thing entirely in Python and use Stata for the analysis of the data once I had everything I needed.