Downloading data from website - regex

I use the following code to download files from a website into a folder.
I want to download only the files whose names contain "MOD09GA.A2008077.h22v05.005.2008080122814.hdf" and "MOD09GA.A2008077.h23v05.005.2008080122921.hdf", but I don't know how to select them. The code below downloads all of the files on the page, and I only need those two.
Does anyone have any ideas?
URL = 'http://e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2008.03.17/';
% Local path on your machine
localPath = 'E:/myfolder/';
% Read the HTML contents and parse the quoted file names ending in .hdf.xml
urlContents = urlread(URL);
ret = regexp(urlContents, '"\S+\.hdf\.xml"', 'match');
% Loop over all files and download them
for k = 1:length(ret)
    filename = ret{k}(2:end-1);  % strip the surrounding double quotes
    filepathOnline = strcat(URL, filename);
    filepathLocal = fullfile(localPath, filename);
    urlwrite(filepathOnline, filepathLocal);
end

Try regexp with tokens instead:
localPath = 'E:/myfolder/';
urlContents = 'aaaa "MOD09GA.A2008077.h22v05.005.2008080122814.hdf.xml" and "MOD09GA.A2008077.h23v05.005.2008080122921.hdf.xml" aaaaa';
ret = regexp(urlContents, '"(\S+)(?:\.\d+){2}(\.hdf\.xml)"', 'tokens');
%// Loop over each file name
for k = 1:length(ret)
    %// Join the captured tokens; the two numeric fields matched by
    %// (?:\.\d+){2} are not captured, so they are dropped from the name
    filename = [ret{k}{:}];
    filepathLocal = fullfile(localPath, filename)
end
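For comparison, here is a minimal sketch of the selection step in Python rather than MATLAB. It assumes that the two wanted files are identified by their tile field (h22v05 and h23v05) and that the page lists file names in double quotes. The sample HTML string and the helper name select_tiles are made up for illustration; a real listing would have to be fetched first.

```python
import re

# Hypothetical excerpt of the directory listing (not fetched from the server).
html = (
    '<a href="MOD09GA.A2008077.h21v05.005.2008080122500.hdf">a</a>\n'
    '<a href="MOD09GA.A2008077.h22v05.005.2008080122814.hdf">b</a>\n'
    '<a href="MOD09GA.A2008077.h22v05.005.2008080122814.hdf.xml">c</a>\n'
    '<a href="MOD09GA.A2008077.h23v05.005.2008080122921.hdf">d</a>\n'
)


def select_tiles(page, tiles):
    """Return the unique quoted .hdf names in page whose tile field is in tiles."""
    # The closing quote right after .hdf keeps the .hdf.xml entries out.
    pattern = re.compile(r'"(MOD09GA\.A\d{7}\.(h\d{2}v\d{2})\.\d{3}\.\d+\.hdf)"')
    selected = []
    for name, tile in pattern.findall(page):
        if tile in tiles and name not in selected:
            selected.append(name)
    return selected


wanted = select_tiles(html, {"h22v05", "h23v05"})
for name in wanted:
    print(name)
```

Each selected name can then be appended to the base URL and downloaded in a loop, as in the question.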

ZIP file without original directory structure in python

When I create a zip file, I want to make sure that it does not contain the files' whole directory path when it is unzipped.
I have done some research and found a lot of content about using arcname in zip.write, but every solution I try results in the whole server being zipped!
I have tried adding arcname = os.path.basename(file) and other possible solutions, with no luck.
This is my code:
all_order_files = glob.glob("/directory/"+str(order_submission.id)+"-*")
zip = zipfile.ZipFile("/directory/" + str(order_submission.id) + '-Order-Summary.zip', 'w')
for file in all_order_files:
    zip.write(file)
zip.close()
After reading this answer: Create .zip in Python?
I adapted the code to read as follows, which solved the issue for me.
all_order_files = glob.glob("/directory/"+str(order_submission.id)+"-*")
zip = zipfile.ZipFile("/directory/" + str(order_submission.id) + '-Order-Summary.zip', 'w')
path = "/directory"  # no trailing slash, so len(path) + 1 skips exactly the separator
for file in all_order_files:
    file_name = file.split('/')[-1]
    absname = os.path.abspath(os.path.join(path, file_name))
    arcname = absname[len(path) + 1:]  # name relative to /directory
    zip.write(absname, arcname)
zip.close()
Note the extra argument (arcname) passed to the write function: it sets the name stored in the archive, and therefore the directory structure produced when the zip file is unzipped.
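A more compact variant of the same idea, as a rough sketch: it assumes every file should be stored under its base name only, and it runs against a throwaway temporary directory rather than the real /directory. The helper name zip_flat and the sample file names are made up for illustration.

```python
import os
import tempfile
import zipfile


def zip_flat(paths, zip_path):
    """Write each file in paths into zip_path under its base name only."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for path in paths:
            # arcname is the name stored inside the archive.
            archive.write(path, arcname=os.path.basename(path))


# Demo with two small files in a temporary directory.
workdir = tempfile.mkdtemp()
sources = []
for name in ("42-invoice.txt", "42-receipt.txt"):
    full = os.path.join(workdir, name)
    with open(full, "w") as handle:
        handle.write("demo\n")
    sources.append(full)

archive_path = os.path.join(workdir, "42-Order-Summary.zip")
zip_flat(sources, archive_path)

with zipfile.ZipFile(archive_path) as archive:
    stored_names = archive.namelist()
print(stored_names)
```

The with block closes the archive even if a write fails, so no explicit close() call is needed.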

Using filters in pyGtk

I am writing a script to display a GUI in which certain files can be chosen. I am using pyGtk and as of now, my code can display all the zip files. I want to add another filter to display only the zip files with the latest date.
Below is my function that displays only zip files.
def open_file(self, w, data=None):
    d = gtk.FileChooserDialog(title="Select a file",
                              parent=self.window,
                              action=gtk.FILE_CHOOSER_ACTION_OPEN,
                              buttons=("OK", True, "Cancel", False)
                              )
    # create filters
    filter1 = gtk.FileFilter()
    filter1.set_name("All files")
    filter1.add_pattern("*")
    d.add_filter(filter1)
    filter2 = gtk.FileFilter()
    filter2.set_name("Zip files")
    filter2.add_pattern("*.zip")
    d.add_filter(filter2)
    ok = d.run()
    if ok:
        import os
        fullname = d.get_filename()
        dirname, fname = os.path.split(fullname)
        size = "%d bytes" % os.path.getsize(fullname)
        text = self.label_template % (fname, dirname, size)
    else:
        text = self.label_template % ("", "", "")
    self.label.set_label(text)
    d.destroy()
Is there a way I can choose a filter to display only the latest zip file in each folder?
Thanks in advance for your help!
Instead of using filter2.add_pattern("*.zip"), use filter2.add_pattern(filename), where filename is the name of the zip file with the latest date. You can write a function that returns the name of the latest zip file (or a list of such names, one per folder).
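The helper function mentioned above could look roughly like the sketch below. It assumes that "latest" means the most recent modification time and uses only the standard library; the name latest_zip and the demo files are hypothetical.

```python
import os
import tempfile


def latest_zip(directory):
    """Return the name of the most recently modified .zip file, or None."""
    candidates = [
        name for name in os.listdir(directory)
        if name.lower().endswith(".zip")
        and os.path.isfile(os.path.join(directory, name))
    ]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda name: os.path.getmtime(os.path.join(directory, name)),
    )


# Demo: three empty files with explicit modification times.
demo_dir = tempfile.mkdtemp()
for name, mtime in (("old.zip", 1000000000),
                    ("new.zip", 1500000000),
                    ("notes.txt", 1600000000)):
    full = os.path.join(demo_dir, name)
    open(full, "w").close()
    os.utime(full, (mtime, mtime))

newest = latest_zip(demo_dir)
print(newest)  # new.zip
```

The returned name can be passed to filter2.add_pattern. Since add_pattern interprets its argument as a shell-style glob, a name containing characters such as * or [ would probably need escaping first.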

Extract zipfiles and gzfiles from a zip folder

I can extract a zip folder containing several compressed files, but I don't know how to extract the zip and gz files inside it without repeating the same procedure twice.
import zipfile, fnmatch, os
rootPath = zipDataDirectory
rootPath2 = workingDirectory
pattern = '*.zip'
pattern2 = '*.gz'
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        print(os.path.join(root, filename))
        # Extract into a folder named after the archive, without its extension
        zipfile.ZipFile(os.path.join(root, filename)).extractall(
            os.path.join(root, os.path.splitext(filename)[0]))
I tried the following code, but it does not work:
extensionZip = "*.zip"
extensionGz = "*.gz"
for item in os.listdir(workingDirectory):
    if item.endswith(extensionZip):
        zipfile.ZipFile(item).extractall
    else:
        gzip.GzipFile.extract(item)
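One possible way to handle both formats in a single pass is sketched below. It assumes that each .gz file holds one compressed file, so decompressing simply drops the .gz suffix. The gzip module has no extract method, so the sketch copies the decompressed stream with shutil.copyfileobj. The function name extract_nested and the demo files are made up for illustration.

```python
import gzip
import os
import shutil
import tempfile
import zipfile


def extract_nested(directory):
    """Extract every .zip and .gz file found under directory in one walk."""
    extracted = []
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            source = os.path.join(root, filename)
            target = os.path.splitext(source)[0]  # drop .zip or .gz
            if filename.endswith(".zip"):
                with zipfile.ZipFile(source) as archive:
                    archive.extractall(target)
            elif filename.endswith(".gz"):
                with gzip.open(source, "rb") as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            else:
                continue
            extracted.append(target)
    return extracted


# Demo: one zip archive and one gzip file in a temporary directory.
demo_dir = tempfile.mkdtemp()
with zipfile.ZipFile(os.path.join(demo_dir, "a.zip"), "w") as archive:
    archive.writestr("inner.txt", "from zip")
with gzip.open(os.path.join(demo_dir, "b.txt.gz"), "wb") as handle:
    handle.write(b"from gz")

results = extract_nested(demo_dir)
print(sorted(os.path.relpath(path, demo_dir) for path in results))
```

Note that endswith takes a plain suffix such as ".zip", not a glob pattern such as "*.zip"; glob patterns belong with fnmatch.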

Concat a String in python

These are my files:
2015125_0r89_PEO.txt
2015125_0r89_PED.txt
2015125_0r89_PEN.txt
2015126_0r89_PEO.txt
2015126_0r89_PED.txt
2015126_0r89_PEN.txt
2015127_0r89_PEO.txt
2015127_0r89_PED.txt
2015127_0r89_PEN.txt
and I want to rename them to this:
US.CAR.PEO.D.2015.125.txt
US.CAR.PED.D.2015.125.txt
US.CAR.PEN.D.2015.125.txt
US.CAR.PEO.D.2015.126.txt
US.CAR.PED.D.2015.126.txt
US.CAR.PEN.D.2015.126.txt
US.CAR.PEO.D.2015.127.txt
US.CAR.PED.D.2015.127.txt
US.CAR.PEN.D.2015.127.txt
This is my code so far:
import os
paths = (os.path.join(root, filename)
         for root, _, filenames in os.walk('C:\\data\\MAX\\')  # location of the files
         for filename in filenames)
for path in paths:
    a = path.split("_")
    b = a[2].split(".")
    c = "US.CAR." + b[0] + ".D." + a[0]
    print(c)
When I run the script it does not raise any error, but it also does not rename the .txt files, which is what it is supposed to do.
Any help?
Getting the full path first and then manipulating it gives bad results, because the directory part ends up in the pieces you split, and the code never actually renames anything. It is better to take just the file name, build the new name from it, and then rename the file itself, like this:
for root, _, filenames in os.walk('C:\\data\\MAX\\'):
    for name in filenames:
        print("original: " + name)
        a = name.split("_")   # e.g. ['2015125', '0r89', 'PEO.txt']
        b = a[2].split(".")   # e.g. ['PEO', 'txt']
        # a[0] is '2015125'; the target name separates it as 2015.125
        # (don't forget the file extension)
        new = "US.CAR.{}.D.{}.{}.{}".format(b[0], a[0][:4], a[0][4:], b[1])
        print("new: " + new)
        os.rename(os.path.join(root, name), os.path.join(root, new))
String formatting is generally easier to read and maintain than chained string concatenation.
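The name transformation can also be isolated in a small function and checked without touching any files. This is only a sketch: it assumes every name follows the YYYYDDD_xxxx_CODE.ext layout shown in the question, and the function name new_name and its parameters are made up.

```python
def new_name(old_name, prefix="US.CAR", marker="D"):
    """Map e.g. '2015125_0r89_PEO.txt' to 'US.CAR.PEO.D.2015.125.txt'."""
    stem, extension = old_name.rsplit(".", 1)
    date_part, _middle, code = stem.split("_")
    year, day = date_part[:4], date_part[4:]
    return "{}.{}.{}.{}.{}.{}".format(prefix, code, marker, year, day, extension)


for sample in ("2015125_0r89_PEO.txt", "2015127_0r89_PEN.txt"):
    print(sample, "->", new_name(sample))
```

Inside the os.walk loop, the result would then be passed to os.rename together with the directory, as in the answer above.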

ZIP Archive Created Within ZIP Archive

I recently wrote a Python script to select certain files within a directory and save them to a new archive within that directory. The script works, except that it creates a duplicate archive within the new archive. I think it has something to do with the arcname I used and the loop, but I'm really not sure. As is probably obvious from my code, I am a beginner, so I am sure there is plenty of room for improvement. Any ideas as to where the problem is? Suggestions for improving the code are also welcome.
import os, arcpy, zipfile
inputfc = arcpy.GetParameterAsText(0)  # User Inputs Feature Class Path
desc = arcpy.Describe(inputfc)
fcname = desc.basename
zname = fcname + ".zip"
gpath = os.path.dirname(inputfc)
zpath = os.path.join(gpath, zname)
zfile = zipfile.ZipFile(zpath, "w")
for f in os.listdir(gpath):
    fpath = os.path.join(gpath, f)
    if f.startswith(fcname):
        zfile.write(fpath, f, compress_type=zipfile.ZIP_DEFLATED)
zfile.close()
Edit: After aruisdante answered my question I decided to just change the zname variable to
zname = "zip" + fcname + ".zip" #ugly but it worked thanks
This:
zfile = zipfile.ZipFile(zpath, "w")
creates a new zip file at zpath. Then this:
for f in os.listdir(gpath):
iterates through all of the files in gpath. Since zpath is located inside gpath, the zip file you just created is one of the files in gpath, and because its name starts with fcname it passes the startswith check and gets included in the archive. You will need to exclude it:
for f in (filename for filename in os.listdir(gpath) if filename != zname):
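The exclusion can be tried out without arcpy. The sketch below uses a temporary directory and made-up file names in place of the feature class files, and the helper name zip_matching is hypothetical.

```python
import os
import tempfile
import zipfile


def zip_matching(directory, prefix):
    """Zip the files in directory that start with prefix, skipping the zip itself."""
    zip_name = prefix + ".zip"
    zip_path = os.path.join(directory, zip_name)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # The archive file already exists here, so listdir returns it too.
        for name in os.listdir(directory):
            if name.startswith(prefix) and name != zip_name:
                archive.write(os.path.join(directory, name), name)
    return zip_path


# Demo: two files share the prefix, one does not.
demo_dir = tempfile.mkdtemp()
for name in ("roads.shp", "roads.dbf", "rivers.shp"):
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write("demo\n")

with zipfile.ZipFile(zip_matching(demo_dir, "roads")) as archive:
    members = sorted(archive.namelist())
print(members)
```

Writing the archive to a different directory, or giving it a name that does not start with the prefix (as in the edit above), would avoid the problem as well.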