Find and rename all files in directory matching a certain pattern - regex

I'm attempting to write a program that loops through every subfolder and renames all the files whose names match a given pattern. The files are all .jpg files and have the following pattern:
[0-9][0-9][0-9]_UsersfirstnameUserslastname[0-9][0-9][0-9].jpg
So, for instance, one folder would have the following:
452_AlexBobenko002.jpg
452_AlexBobenko003.jpg
452_AlexBobenko007.jpg
Then it would go to another folder where the following files exist:
834_CatDonald001.jpg
...
834_CatDonald126.jpg
I would like to rename the files so that there is an underscore after the last letter and before the last set of 3 digits. So the pattern would go from:
[0-9][0-9][0-9]_UsersfirstnameUserslastname[0-9][0-9][0-9].jpg
to
[0-9][0-9][0-9]_UsersfirstnameUserslastname_[0-9][0-9][0-9].jpg
and from the above example I would have:
452_AlexBobenko002.jpg --> 452_AlexBobenko_002.jpg
452_AlexBobenko003.jpg --> 452_AlexBobenko_003.jpg
452_AlexBobenko007.jpg --> 452_AlexBobenko_007.jpg
and
834_CatDonald001.jpg --> 834_CatDonald_001.jpg
...
834_CatDonald126.jpg --> 834_CatDonald_126.jpg
So far I have been able to locate the desired files with the following:
import glob
import os

path = mydir
folders = [filename for filename in os.listdir(path) if filename.startswith('EMP-')]
subfolders = [[] for i in range(len(folders))]
# Populate each empty sublist of subfolders with the contents of the corresponding folder
for i in range(len(folders)):
    subfolders[i] = [subfolder for subfolder in os.listdir(path + '\\%s' % folders[i])]
for z_1 in range(len(folders)):
    for z_2 in range(len(subfolders[z_1])):
        if os.path.isdir(path + '\\%s\\%s' % (folders[z_1], subfolders[z_1][z_2])):
            for file in glob.glob(path + '\\%s\\%s\\[0-9][0-9][0-9]_*.jpg' % (folders[z_1], subfolders[z_1][z_2])):
                pass  # rename(file)
I really have no clue how to rename them.
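One possible way to do the rename, as a minimal sketch rather than a definitive solution: a regular expression captures everything up to the last letter and the trailing three digits separately, and the underscore is inserted between the two groups. The names PATTERN, new_name and rename_all are hypothetical, and the sketch assumes the name part consists of ASCII letters only. It walks every folder below top instead of restricting itself to folders starting with 'EMP-'.

```python
import os
import re

# Three digits, underscore, letters (the name), three digits, ".jpg".
# Assumption: the name part contains ASCII letters only.
PATTERN = re.compile(r'^(\d{3}_[A-Za-z]+)(\d{3}\.jpg)$')

def new_name(filename):
    """Return filename with an underscore before the trailing digits, or None if it does not match."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    return match.group(1) + '_' + match.group(2)

def rename_all(top):
    """Walk top recursively, rename every matching file, and return the (old, new) path pairs."""
    renamed = []
    for root, _, files in os.walk(top):
        for filename in files:
            target = new_name(filename)
            if target is not None:
                old_path = os.path.join(root, filename)
                new_path = os.path.join(root, target)
                os.rename(old_path, new_path)
                renamed.append((old_path, new_path))
    return renamed
```

Files that already contain the underscore no longer match the pattern, so running rename_all twice on the same tree should leave the second run with nothing to do.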

Related

Traversing multiple folders for searching the same file in multiple folders in python

I want to search for the same file in multiple folders.
I have tried os.walk(path), but it does not traverse the nested folders:
for current_root, folders, file_names in os.walk(self.path, topdown=True):
    for i in folders:
        print(i)
        for filename in file_names:
            count += 1
            file_path = os.path.join(current_root + '\\' + filename)
            #print file_path
            self.location_dictionary[file_path] = filename
My code prints all the folders, but it does not enter the nested folders recursively.
For example, I have subdir, subdir1 and subdir2, and inside subdir there is another directory called abc.
Both subdir and abc contain a file with the same name, and I want to read that file.
os.walk does not work that way.
For each current_root it traverses, it provides the lists of directories and files directly under it.
You're nesting the loops, so the files of current_root are processed once per subdirectory, and not at all when current_root has no subdirectories.
Here you don't need the folder list (so just ignore that element with _). current_root already contains that information for your files:
for current_root, _, file_names in os.walk(self.path, topdown=True):
    for filename in file_names:
        count += 1
        file_path = os.path.join(current_root, filename)
        #print file_path
        self.location_dictionary[file_path] = filename
Aside: a dictionary with the full path as key and the filename as value is probably not what you want (the same information could be stored in a set or list, with os.path.basename used to compute the filename). Maybe you meant the reverse (filename => full path), which works provided that there are no duplicate filenames.
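Since the question involves the same file name occurring in several folders, the reverse mapping could store a list of paths per name instead of a single path. The following is a minimal sketch under that assumption. The function name index_by_filename is hypothetical and not part of the original code.

```python
import os
from collections import defaultdict

def index_by_filename(top):
    """Map each file name found below top to the list of full paths where it occurs."""
    locations = defaultdict(list)
    # os.walk visits top and every nested directory; no extra loop over folders is needed.
    for current_root, _, file_names in os.walk(top):
        for filename in file_names:
            locations[filename].append(os.path.join(current_root, filename))
    return dict(locations)
```

Looking up a name in the result then gives every location of that file, including the ones in nested folders such as subdir/abc.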

Python shutil file move in os walk for loop

The code below searches a directory for PDFs and moves each one it finds into the corresponding folder, whose name has '_folder' appended.
Could it be expressed in simpler terms? It's practically unreadable. Also, if it can't find the folder, it destroys the PDF!
import os
import shutil
for root, dirs, files in os.walk(folder_path_variable):
    for file1 in files:
        if file1.endswith('.pdf') and not file1.startswith('.'):
            filenamepath = os.path.join(root, file1)
            name_of_file = file1.split('-')[0]
            folderDest = filenamepath.split('/')[:9]
            folderDest = '/'.join(folderDest)
            folderDest = folderDest + '/' + name_of_file + '_folder'
            shutil.move(filenamepath, folderDest)
Really I want to traverse the same directory after constructing the variable name_of_file and if that variable is in a folder name, it performs the move. However I came across issues trying to nest another for loop...
I would try something like this:
for root, dirs, files in os.walk(folder_path_variable):
    for filename in files:
        if filename.endswith('.pdf') and not filename.startswith('.'):
            filepath = os.path.join(root, filename)
            filename_prefix = filename.split('-')[0]
            dest_dir = os.path.join(root, filename_prefix + '_folder')
            if not os.path.isdir(dest_dir):
                os.mkdir(dest_dir)
            os.rename(filepath, os.path.join(dest_dir, filename))
The answer by John Zwinck is correct, except it contains a bug: if the destination folder already exists, os.walk later descends into it, so a folder within that folder is created and the PDF is moved to that location. I have fixed this by adding a 'break' statement after the inner for loop (for filename in files), which stops os.walk after the top-level directory.
The code below now executes correctly. It looks for a folder named after the PDF's prefix (the part of the filename before the first '-') with '_folder' appended. If the folder exists, the PDF is moved into it. If it doesn't, the folder is created and the PDF is moved into it.
for root, dirs, files in os.walk(folder_path_variable):
    for filename in files:
        if filename.endswith('.pdf') and not filename.startswith('.'):
            filepath = os.path.join(root, filename)
            filename_prefix = filename.split('-')[0]
            dest_dir = os.path.join(root, filename_prefix + '_folder')
            if not os.path.isdir(dest_dir):
                os.mkdir(dest_dir)
            os.rename(filepath, os.path.join(dest_dir, filename))
    break  # stop after the top-level directory so os.walk never descends into the destination folders
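If nested directories should still be processed, one alternative is to prune the destination folders from the walk instead of stopping it. The sketch below assumes that any directory whose name ends in '_folder' is a destination and should never be scanned. The function name move_pdfs and the suffix parameter are hypothetical.

```python
import os
import shutil

def move_pdfs(top, suffix='_folder'):
    """Move each PDF below top into a sibling '<prefix><suffix>' folder and return the new paths."""
    moved = []
    for root, dirs, files in os.walk(top):
        # Prune destination folders in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not d.endswith(suffix)]
        for filename in files:
            if filename.endswith('.pdf') and not filename.startswith('.'):
                prefix = filename.split('-')[0]
                dest_dir = os.path.join(root, prefix + suffix)
                os.makedirs(dest_dir, exist_ok=True)
                dest_path = os.path.join(dest_dir, filename)
                # A full destination path keeps the PDF from being renamed to the folder name.
                shutil.move(os.path.join(root, filename), dest_path)
                moved.append(dest_path)
    return moved
```

Assigning to dirs[:] (rather than rebinding dirs) matters here, because os.walk only honours in-place changes to that list when walking top-down.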

Extract zipfiles and gzfiles from a zip folder

I can extract a zip folder containing several compressed files, but how do I extract the zip and gz files inside it without repeating the same procedure twice?
import zipfile, fnmatch, os
rootPath = zipDataDirectory
rootPath2 = workingDirectory
pattern = '*.zip'
pattern2 = '*.gz'
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        print(os.path.join(root, filename))
        zipfile.ZipFile(os.path.join(root, filename)).extractall(os.path.join(root, os.path.splitext(filename)
I tried the following code, but it is not working:
extensionZip = "*.zip"
extensionGz = "*.gz"
for item in os.listdir(workingDirectory):
    if item.endswith(extensionZip):
        zipfile.ZipFile(item).extractall
    else:
        gzip.GzipFile.extract(item)
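A few things likely go wrong in the attempt above: str.endswith compares against the literal text "*.zip" (wildcards are not expanded), extractall is never called because the parentheses are missing, and gzip.GzipFile has no extract method. A minimal sketch of handling both archive types in a single pass might look like this. The function name extract_nested is hypothetical, and the sketch assumes each archive is unpacked next to itself under its name minus the last extension.

```python
import gzip
import os
import shutil
import zipfile

def extract_nested(directory):
    """Extract every .zip and .gz file that os.walk finds under directory; return the output paths."""
    extracted = []
    for root, _, files in os.walk(directory):
        for filename in files:
            path = os.path.join(root, filename)
            target = os.path.splitext(path)[0]  # archive path without its last extension
            if filename.endswith('.zip'):
                # A zip archive may hold many members, so it is unpacked into a directory.
                with zipfile.ZipFile(path) as archive:
                    archive.extractall(target)
            elif filename.endswith('.gz'):
                # A gz file wraps a single stream, so it is decompressed into a single file.
                with gzip.open(path, 'rb') as source, open(target, 'wb') as dest:
                    shutil.copyfileobj(source, dest)
            else:
                continue
            extracted.append(target)
    return extracted
```

Archives that only appear as a result of an extraction are not picked up in the same pass, so a second call would be needed for deeper nesting.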

Python - Counting the number of files and folders in a directory

I've got a Python script that deletes an entire directory and its subfolders, and I'd like to print out the number of files and folders removed. Currently, I have found some code from a different question posted in 2010, but the answer I get back is 16... If I right-click on the folder, it states that there are 152 files and 72 folders...
The code I currently have for checking the directory:
import os, getpass
user = getpass.getuser()
copyof = 'Copy of ' + user
directory = "C:/Documents and Settings/" + user
print(len(os.listdir(directory)))
How can I extend this to show the same number of files and folders that there actually are?
To perform recursive search you may use os.walk.
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
Sample usage:
import os
dir_count = 0
file_count = 0
for _, dirs, files in os.walk(dir_to_list_recursively):
    dir_count += len(dirs)
    file_count += len(files)
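The sample above can be wrapped in a small function so the counts can be printed before the directory is deleted. This is only a sketch. The function name count_tree is hypothetical, and the top directory itself is not included in the folder count.

```python
import os

def count_tree(top):
    """Return (dir_count, file_count) for everything below top, excluding top itself."""
    dir_count = 0
    file_count = 0
    for _, dirs, files in os.walk(top):
        # Each directory is listed exactly once, in the dirs list of its parent.
        dir_count += len(dirs)
        file_count += len(files)
    return dir_count, file_count
```

With the directory variable from the question, a call such as dir_count, file_count = count_tree(directory) would give both numbers in one walk.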
I was able to solve this issue by using the following code by octoback:
import os
# Total number of files below the directory (folders are not counted)
cpt = sum([len(files) for r, d, files in os.walk(r"G:\CS\PYTHONPROJECTS")])

using os.walk cannot open the file from the list

I need to read '.csv' files in several directories and do some calculations on them.
The calculations work, but my for loop does not behave as I want.
import os
import numpy as np
import pandas as pd

d = r'F:\MArcin\Experiments\csvCollection'
for dirname, dirs, files in os.walk(d):
    for i in files:
        if i.endswith('.csv'):
            data1 = pd.read_csv(i, sep=",")
            data = data1['x'].values[:, np.newaxis]
            target = data1['y']
The error I am getting is:
IOError: File 1.csv does not exist
files is a list of all the '.csv' files inside dirname
i is a str of size 1 that contains 1.csv (the first of the files in the directory)
Any ideas why this is not working?
Thanks for any help.
Because 1.csv is somewhere else on the filesystem, and when you call read_csv() with a bare file name it opens the file relative to the current working directory.
Just open it using its full path:
data1 = pd.read_csv(os.path.join(dirname, i), sep=",")
dirname in os.walk is the directory in which 1.csv is actually located.
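The same idea can be separated from the pandas code by first collecting the full paths, as in this minimal sketch. The generator name iter_csv_paths is hypothetical. It uses only the standard library, so the read_csv call is left to the caller.

```python
import os

def iter_csv_paths(top):
    """Yield the full path of every '.csv' file below top."""
    for dirname, _, files in os.walk(top):
        for name in files:
            if name.endswith('.csv'):
                # Joining with dirname makes the path valid from any working directory.
                yield os.path.join(dirname, name)
```

Each yielded path could then be passed straight to pd.read_csv(path, sep=",") inside the loop from the question.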