I have:
An Excel file with data in A1:B2.
A folder with 200 jpeg files.
I'm trying to search the folder for file names that match the values in column A and, when one is found, rename it to the corresponding value in column B without changing the file's extension.
I'm stuck: I have tried various scripts to do this, but they all failed. Here's my code:
import os
import xlrd
path = r'c:\users\c_thv\desktop\x.xls'
#collect the files in folder
path1 = r'c:\users\c_thv\desktop'
data = []
for name in os.listdir(path1):
    if os.path.isfile(os.path.join(path1, name)):
        fileName, fileExtension = os.path.splitext(name)
        if fileExtension == '.py':
            data.append(fileName)
#print data
#collect the filenames for changing
book = xlrd.open_workbook(path)
sheet = book.sheet_by_index(0)
cell = sheet.cell(0,0)
cells = sheet.row_slice(rowx=0,start_colx=0,end_colx=2)
excel = []
#collect the workable data in a list
for cell in cells:
    excel.append(cell)
#print excel
#compare list for matches
for i,j in enumerate(excel):
    if j in data[:]:
        os.rename(excel[i],data[i])
Try adding print "Match found" right after if j in data[:]: just to check whether the condition is ever met. My guess is that there will be no match, because the list data is full of Python filenames (if fileExtension == '.py') while you are looking for jpeg files in the excel list.
Besides, old is not defined.
EDIT:
If I understand correctly, this may help:
import os, xlrd
path = 'c:/users/c_thv/desktop' #path to jpg files
path1 = 'c:/users/c_thv/desktop/x.xls'
data =[] #list of jpg filenames in folder
#let's create a filenames list without the jpg extension
for name in os.listdir(path):
    fileName, fileExtension = os.path.splitext(name)
    if fileExtension =='.jpg':
        data.append(fileName)
#let's create a list of old filenames in the excel column a
book = xlrd.open_workbook(path1)
sheet = book.sheet_by_index(0)
oldNames =[]
for row in range(sheet.nrows):
    oldNames.append(sheet.cell_value(row,0))
#let's create a list with the new names in column b
newNames =[]
for row in range(sheet.nrows):
    newNames.append(sheet.cell_value(row,1))
#now create a dictionary with the old name in a and the corresponding new name in b
fileNames = dict(zip(oldNames,newNames))
print(fileNames)
#lastly rename your jpg files
for f in data:
    if f in fileNames:
        os.rename(path+'/'+f+'.jpg', path+'/'+fileNames[f]+'.jpg')
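If some of the 200 files end in .jpeg or .JPG rather than .jpg, a variant that keeps whatever extension each file already has may be safer. The following is only a sketch: rename_from_mapping is a hypothetical helper name, and mapping is assumed to be an old-name to new-name dictionary like the one built from columns A and B above.

```python
import os


def rename_from_mapping(folder, mapping):
    """Rename files in `folder` whose name (without extension) is a key
    of `mapping`, keeping each file's original extension.

    Hypothetical helper: `mapping` is assumed to be {old_stem: new_stem}.
    Returns a list of (old_name, new_name) pairs that were renamed.
    """
    renamed = []
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        if stem not in mapping:
            continue
        src = os.path.join(folder, name)
        new_name = mapping[stem] + ext
        dst = os.path.join(folder, new_name)
        # Skip directories and never overwrite an existing file.
        if os.path.isfile(src) and not os.path.exists(dst):
            os.rename(src, dst)
            renamed.append((name, new_name))
    return renamed
```

Calling it as rename_from_mapping(path, fileNames) would then replace the final loop; the returned list makes it easy to check which files were actually touched.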
I have 1000 subdirectories (error1 - error1000), each with three different csv files (rand.csv, run_error.csv, swe_error.csv). Each csv has an index row. I need to merge the csv files that have the same filename, so that I end up with e.g. rand_merge.csv with the index row and 1000 rows of data.
I followed Merge multiple csv files with same name in 10 different subdirectory, which gets me
KeyError: 'filename'
I can't figure out how to fix it, so any help is appreciated.
Thx
Update: Here's the exact code, which came from linked post above:
import pandas as pd
import glob
CONCAT_DIR = "./error/files_concat/"
# Use glob module to return all csv files under root directory. Create DF from this.
files = pd.DataFrame([file for file in glob.glob("error/*/*")], columns=["fullpath"])
# Split the full path into directory and filename
files_split = files['fullpath'].str.rsplit("\\", 1, expand=True).rename(columns={0: 'path', 1:'filename'})
# Join these into one DataFrame
files = files.join(files_split)
# Iterate over unique filenames; read CSVs, concat DFs, save file
for f in files['filename'].unique():
    paths = files[files['filename'] == f]['fullpath'] # Get list of fullpaths from unique filenames
    dfs = [pd.read_csv(path, header=None) for path in paths] # Get list of dataframes from CSV file paths
    concat_df = pd.concat(dfs) # Concat dataframes into one
    concat_df.to_csv(CONCAT_DIR + f) # Save dataframe
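I found my mistake. I needed to split on "/" in rsplit, not "\". My paths use forward slashes, so splitting on a backslash never split anything, only column 0 was produced, and the 'filename' column was never created:
files_split = files['fullpath'].str.rsplit("/", n=1, expand=True).rename(columns={0: 'path', 1:'filename'})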
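To avoid depending on which separator the paths happen to contain, the grouping step can also be done without splitting on a hard-coded character. This is a minimal sketch using only the standard library; group_by_filename is a hypothetical name, and it assumes a backslash in a path is always meant as a separator (which is not strictly true on POSIX systems).

```python
import posixpath
from collections import defaultdict


def group_by_filename(paths):
    """Group full paths by their final component (the file name).

    Hypothetical helper: backslashes are normalized to forward slashes
    first, so paths work regardless of which separator they use.
    """
    groups = defaultdict(list)
    for p in paths:
        normalized = p.replace("\\", "/")
        groups[posixpath.basename(normalized)].append(p)
    return dict(groups)
```

Each value is then the list of paths to read and concatenate for that file name, for example group_by_filename(glob.glob("error/*/*")).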
I need to parse through a directory of multiple excel files to find matches to a set of 500+ strings (that I currently have in a set).
If there is a match to one of the strings in an excel file, I need to pull that row out into a new file.
Please let me know if you can assist! Thank you in advance for the help!
The directory is called: All_Data
The set is from a list of strings in a file (MRN_file_path)
My code:
MRN = set()
with open(MRN_file_path) as MRN_file:
    for line in MRN_file:
        if line.strip():
            MRN.add(line.strip())
for root, dires, files in os.walk('path/All_Data'):
    for name in files:
        if name.endswith('.xlsx'):
            filepath = os.path.join(root, name)
            with open(search_results_path, "w") as search_results:
                if MRN in filepath:
                    search_results.write(line)
Your code doesn't actually read the .xlsx files; it only tests the file path. As far as I know, there isn't anything in the Python standard library to read .xlsx files. However, you can check out openpyxl and see if that helps. Here's a solution which reads all the .xlsx files in the specified directory and writes the matching rows into a single tab-delimited txt file.
import os
from openpyxl import load_workbook
MRN = set()
with open(MRN_file_path) as MRN_file:
    for line in MRN_file:
        if line.strip():
            MRN.add(line.strip())
outfile = open(search_results_path, "w")
for root, dires, files in os.walk(path):
    for name in files:
        if name.endswith('.xlsx'):
            filepath = os.path.join(root, name)
            # load in the .xlsx workbook
            wb = load_workbook(filename = filepath, read_only = True)
            # assuming we select the worksheet which is active
            ws = wb.active
            # iterate through each row in the worksheet
            for row in ws.rows:
                # iterate over each cell
                for cell in row:
                    # compare as text, because the MRN set holds strings
                    if str(cell.value) in MRN:
                        # create a temporary array with all the cell values in the matching row.
                        # the 'None' check is there to avoid errors when joining the array
                        # into a tab-delimited row; str() is needed for numeric cells
                        arr = [str(c.value) if c.value is not None else "" for c in row]
                        outfile.write("\t".join(arr) + "\n")
                        # stop after the first match so the row is written only once
                        break
outfile.close()
If a tab-delimited output isn't what you're looking for, then you can adjust the outfile.write line to whatever fits your needs.
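One way to make the output format adjustable is to hand the row to the csv module instead of joining strings by hand. This is just a sketch: write_rows is a hypothetical helper, and rows is assumed to be an iterable of tuples of cell values such as [c.value for c in row].

```python
import csv
import io


def write_rows(rows, out, delimiter="\t"):
    """Write each row of cell values to the open text file `out`.

    Hypothetical helper: None cells become empty fields, and the csv
    module converts numbers to text and quotes fields when needed.
    """
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if value is None else value for value in row])


# Small in-memory demonstration with a comma delimiter.
buffer = io.StringIO()
write_rows([("123", None, 4.5)], buffer, delimiter=",")
```

Switching between tab-delimited and comma-separated output is then a matter of changing the delimiter argument.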
This is part of my learning. After successfully splitting file names (with some help), as a next step I want to know whether I can split a file name at the point where a month name appears, using the month names in the list below:
Months=['January','February','March','April','May','June','July','August','September','October','November','December'].
When my file name is like this
1.Non IVR Entries Transactions December_16_2016_07_49_22 PM.txt
2.Denied_Calls_SMS_Sent_December_14_2016_05_33_41 PM.txt
Please note that the file names do not all follow the same pattern, which is why I need to split them so that
Non IVR Entries Transactions is one part and December_16_2016_07_49_22 PM is the other.
import os
import os.path
import csv
path = 'C:\\Users\\akhilpriyatam.k\\Desktop\\tes'
text_files = [os.path.splitext(f)[0] for f in os.listdir(path)]
for v in text_files:
    print (v[0:9])
    print (v[10:])
os.chdir('C:\\Users\\akhilpriyatam.k\\Desktop\\tes')
with open('file.csv', 'wb') as csvfile:
    thedatawriter = csv.writer(csvfile,delimiter=',')
    for v in text_files:
        s = (v[0:9])
        t = (v[10:])
        thedatawriter.writerow([s,t])
Assuming that you want the filename and timestamp as the two parts, and that a month name occurs only once in the string, I hope the following code solves your problem.
import re
import calendar
fullname = 'Non IVR Entries Transactions December_16_2016_07_49_22 PM.txt'
months = list(calendar.month_name[1:])
regex = re.compile('|'.join(months))
# search() returns None when no month name is present
match = regex.search(fullname)
if match:
    idx = match.start()
    # text before the month is the name; the rest, minus '.txt', is the timestamp
    filename, timestamp = fullname[:idx],fullname[idx:-4]
    print(filename, timestamp)
else:
    print("Month not found")
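Since the second example name joins the two parts with an underscore rather than a space, it may help to wrap the idea in a function that also trims the trailing separator. A minimal sketch, assuming English month names (calendar.month_name follows the current locale) and that the first month name found marks the split; split_at_month is a hypothetical name.

```python
import calendar
import os
import re

# Pattern matching any full month name (locale-dependent; English assumed).
MONTH_RE = re.compile("|".join(calendar.month_name[1:]))


def split_at_month(file_name):
    """Return (name_part, timestamp_part) for a file name.

    Hypothetical helper: the extension is dropped, the split happens at
    the first month name, and trailing spaces/underscores are trimmed
    from the name part. If no month is found, the timestamp is None.
    """
    stem, _ext = os.path.splitext(file_name)
    match = MONTH_RE.search(stem)
    if match is None:
        return stem, None
    return stem[:match.start()].rstrip(" _"), stem[match.start():]
```

Each returned pair can be passed straight to csv.writer's writerow, as in the question's code.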
I'm trying to pull two versions of the same file into python as separate dataframes, with the end goal of comparing what was added in the new file and what was removed from the old one. So far, I've got code that looks like this:
In[1] path = r'\\Documents\FileList'
      files = os.listdir(path)
In[2] files_txt = [f for f in files if f[-3:] == 'txt']
In[3] for f in files_txt:
          data = pd.read_excel(path + r'\\' + f)
          df = df.append(data)
I've also set a variable to equal the current date minus a certain number of days, which I want to use to pull the file that has a date equal to that variable:
d7 = dt.datetime.today() - timedelta(7)
As of now, I'm unsure of how to do this, as the first part of the filename always remains the same but they add numbers at the end (eg. file_03232016 then file_03302016). I want to parse through the directory for the beginning part of the filename and add it to a dataframe if it matches the date parameter I set.
EDIT: I forgot to add that sometimes I also need to look at the system date created timestamp, as the text date in the file name isn't always there.
Here are some modifications to your original code to get a list of files containing your target date. You need to use strftime.
import os
import datetime as dt
import pandas as pd
d7 = dt.datetime.today() - dt.timedelta(7)
target_date_str = d7.strftime('_%m%d%Y')
# e.g. target_date_str + '.txt' is '_03232016.txt' when d7 falls on March 23, 2016
files_txt = [f for f in files if f.endswith(target_date_str + '.txt')]
data = []
for f in files_txt:
    data.append(pd.read_excel(os.path.join(path, f)))
df = pd.concat(data, ignore_index=True)
Use strftime in order to represent your datetime variable as a string with desired format and glob for searching files by file mask in the directory:
import datetime as dt
import glob
import pandas as pd
fmask = r'\\Documents\FileList\*' + (dt.datetime.today() - dt.timedelta(7)).strftime('%m%d%Y') + '*.txt'
files_txt = glob.glob(fmask)
# concatenate all CSV/txt files into one data frame
df = pd.concat([pd.read_csv(f) for f in files_txt], ignore_index=True)
PS I guess you want to use read_csv instead of read_excel when working with txt files unless you really have excel files with txt extension?
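For the case mentioned in the question's EDIT, where the date is missing from the file name, the file system timestamp can be compared instead. The sketch below is only an illustration: files_created_on is a hypothetical helper, and it relies on os.path.getctime, which is the creation time on Windows but the last metadata change time on most Unix systems.

```python
import datetime as dt
import os


def files_created_on(directory, target_date):
    """Return names of files in `directory` whose getctime date equals
    `target_date` (a datetime.date).

    Hypothetical helper: getctime means creation time on Windows only.
    """
    matches = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if not os.path.isfile(full_path):
            continue
        created = dt.date.fromtimestamp(os.path.getctime(full_path))
        if created == target_date:
            matches.append(name)
    return matches
```

With the question's variables this would be called as files_created_on(path, d7.date()), and the result could be combined with the name-based match.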
I have files like this:
1.stream0106.wav
2.stream0205.wav
3.steram0304.wav
In each file name I need to rename "01" to "_C" and "06" to "_LFE1", and so on. I have the new names in a csv file, shown below.
Can you please guide me on this?
I'm not sure if you want the "01" to be replaced or appended. The csv titles make it confusing.
I would first make the csv file start in column A and row 1 to make reading it in easier for you.
If you are appending names this should work
import os
import csv
# Assuming files are just in current directory
wav_files = [f for f in os.listdir('.') if f.endswith('.wav')]
with open('your_file.csv', 'r') as csv_file:
    # skip the header row, then split each line into [old, new]
    mappings = [row.strip().split(',') for row in csv_file.readlines()[1:]]
for f in wav_files:
    for digit, name in mappings:
        if f[:-4].endswith(digit):
            new_name = f.replace(digit,name)
            os.rename(f, new_name)
            break
EDIT
Old Name,New Name
00,_0
01,_C
02,_L
03,_R
04,_Ls
05,_Rs
06,_LFE1
07,_Cs
This can be achieved by just having them in Excel, starting at column A and row 1.
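If the goal is instead to replace every two-digit code at the end of the name (so that both "01" and "06" in stream0106.wav are translated), something along these lines might work. It is a sketch under assumptions: rename_channels is a hypothetical helper, the mapping is copied from the csv listing above, and the codes are assumed to sit in pairs directly before the extension.

```python
import re

# Old code -> new name, taken from the csv listing above.
CHANNEL_NAMES = {
    "00": "_0", "01": "_C", "02": "_L", "03": "_R",
    "04": "_Ls", "05": "_Rs", "06": "_LFE1", "07": "_Cs",
}


def rename_channels(filename, mapping=CHANNEL_NAMES):
    """Translate the trailing two-digit codes of a file name.

    Hypothetical helper: returns the name unchanged when it has no
    extension, no trailing digit pairs, or a code missing from `mapping`.
    """
    stem, dot, ext = filename.rpartition(".")
    match = re.search(r"((?:\d\d)+)$", stem)
    if not dot or not match:
        return filename
    digits = match.group(1)
    codes = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    if any(code not in mapping for code in codes):
        return filename
    return stem[:match.start()] + "".join(mapping[c] for c in codes) + "." + ext
```

The function only computes the new name; the actual rename would still be os.rename(f, rename_channels(f)) for each wav file.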