Simple way to refactor this Python code to reduce repetition - python-2.7

I'd like help refactoring this code to reduce redundant lines/concepts. The code for this def in basically repeated 3 times.
Restrictions:
- I'm new, so a really fancy list comprehension or turning things into objects with dunders and method overrides is way to advanced for me.
- Built in modules only. This is Pyhton 2.7 code, and only imports os and re.
What the overall script does:
Finds files with a fixed prefix. The files are pipe-delimited text files. The first row is a header. It has a footer which can be 1 or more rows. Based on the prefix, the script throws away "columns" from the text file that aren't needed in another step. It saves the data, comma-separated, in a new file with a .csv extension.
The bulk of the work is done in processRawFiles(). This is what I'd like refactored, since it's wildly repetitive.
def separateTranslationTypes(translationFileList):
'''Takes in list of all files to process and find which are roomtypes
, ratecodes or sourcecodes. The type of file determines how it will be processed.'''
rates = []
rooms = []
sources = []
for afile in translationFileList:
rates.append( [m.group() for m in re.finditer('cf_ratecodeheader+(.*)', afile)] )
rooms.append( [m.group() for m in re.finditer('cf_roomtypes+(.*)', afile)] )
sources.append( [m.group() for m in re.finditer('cf_sourcecodes+(.*)', afile)] )
# empty list equates to False. So if x is True if the list is not empty - thus kept.
rates = [x[0] for x in rates if x]
rooms = [x[0] for x in rooms if x]
sources = [x[0] for x in sources if x]
print '... rateCode files :: ',rates,'\n'
print '... roomType files :: ',rooms,'\n'
print '... sourceCode files :: ',sources, '\n'
return {'rateCodeFiles':rates,
'roomTypeFiles':rooms,
'sourceCodeFiles':sources}
groupedFilestoProcess = separateTranslationTypes(allFilestoProcess)
def processRawFiles(groupedFileDict):
for key in groupedFileDict:
# Process the rateCodes file
if key == 'rateCodeFiles':
for fname_Value in groupedFileDict[key]: # fname_Value is the filename
if os.path.exists(fname_Value):
workingfile = open(fname_Value,'rb')
filedatastring = workingfile.read() # turns entire file contents to a single string
workingfile.close()
outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt of any other 3 char extension
outputfile = open(outname,'wb')
filedatalines = filedatastring.split('\n') # a list containing each line of the file
rawheaders = filedatalines[0] # 1st element of the list is the first row of the file, with the headers
parsedheaders = rawheaders.split('|') # turn the header string into a list where | was delimiter
print '\n'
print 'outname: ', outname, '\n'
# print 'rawheaders: ', rawheaders, '\n'
# print 'parsedheaders: ',parsedheaders, '\n'
# print filedatalines[0:2]
print '\n'
ratecodeindex = parsedheaders.index('RATE_CODE')
ratecodemeaning = parsedheaders.index('DESCRIPTION')
for dataline in filedatalines:
if dataline[:4] == 'LOGO':
firstuselessline = filedatalines.index(dataline)
# print firstuselessline
# ignore the first line which was the headers
# stop before the line that starts with LOGO - the first useless line
for dataline in filedatalines[1:firstuselessline-1:]:
# print dataline.split('|')
theratecode = dataline.split('|')[ratecodeindex]
theratemeaning = dataline.split('|')[ratecodemeaning]
# print theratecode, '\t', theratemeaning, '\n'
linetowrite = theratecode + ',' + theratemeaning + '\n'
outputfile.write(linetowrite)
outputfile.close()
# Process the roomTypes file
if key == 'roomTypeFiles':
for fname_Value in groupedFileDict[key]: # fname_Value is the filename
if os.path.exists(fname_Value):
workingfile = open(fname_Value,'rb')
filedatastring = workingfile.read() # turns entire file contents to a single string
workingfile.close()
outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt of any other 3 char extension
outputfile = open(outname,'wb')
filedatalines = filedatastring.split('\n') # a list containing each line of the file
rawheaders = filedatalines[0] # 1st element of the list is the first row of the file, with the headers
parsedheaders = rawheaders.split('|') # turn the header string into a list where | was delimiter
print '\n'
print 'outname: ', outname, '\n'
# print 'rawheaders: ', rawheaders, '\n'
# print 'parsedheaders: ',parsedheaders, '\n'
# print filedatalines[0:2]
print '\n'
ratecodeindex = parsedheaders.index('LABEL')
ratecodemeaning = parsedheaders.index('SHORT_DESCRIPTION')
for dataline in filedatalines:
if dataline[:4] == 'LOGO':
firstuselessline = filedatalines.index(dataline)
# print firstuselessline
# ignore the first line which was the headers
# stop before the line that starts with LOGO - the first useless line
for dataline in filedatalines[1:firstuselessline-1:]:
# print dataline.split('|')
theratecode = dataline.split('|')[ratecodeindex]
theratemeaning = dataline.split('|')[ratecodemeaning]
# print theratecode, '\t', theratemeaning, '\n'
linetowrite = theratecode + ',' + theratemeaning + '\n'
outputfile.write(linetowrite)
outputfile.close()
# Process sourceCodes file
if key == 'sourceCodeFiles':
for fname_Value in groupedFileDict[key]: # fname_Value is the filename
if os.path.exists(fname_Value):
workingfile = open(fname_Value,'rb')
filedatastring = workingfile.read() # turns entire file contents to a single string
workingfile.close()
outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt of any other 3 char extension
outputfile = open(outname,'wb')
filedatalines = filedatastring.split('\n') # a list containing each line of the file
rawheaders = filedatalines[0] # 1st element of the list is the first row of the file, with the headers
parsedheaders = rawheaders.split('|') # turn the header string into a list where | was delimiter
print '\n'
print 'outname: ', outname, '\n'
# print 'rawheaders: ', rawheaders, '\n'
# print 'parsedheaders: ',parsedheaders, '\n'
# print filedatalines[0:2]
print '\n'
ratecodeindex = parsedheaders.index('SOURCE_CODE')
ratecodemeaning = parsedheaders.index('DESCRIPTION')
for dataline in filedatalines:
if dataline[:4] == 'LOGO':
firstuselessline = filedatalines.index(dataline)
# print firstuselessline
# ignore the first line which was the headers
# stop before the line that starts with LOGO - the first useless line
for dataline in filedatalines[1:firstuselessline-1:]:
# print dataline.split('|')
theratecode = dataline.split('|')[ratecodeindex]
theratemeaning = dataline.split('|')[ratecodemeaning]
# print theratecode, '\t', theratemeaning, '\n'
linetowrite = theratecode + ',' + theratemeaning + '\n'
outputfile.write(linetowrite)
outputfile.close()
processRawFiles(groupedFilestoProcess)

Had to redo my code because there was a new incident where the files in question neither had the header row, nor the footer row. However, since the columns I want still occur in the same order I can keep them only. Also, we stop reading if any next row has fewer columns than the larger of the two indices used.
As for reducing repetition, processRawFiles contains two def's that remove the need to repeat a lot of that parsing code from before.
def separateTranslationTypes(translationFileList):
'''Takes in list of all files to process and find which are roomtypes
, ratecodes or sourcecodes. The type of file determines how it will be processed.'''
rates = []
rooms = []
sources = []
for afile in translationFileList:
rates.append( [m.group() for m in re.finditer('cf_ratecode+(.*)', afile)] )
rooms.append( [m.group() for m in re.finditer('cf_roomtypes+(.*)', afile)] )
sources.append( [m.group() for m in re.finditer('cf_sourcecodes+(.*)', afile)] )
# empty list equates to False. So if x is True if the list is not empty - thus kept.
rates = [x[0] for x in rates if x]
rooms = [x[0] for x in rooms if x]
sources = [x[0] for x in sources if x]
print '... rateCode files :: ',rates,'\n'
print '... roomType files :: ',rooms,'\n'
print '... sourceCode files :: ',sources, '\n'
return {'rateCodeFiles':rates,
'roomTypeFiles':rooms,
'sourceCodeFiles':sources}
groupedFilestoProcess = separateTranslationTypes(allFilestoProcess)
def processRawFiles(groupedFileDict):
def someFixedProcess(bFileList, codeIndex, codeDescriptionIndex):
for fname_Value in bFileList: # fname_Value is the filename
if os.path.exists(fname_Value):
workingfile = open(fname_Value,'rb')
filedatastring = workingfile.read() # turns entire file contents to a single string
workingfile.close()
outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt of any other 3 char extension
outputfile = open(outname,'wb')
filedatalines = filedatastring.split('\n') # a list containing each line of the file
# print '\n','outname: ',outname,'\n\n'
# HEADERS ARE NOT IGNORED! Since the file might not have headers.
print outname
for dataline in filedatalines:
# print filedatalines.index(dataline), dataline.split('|')
# e.g. index 13, reuires len 14, so len > index is needed
if len(dataline.split('|')) > codeDescriptionIndex:
thecode_text = dataline.split('|')[codeIndex]
thedescription_text = dataline.split('|')[codeDescriptionIndex]
linetowrite = thecode_text + ',' + thedescription_text + '\n'
outputfile.write(linetowrite)
outputfile.close()
def processByType(aFileList, itsType):
typeDict = {'rateCodeFiles' : {'CODE_INDEX': 4,'DESC_INDEX':7},
'roomTypeFiles' : {'CODE_INDEX': 1,'DESC_INDEX':13},
'sourceCodeFiles': {'CODE_INDEX': 2,'DESC_INDEX':3}}
# print 'someFixedProcess(',aFileList,typeDict[itsType]['CODE_INDEX'],typeDict[itsType]['DESC_INDEX'],')'
someFixedProcess(aFileList,
typeDict[itsType]['CODE_INDEX'],
typeDict[itsType]['DESC_INDEX'])
for key in groupedFileDict:
processByType(groupedFileDict[key],key)
processRawFiles(groupedFilestoProcess)

Related

rstrip, split and sort a list from input text file

I am new with python. I am trying to rstrip space, split and append the list into words and than sort by alphabetical order. I don’t what I am doing wrong.
fname = input("Enter file name: ")
fh = open(fname)
lst = list(fh)
for line in lst:
line = line.rstrip()
y = line.split()
i = lst.append()
k = y.sort()
print y
I have been able to fix my code and the expected result output.
This is what I was hoping to code:
name = input('Enter file: ')
handle = open(name, 'r')
wordlist = list()
for line in handle:
words = line.split()
for word in words:
if word in wordlist: continue
wordlist.append(word)
wordlist.sort()
print(wordlist)
If you are using python 2.7, I believe you need to use raw_input() in Python 3.X is correct to use input(). Also, you are not using correctly append(), Append is a method used for lists.
fname = raw_input("Enter filename: ") # Stores the filename given by the user input
fh = open(fname,"r") # Here we are adding 'r' as the file is opened as read mode
lines = fh.readlines() # This will create a list of the lines from the file
# Sort the lines alphabetically
lines.sort()
# Rstrip each line of the lines liss
y = [l.rstrip() for l in lines]
# Print out the result
print y

python script returning list with double bracket

I have the following code in python 3. I'm trying to read a text file and output a list of numerical values. These values will then be used when searching through a number of pdf invoices.
Here is what I have for the text file portion:
txt_numbers = []
for file in os.listdir(my_path):
if file[-3:] == "txt":
with open(my_path + file, 'r') as txt_file:
txt = txt_file.readlines()
for line in txt:
# get number between quotes
num = re.findall(r'(?<=").*?(?=")', line)
txt_numbers.append(num)
for c, value in enumerate(txt_numbers, 1):
print(c, value)
Here is what is the output:
[[], ['51,500.00'], ['6,000.00'], ['77,000.00'], ['37,000.00']]
Question: How do I remove the "[" from within the list. I would like to have just ['51,500.00', '6,000.00', etc...]
I tried doing new_text_numbers = (", ".join(txt_numbers)) and then print(new_text_numbers)
Problem: I was appending a list with a list, which is allowed in python just not what I wanted.
Added lines:
new_num = (", ".join(num))
txt_numbers.append(new_num)
Solution:
txt_numbers = []
for file in os.listdir(my_path):
if file[-3:] == "txt":
with open(my_path + file, 'r') as txt_file:
txt = txt_file.readlines()
for line in txt:
# get number between quotes
num = re.findall(r'(?<=").*?(?=")', line)
new_num = (", ".join(num))
txt_numbers.append(new_num)
for c, value in enumerate(txt_numbers, 1):
print(c, value)

What is the best way for sum numbers at a big text file?

What is the best way for sum numbers at a big text file?
The text file will contain numbers separated by a comma (',').
The number can be from any type.
No line or row limits.
for example:
1 ,-2, -3.45-7.8j ,99.6,......
...
...
Input: path to the text file
Output: the sum of the numbers
I am tried to wrote one solution at myself and want to know for better solutions:
This is my try:
I am working with chunks of data and not read line by line, and because the end of the chunk can contain some of the number (just -2 and not -2+3j) i am looking just on the "safe piece" the last comma (',') and the other part save for the next chunk
import re
CHUNK_SIZE = 1017
def calculate_sum(file_path):
_sum = 0
with open(file_path, 'r') as _f:
chunk = _f.read(CHUNK_SIZE)
while chunk:
chunk = chunk.replace(' ', '')
safe_piece = chunk.rfind(',')
next_chunk = chunk[safe_piece:] if safe_piece != 0 else ''
if safe_piece != 0:
chunk = chunk[:safe_piece]
_sum += sum(map(complex, re.findall(r"[+-]\d*\.?\d*[+-]?\d*\.?\d*j|[+-]?\d+(?:\.\d+)?", chunk)))
chunk = next_chunk + _f.read(CHUNK_SIZE)
return _sum
Thanks!
This will add up any amount of numbers in a text file. Example:
input.csv
1,-2,-3.45-7.8j,99.6
-1,1-2j
1.5,2.5,1+1j
example.py
import csv
with open('input.txt','rb') as f:
r = csv.reader(f)
total = 0
for line in r:
total += sum(complex(col) for col in line)
print total
Output
(100.15-8.8j)
If you have really long lines and insufficient memory to read it in one go, then you could use a buffering class to chunk the reads and split numbers out of the buffer using a generator function:
import re
class Buffer:
def __init__(self,filename,chunksize=4096):
self.filename = filename
self.chunksize = chunksize
self.buf = ''
def __iter__(self):
with open(self.filename) as f:
while True:
if ',' in self.buf or '\n' in self.buf:
data,self.buf = re.split(r',|\n',self.buf,1) # split off the text up to the first separator
yield complex(data)
else:
chunk = f.read(self.chunksize)
if not chunk: # if no more data to read, return the remaining buffer and exit function
if self.buf:
yield complex(self.buf)
return
self.buf += chunk
total = 0
for num in Buffer('input.txt'):
total += num
print total
Output:
(100.15-8.8j)

Read file as list, edit and write back

Say I have a textfile containing the following:
1:Programming:Adam:0
2:Math:Max:0
3:Engineering:James:0
I am trying to read this textfile as a list, then have a user specify which 0 of a line they want to change to 1, then rewrite the changes made back into textfile.
So for example if a user specifies line 2, I want the 0 in line 2 to be changed to 1 and then save the changes made back onto the textfile.
So far I have the following and I just can't get it to over write it:
class Book_list:
def __init__(self,book_ID,book_title,book_author,availability):
self.book_ID = book_ID
self.book_title = book_title
self.book_author = book_author
self.availability = availability
def __str__(self):
return ('ID: ' + self.book_ID + '\nBook_Title: ' + self.book_title +
'\nBook_author: ' + self.book_author +
'\navailability: ' + self.availability + '\n')
def __getitem__(self,book_ID):
return self.book_ID
def __getitem__(self,availability):
return self.availability
x=str(raw_input('enter line number.'))
with open('database.txt','r') as f:
lines = f.readlines()
library = []
for line in lines:
line = line.strip()
data = line.split(':')
b = Book_list(data[0],data[1],data[2],str(data[3]))
library.append(b)
for i in range (0,len(library)):
if (library[i])[0]==x and (library[i])[3]==0:
(library[i])[3]== '1'
with open('database.txt', 'w') as f:
f.writelines( library )
you can read file and store it in a string. then using split make a list from file:
str='a:b:c'
lst=str.split(':') #lst=['a','b','c']
edit as you like and then join them with .join:
str2=':'.join(lst) #str2='a:b:c'

Python CSV export writing characters to new lines

I have been using multiple code snippets to create a solution that will allow me to write a list of players in a football team to a csv file.
import csv
data = []
string = input("Team Name: ")
fName = string.replace(' ', '') + ".csv"
print("When you have entered all the players, press enter.")
# while loop that will continue allowing entering of players
done = False
while not done:
a = input("Name of player: ")
if a == "":
done = True
else:
string += a + ','
string += input("Age: ") + ','
string += input("Position: ")
print (string)
file = open(fName, 'w')
output = csv.writer(file)
for row in string:
tempRow = row
output.writerow(tempRow)
file.close()
print("Team written to file.")
I would like the exported csv file to look like this:
player1,25,striker
player2,27,midfielder
and so on. However, when I check the exported csv file it looks more like this:
p
l
a
y
e
r
,
2
5
and so on.
Does anyone have an idea of where i'm going wrong?
Many thanks
Karl
Your string is a single string. It is not a list of strings. You are expecting it to be a list of strings when you are doing this:
for row in string:
When you iterate over a string, you are iterating over its characters. Which is why you are seeing a character per line.
Declare a list of strings. And append every string to it like this:
done = False
strings_list = []
while not done:
string = ""
a = input("Name of player: ")
if a == "":
done = True
else:
string += a + ','
string += input("Age: ") + ','
string += input("Position: ") + '\n'
strings_list.append(string)
Now iterate over this strings_list and print to the output file. Since you are putting the delimiter (comma) yourself in the string, you do not need a csv writer.
a_file = open(fName, 'w')
for row in strings_list:
print(row)
a_file.write(row)
a_file.close()
Note:
string is a name of a standard module in Python. It is wise not to use this as a name of any variable in your program. Same goes for your variable file