Read file as list, edit and write back

Say I have a text file containing the following:
1:Programming:Adam:0
2:Math:Max:0
3:Engineering:James:0
I am trying to read this text file as a list, have the user specify the line whose trailing 0 they want to change to 1, and then write the changes back to the text file.
For example, if the user specifies line 2, I want the 0 in line 2 to be changed to 1 and the change saved back to the text file.
So far I have the following, and I just can't get it to overwrite the file:
class Book_list:
    def __init__(self,book_ID,book_title,book_author,availability):
        self.book_ID = book_ID
        self.book_title = book_title
        self.book_author = book_author
        self.availability = availability
    def __str__(self):
        return ('ID: ' + self.book_ID + '\nBook_Title: ' + self.book_title +
                '\nBook_author: ' + self.book_author +
                '\navailability: ' + self.availability + '\n')
    def __getitem__(self,book_ID):
        return self.book_ID
    def __getitem__(self,availability):
        return self.availability
x=str(raw_input('enter line number.'))
with open('database.txt','r') as f:
    lines = f.readlines()
library = []
for line in lines:
    line = line.strip()
    data = line.split(':')
    b = Book_list(data[0],data[1],data[2],str(data[3]))
    library.append(b)
for i in range (0,len(library)):
    if (library[i])[0]==x and (library[i])[3]==0:
        (library[i])[3]== '1'
with open('database.txt', 'w') as f:
    f.writelines( library )

You can read the file into a string, then use split to turn each line into a list:
line = 'a:b:c'
fields = line.split(':')  # fields == ['a', 'b', 'c']
Edit the list as you like, then join the fields back together with .join:
line2 = ':'.join(fields)  # line2 == 'a:b:c'
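
Applied to the library file from the question, the split/edit/join idea could be sketched as below. The function name `mark_unavailable` and its parameters are hypothetical, and the sketch assumes every non-blank line holds colon-separated fields with the availability flag last. It also sidesteps three problems in the question's attempt: `==` is used where an assignment is needed, the flag is compared against the integer 0 although the fields are strings, and `writelines` is given objects instead of strings.

```python
def mark_unavailable(path, book_id):
    """Set the last field to '1' on the line whose first field equals book_id.

    Returns True if a line was changed, False otherwise.
    """
    with open(path, 'r') as f:
        rows = [line.rstrip('\n').split(':') for line in f if line.strip()]

    changed = False
    for row in rows:
        # Every field is a string, so compare against '0', not the integer 0.
        if row[0] == book_id and row[-1] == '0':
            row[-1] = '1'
            changed = True

    # Reopening in 'w' mode truncates the file; write the joined lines back.
    with open(path, 'w') as f:
        f.writelines(':'.join(row) + '\n' for row in rows)
    return changed
```

With the sample data, `mark_unavailable('database.txt', '2')` would turn `2:Math:Max:0` into `2:Math:Max:1` and leave the other lines unchanged.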

Related

rstrip, split and sort a list from input text file

I am new to Python. I am trying to rstrip whitespace, split each line into words, append the words to a list, and then sort the list alphabetically. I don't know what I am doing wrong.
fname = input("Enter file name: ")
fh = open(fname)
lst = list(fh)
for line in lst:
    line = line.rstrip()
    y = line.split()
    i = lst.append()
    k = y.sort()
print y

I was able to fix my code and get the expected output.
This is what I was hoping to code:
name = input('Enter file: ')
handle = open(name, 'r')
wordlist = list()
for line in handle:
    words = line.split()
    for word in words:
        if word in wordlist: continue
        wordlist.append(word)
wordlist.sort()
print(wordlist)

If you are using Python 2.7, I believe you need to use raw_input(); in Python 3.x, input() is correct. Also, you are not using append() correctly: it is a list method, and it takes the item to add as its argument.
fname = raw_input("Enter filename: ") # Stores the filename given by the user input
fh = open(fname,"r") # Here we are adding 'r' as the file is opened as read mode
lines = fh.readlines() # This will create a list of the lines from the file
# Sort the lines alphabetically
lines.sort()
# Rstrip each line of the lines list
y = [l.rstrip() for l in lines]
# Print out the result
print y
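
For comparison, here is a compact sketch of what the question appears to be after: split every line into words, drop duplicates, and sort. It is written for Python 3, and the function name `sorted_unique_words` is hypothetical.

```python
def sorted_unique_words(path):
    """Return the distinct whitespace-separated words in the file, sorted."""
    words = set()
    with open(path) as handle:
        for line in handle:
            # split() with no arguments ignores leading and trailing
            # whitespace, so a separate rstrip() is not needed.
            words.update(line.split())
    return sorted(words)
```

Using a set avoids the `if word in wordlist` scan for every word, which matters only for large files.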

Reading mailing addresses of varying length from a text file using regular expressions

I am trying to read a text file and collect addresses from it. Here's an example of one of the entries in the text file:
Electrical Vendor Contact: John Smith Phone #: 123-456-7890
Address: 1234 ADDRESS ROAD Ship To:
Suite 123 ,
Nowhere, CA United States 12345
Phone: 234-567-8901 E-Mail: john.smith@gmail.com
Fax: 345-678-9012 Web Address: www.electricalvendor.com
Acct. No: 123456 Monthly Due Date: Days Until Due
Tax ID: Fed 1099 Exempt Discount On Assets Only
G/L Liab. Override:
G/L Default Exp:
Comments:
APPROVED FOR ELECTRICAL THINGS
I cannot wrap my head around how to search for and store the address for each of these entries when the number of lines in the address varies. Currently, I have a generator that reads each line of the file. The get_addrs() method then looks for markers such as the Address: and Ship keywords to signal when an address needs to be stored, and I use a regular expression to search for zip codes in the line following a line with the Address: keyword. I think I've figured out how to successfully save the second line for all addresses using that method. However, in a few addresses there is a suite number or other piece of information that makes the address three lines instead of two. I'm not sure how to account for this. I tried expanding my save_previous() method to three lines, but I can't get it quite right. Here's the code with which I was able to successfully save all of the two-line addresses:
import re
class GetAddress():
    def __init__(self):
        self.line1 = []
        self.line2 = []
        self.s_line1 = []
        self.addr_index = 0
        self.ship_index = 0
        self.no_ship = False
        self.addr_here = False
        self.prev_line = []
        self.us_zip = ''
    # Check if there is a shipping address.
    def set_no_ship(self, line):
        try:
            self.no_ship = line.index(',') == len(line) - 1
        except ValueError:
            pass
    # Save two lines at a time to see whether or not the previous
    # line contains 'Address:' and 'Ship'.
    def save_previous(self, line):
        self.prev_line += [line]
        if len(self.prev_line) > 2:
            del self.prev_line[0]
    def get_addrs(self, line):
        self.addr_here = 'Address:' in line and 'Ship' in line
        self.po_box = False
        self.no_ship = False
        self.addr_index = 0
        self.ship_index = 0
        self.zip1_index = 0
        self.set_no_ship(line)
        self.save_previous(line)
        # Check if 'Address:' and 'Ship' are in the previous line.
        self.prev_addr = (
            'Address:' in self.prev_line[0]
            and 'Ship' in self.prev_line[0])
        if self.addr_here:
            self.po_box = 'Box' in line or 'BOX' in line
            self.addr_index = line.index('Address:') + 1
            self.ship_index = line.index('Ship')
            # Get the contents of the line between 'Address:' and
            # 'Ship' if both words are present in this line.
            if self.addr_index is not self.ship_index:
                self.line1 += [' '.join(line[self.addr_index:self.ship_index])]
            elif self.addr_index is self.ship_index:
                self.line1 += ['']
        if len(self.prev_line) > 1 and self.prev_addr:
            self.po_box = 'Box' in line or 'BOX' in line
            self.us_zip = re.search(r'(\d{5}(\-\d{4})?)', ' '.join(line))
            if self.us_zip and not self.po_box:
                self.zip1_index = line.index(self.us_zip.group(1))
            if self.no_ship:
                self.line2 += [' '.join(line[:line.index(',')])]
            elif self.zip1_index and not self.no_ship:
                self.line2 += [' '.join(line[:self.zip1_index + 1])]
            elif len(self.line1) > 0 and not self.line1[-1]:
                self.line2 += ['']
# Create a generator to read each line of the file.
def read_gen(infile):
    with open(infile, 'r') as file:
        for line in file:
            yield line.split()
infile = 'Vendor List.txt'
info = GetAddress()
for i, line in enumerate(read_gen(infile)):
    info.get_addrs(line)
I am still a beginner in Python so I'm sure a lot of my code may be redundant or unnecessary. I'd love some feedback as to how I might make this simpler and shorter while capturing both two and three line addresses.

I also posted this question to Reddit, and u/Binary101010 pointed out that the text file is fixed width, so it may be possible to slice each line in a way that selects only the necessary address information. Using this idea, I added some functionality to the generator function and was able to produce the desired result with the following code:
infile = 'Vendor List.txt'
# Create a generator with differing modes to read the specified lines of the file.
def read_gen(infile, mode=0, start=0, end=0, rows=[]):
    lines = list()
    with open(infile, 'r') as file:
        for i, line in enumerate(file):
            # Set end to correct value if no argument is given.
            if end == 0:
                end = len(line)
            # Mode 0 gives all lines of the file
            if mode == 0:
                yield line[start:end]
            # Mode 1 gives specific lines from the file using the rows keyword
            # argument. Make sure rows is formatted as [start_row, end_row].
            # rows list should only ever be length 2.
            elif mode == 1:
                if rows:
                    # Create a list for indices between specified rows.
                    for element in range(rows[0], rows[1]):
                        lines += [element]
                # Return the current line if the index falls between the
                # specified rows.
                if i in lines:
                    yield line[start:end]
class GetAddress:
    def __init__(self):
        # Allow access to infile for use in set_addresses().
        global infile
        self.address_indices = list()
        self.phone_indices = list()
        self.addresses = list()
        self.count = 0
    def get(self, i, line):
        # Search for appropriate substrings and set indices accordingly.
        if 'Address:' in line[18:26]:
            self.address_indices += [i]
        if 'Phone:' in line[18:24]:
            self.phone_indices += [i]
        # Add address to list if both necessary indices have been collected.
        if i in self.phone_indices:
            self.set_addresses()
    def set_addresses(self):
        self.address = list()
        start = self.address_indices[self.count]
        end = self.phone_indices[self.count]
        # Create a generator that only yields substrings for rows between given
        # indices.
        self.generator = read_gen(
            infile,
            mode=1,
            start=40,
            end=91,
            rows=[start, end])
        # Collect each line of the address from the generator and remove
        # unnecessary spaces.
        for element in range(start, end):
            self.address += [next(self.generator).strip()]
        # This document has a header on each page and a portion of that is
        # collected in the address substring. Search for the header substring
        # and remove the corresponding elements from self.address.
        if len(self.address) > 3 and not self.address[-1]:
            self.address = self.address[:self.address.index('header text')]
        self.addresses += [self.address]
        self.count += 1
info = GetAddress()
for i, line in enumerate(read_gen(infile)):
    info.get(i, line)
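
The same fixed-width idea can also be sketched as a single pass that never re-reads the file. This is a simplified variant, not the code above: the function name `collect_addresses` is hypothetical, the column offsets (18:26 for the label, 40:91 for the address text) are taken from the code above and assumed to hold for every row, and page-header removal is left out.

```python
def collect_addresses(lines, label=slice(18, 26), body=slice(40, 91)):
    """Group the text in the `body` columns, starting at each 'Address:' row
    and stopping before the next 'Phone:' row."""
    addresses = []
    current = None  # None means we are not inside an address block
    for line in lines:
        tag = line[label]
        if 'Address:' in tag:
            current = []
        elif 'Phone:' in tag and current is not None:
            addresses.append(current)
            current = None
        if current is not None:
            text = line[body].strip()
            if text:  # ignore rows whose address columns are blank
                current.append(text)
    return addresses
```

Because a block simply runs until the next `Phone:` row, two-line and three-line addresses are handled the same way. It could be called as `collect_addresses(open('Vendor List.txt'))`.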

Text file value replace in python

I am trying to replace text values as shown below. I have two text files:
1 - input.txt
abc = 123
xyz = 456
pqr = 789
2 - content.txt
AAA = abc
XXX = xyz
PPP = pqr
Now I need to read input.txt and replace the values in content.txt with the corresponding values from input.txt to get the output file below.
3 - new.txt
AAA = 123
XXX = 456
PPP = 789
How can I do this?

First read the contents of both files into two lists:
file1handle = open('filename1', 'r')
file1 = file1handle.readlines()
file1handle.close()
file2handle = open('filename2', 'r')
file2 = file2handle.readlines()
file2handle.close()
Then iterate over the contents, match each variable name against the assignments, and collect the resulting lines in a third list:
file3 = []
for item in file1:
    # strip() removes the trailing newline so the comparison below can succeed
    name, value = item.strip().split(' = ')
    for item2 in file2:
        name2, assignment = item2.strip().split(' = ')
        # Check which name is to be assigned which value
        if assignment == name:
            val = name2 + ' = ' + value + '\n'
            file3.append(val)
Then write the contents to the output file:
filehandle3 = open('filename3', 'w')
for line in file3:
    filehandle3.write(line)
filehandle3.close()

This may help you:
_input = {}
with open('input.txt', 'r') as f:
    s = f.read()
    # splitlines() avoids an empty last item when the file ends with a newline
    _input = dict((a.split(' = ')[0], int(a.split(' = ')[1])) for a in s.splitlines())
_content = {}
with open('content.txt', 'r') as f:
    s = f.read()
    _content = dict((a.split(' = ')[0], a.split(' = ')[1]) for a in s.splitlines())
for key in _content:
    _content[key] = _input[_content[key]]
Result:
In [18]: _content
Out[19]: {'AAA': 123, 'PPP': 789, 'XXX': 456}
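
The dictionary lookup above stops short of producing new.txt. One way to finish the job is sketched below. The helper names `read_pairs` and `substitute` are hypothetical, and the sketch assumes each useful line has the form `left = right`.

```python
def read_pairs(path):
    """Parse 'left = right' lines into a list of (left, right) tuples."""
    pairs = []
    with open(path) as f:
        for line in f:
            if '=' not in line:
                continue  # skip blank or malformed lines
            left, right = line.split('=', 1)
            pairs.append((left.strip(), right.strip()))
    return pairs


def substitute(input_path, content_path, output_path):
    """Write content_path to output_path with names replaced by their values."""
    values = dict(read_pairs(input_path))
    with open(output_path, 'w') as out:
        for name, ref in read_pairs(content_path):
            # Keep the original text when no value is defined for it.
            out.write('{} = {}\n'.format(name, values.get(ref, ref)))
```

Calling `substitute('input.txt', 'content.txt', 'new.txt')` keeps the line order of content.txt, which a plain dict in older Python versions would not guarantee.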

How about using pandas: It's shorter, easier to read and faster when using large files.
import pandas as pd
import numpy as np
input=pd.read_csv("input.txt",sep="=",header=None,usecols=[1])
content=pd.read_csv("content.txt",sep="=",header=None,usecols=[0])
foo=np.hstack(([content.values,input.values]))
new=pd.DataFrame(foo)
new.to_csv("new.txt",index=False,sep="=",header=None)

import re
class Defs:
    def __init__(self, defs_file):
        self._defs = {}
        with open(defs_file) as df:
            line_num = 0
            for l in df:
                line_num += 1
                m = re.match(r'\s*(\w+)\s*=\s*(\S+)\s*', l)
                assert m, \
                    "invalid assignment syntax with \"{}\" at line {}".format(
                        l.rstrip(), line_num)
                self._defs[m.group(1)] = m.group(2)
    def __getitem__(self, var):
        return self._defs[var]
    @property
    def dict(self):
        return self._defs
class Replacer:
    def __init__(self, defs):
        self._defs = defs
    def replace_with_defs(self, context_file, output_file):
        with open(context_file) as context, open(output_file, 'w') as output:
            for line in context:
                string_repl = re.sub(r'\b(\w+)\b',
                    lambda m: self._defs.dict.get(m.group(1)) or m.group(1), line)
                output.write(string_repl)
def main():
    defs = Defs('input.txt')
    repl = Replacer(defs)
    repl.replace_with_defs('context.txt', 'output.txt')
if __name__ == '__main__':
    main()
To describe what's going on above: the Defs class takes a defs_file (the input.txt assignments) and stores them in a dict that binds each variable name to its associated value. The Replacer class takes a Defs object and uses it to iterate over each line in the context_file (i.e. context.txt), replacing any token that is a known variable name with the value specified in the Defs object, and writes the result to output_file (i.e. output.txt). If a token doesn't exist in the Defs object as a variable name, it is written out as is.

Simple way to refactor this Python code to reduce repetition

I'd like help refactoring this code to reduce redundant lines/concepts. The code for this def is basically repeated 3 times.
Restrictions:
- I'm new, so a really fancy list comprehension or turning things into objects with dunders and method overrides is way too advanced for me.
- Built-in modules only. This is Python 2.7 code, and it only imports os and re.
What the overall script does:
Finds files with a fixed prefix. The files are pipe-delimited text files. The first row is a header. It has a footer which can be 1 or more rows. Based on the prefix, the script throws away "columns" from the text file that aren't needed in another step. It saves the data, comma-separated, in a new file with a .csv extension.
The bulk of the work is done in processRawFiles(). This is what I'd like refactored, since it's wildly repetitive.
def separateTranslationTypes(translationFileList):
    '''Takes in list of all files to process and find which are roomtypes
    , ratecodes or sourcecodes. The type of file determines how it will be processed.'''
    rates = []
    rooms = []
    sources = []
    for afile in translationFileList:
        rates.append( [m.group() for m in re.finditer('cf_ratecodeheader+(.*)', afile)] )
        rooms.append( [m.group() for m in re.finditer('cf_roomtypes+(.*)', afile)] )
        sources.append( [m.group() for m in re.finditer('cf_sourcecodes+(.*)', afile)] )
    # empty list equates to False. So if x is True if the list is not empty - thus kept.
    rates = [x[0] for x in rates if x]
    rooms = [x[0] for x in rooms if x]
    sources = [x[0] for x in sources if x]
    print '... rateCode files :: ',rates,'\n'
    print '... roomType files :: ',rooms,'\n'
    print '... sourceCode files :: ',sources, '\n'
    return {'rateCodeFiles':rates,
            'roomTypeFiles':rooms,
            'sourceCodeFiles':sources}
groupedFilestoProcess = separateTranslationTypes(allFilestoProcess)
def processRawFiles(groupedFileDict):
    for key in groupedFileDict:
        # Process the rateCodes file
        if key == 'rateCodeFiles':
            for fname_Value in groupedFileDict[key]: # fname_Value is the filename
                if os.path.exists(fname_Value):
                    workingfile = open(fname_Value,'rb')
                    filedatastring = workingfile.read() # turns entire file contents to a single string
                    workingfile.close()
                    outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt of any other 3 char extension
                    outputfile = open(outname,'wb')
                    filedatalines = filedatastring.split('\n') # a list containing each line of the file
                    rawheaders = filedatalines[0] # 1st element of the list is the first row of the file, with the headers
                    parsedheaders = rawheaders.split('|') # turn the header string into a list where | was delimiter
                    print '\n'
                    print 'outname: ', outname, '\n'
                    # print 'rawheaders: ', rawheaders, '\n'
                    # print 'parsedheaders: ',parsedheaders, '\n'
                    # print filedatalines[0:2]
                    print '\n'
                    ratecodeindex = parsedheaders.index('RATE_CODE')
                    ratecodemeaning = parsedheaders.index('DESCRIPTION')
                    for dataline in filedatalines:
                        if dataline[:4] == 'LOGO':
                            firstuselessline = filedatalines.index(dataline)
                            # print firstuselessline
                    # ignore the first line which was the headers
                    # stop before the line that starts with LOGO - the first useless line
                    for dataline in filedatalines[1:firstuselessline-1:]:
                        # print dataline.split('|')
                        theratecode = dataline.split('|')[ratecodeindex]
                        theratemeaning = dataline.split('|')[ratecodemeaning]
                        # print theratecode, '\t', theratemeaning, '\n'
                        linetowrite = theratecode + ',' + theratemeaning + '\n'
                        outputfile.write(linetowrite)
                    outputfile.close()
        # Process the roomTypes file
        if key == 'roomTypeFiles':
            for fname_Value in groupedFileDict[key]: # fname_Value is the filename
                if os.path.exists(fname_Value):
                    workingfile = open(fname_Value,'rb')
                    filedatastring = workingfile.read() # turns entire file contents to a single string
                    workingfile.close()
                    outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt of any other 3 char extension
                    outputfile = open(outname,'wb')
                    filedatalines = filedatastring.split('\n') # a list containing each line of the file
                    rawheaders = filedatalines[0] # 1st element of the list is the first row of the file, with the headers
                    parsedheaders = rawheaders.split('|') # turn the header string into a list where | was delimiter
                    print '\n'
                    print 'outname: ', outname, '\n'
                    # print 'rawheaders: ', rawheaders, '\n'
                    # print 'parsedheaders: ',parsedheaders, '\n'
                    # print filedatalines[0:2]
                    print '\n'
                    ratecodeindex = parsedheaders.index('LABEL')
                    ratecodemeaning = parsedheaders.index('SHORT_DESCRIPTION')
                    for dataline in filedatalines:
                        if dataline[:4] == 'LOGO':
                            firstuselessline = filedatalines.index(dataline)
                            # print firstuselessline
                    # ignore the first line which was the headers
                    # stop before the line that starts with LOGO - the first useless line
                    for dataline in filedatalines[1:firstuselessline-1:]:
                        # print dataline.split('|')
                        theratecode = dataline.split('|')[ratecodeindex]
                        theratemeaning = dataline.split('|')[ratecodemeaning]
                        # print theratecode, '\t', theratemeaning, '\n'
                        linetowrite = theratecode + ',' + theratemeaning + '\n'
                        outputfile.write(linetowrite)
                    outputfile.close()
        # Process sourceCodes file
        if key == 'sourceCodeFiles':
            for fname_Value in groupedFileDict[key]: # fname_Value is the filename
                if os.path.exists(fname_Value):
                    workingfile = open(fname_Value,'rb')
                    filedatastring = workingfile.read() # turns entire file contents to a single string
                    workingfile.close()
                    outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt of any other 3 char extension
                    outputfile = open(outname,'wb')
                    filedatalines = filedatastring.split('\n') # a list containing each line of the file
                    rawheaders = filedatalines[0] # 1st element of the list is the first row of the file, with the headers
                    parsedheaders = rawheaders.split('|') # turn the header string into a list where | was delimiter
                    print '\n'
                    print 'outname: ', outname, '\n'
                    # print 'rawheaders: ', rawheaders, '\n'
                    # print 'parsedheaders: ',parsedheaders, '\n'
                    # print filedatalines[0:2]
                    print '\n'
                    ratecodeindex = parsedheaders.index('SOURCE_CODE')
                    ratecodemeaning = parsedheaders.index('DESCRIPTION')
                    for dataline in filedatalines:
                        if dataline[:4] == 'LOGO':
                            firstuselessline = filedatalines.index(dataline)
                            # print firstuselessline
                    # ignore the first line which was the headers
                    # stop before the line that starts with LOGO - the first useless line
                    for dataline in filedatalines[1:firstuselessline-1:]:
                        # print dataline.split('|')
                        theratecode = dataline.split('|')[ratecodeindex]
                        theratemeaning = dataline.split('|')[ratecodemeaning]
                        # print theratecode, '\t', theratemeaning, '\n'
                        linetowrite = theratecode + ',' + theratemeaning + '\n'
                        outputfile.write(linetowrite)
                    outputfile.close()
processRawFiles(groupedFilestoProcess)

I had to redo my code because of a new case where the files in question had neither the header row nor the footer row. However, since the columns I want still occur in the same order, I can select them by position. Also, any row with too few columns to contain the larger of the two indices used is skipped.
As for reducing repetition, processRawFiles now contains two nested defs that remove the need to repeat the parsing code from before.
def separateTranslationTypes(translationFileList):
    '''Takes in list of all files to process and find which are roomtypes
    , ratecodes or sourcecodes. The type of file determines how it will be processed.'''
    rates = []
    rooms = []
    sources = []
    for afile in translationFileList:
        rates.append( [m.group() for m in re.finditer('cf_ratecode+(.*)', afile)] )
        rooms.append( [m.group() for m in re.finditer('cf_roomtypes+(.*)', afile)] )
        sources.append( [m.group() for m in re.finditer('cf_sourcecodes+(.*)', afile)] )
    # empty list equates to False. So if x is True if the list is not empty - thus kept.
    rates = [x[0] for x in rates if x]
    rooms = [x[0] for x in rooms if x]
    sources = [x[0] for x in sources if x]
    print '... rateCode files :: ',rates,'\n'
    print '... roomType files :: ',rooms,'\n'
    print '... sourceCode files :: ',sources, '\n'
    return {'rateCodeFiles':rates,
            'roomTypeFiles':rooms,
            'sourceCodeFiles':sources}
groupedFilestoProcess = separateTranslationTypes(allFilestoProcess)
def processRawFiles(groupedFileDict):
    def someFixedProcess(bFileList, codeIndex, codeDescriptionIndex):
        for fname_Value in bFileList: # fname_Value is the filename
            if os.path.exists(fname_Value):
                workingfile = open(fname_Value,'rb')
                filedatastring = workingfile.read() # turns entire file contents to a single string
                workingfile.close()
                outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt of any other 3 char extension
                outputfile = open(outname,'wb')
                filedatalines = filedatastring.split('\n') # a list containing each line of the file
                # print '\n','outname: ',outname,'\n\n'
                # HEADERS ARE NOT IGNORED! Since the file might not have headers.
                print outname
                for dataline in filedatalines:
                    # print filedatalines.index(dataline), dataline.split('|')
                    # e.g. index 13, requires len 14, so len > index is needed
                    if len(dataline.split('|')) > codeDescriptionIndex:
                        thecode_text = dataline.split('|')[codeIndex]
                        thedescription_text = dataline.split('|')[codeDescriptionIndex]
                        linetowrite = thecode_text + ',' + thedescription_text + '\n'
                        outputfile.write(linetowrite)
                outputfile.close()
    def processByType(aFileList, itsType):
        typeDict = {'rateCodeFiles' : {'CODE_INDEX': 4,'DESC_INDEX':7},
                    'roomTypeFiles' : {'CODE_INDEX': 1,'DESC_INDEX':13},
                    'sourceCodeFiles': {'CODE_INDEX': 2,'DESC_INDEX':3}}
        # print 'someFixedProcess(',aFileList,typeDict[itsType]['CODE_INDEX'],typeDict[itsType]['DESC_INDEX'],')'
        someFixedProcess(aFileList,
                         typeDict[itsType]['CODE_INDEX'],
                         typeDict[itsType]['DESC_INDEX'])
    for key in groupedFileDict:
        processByType(groupedFileDict[key],key)
processRawFiles(groupedFilestoProcess)
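
The column extraction at the heart of this refactoring can also be sketched with the built-in csv module, which takes care of quoting when a description itself contains a comma. This is a Python 3 sketch rather than the Python 2.7 code above, and the function name `extract_columns` is hypothetical. Like the code above, it keeps any header row and skips rows that are too short.

```python
import csv


def extract_columns(in_path, out_path, code_index, desc_index):
    """Copy two columns of a pipe-delimited file into a comma-separated file."""
    needed = max(code_index, desc_index)
    with open(in_path, newline='') as src, \
            open(out_path, 'w', newline='') as dst:
        reader = csv.reader(src, delimiter='|')
        writer = csv.writer(dst, lineterminator='\n')
        for row in reader:
            # Footer rows and other short rows lack the wanted columns.
            if len(row) > needed:
                writer.writerow([row[code_index], row[desc_index]])
```

With the index table from processByType, a rate-code file would be handled by `extract_columns(fname, outname, 4, 7)`.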

How do I access text files from my computer?

My current code is as follows, but I cannot figure out how to access a text file (e.g. "john.txt"):
def read_script():
    while True:
        try:
            filename = input('Please Enter Text Name: ')
            F = open (filename, 'r')
            script - F.read()
            F.close()
            slist = script.split()
            return slist
        except OSError:
            print ('Oops! That file does not exist! Try spelling it correctly: ')
def pig_english():
    letterlist = [i + i[0] for i in read_script()]
    ayList = [i + 'ay' for i in letterlist]
    delaylist = [i[1:] for i in aylist]
    print (delaylist)
read_script()
pig_english()

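If you are on Python 2, you want raw_input() and not input(). In Python 2, input() evaluates what the user types as a Python expression instead of returning it as a string. In Python 3, input() already returns a string and raw_input() no longer exists.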
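
For Python 3, the prompt-and-retry loop might be sketched as follows. The function name `read_words` and its `prompt` parameter are hypothetical. A relative filename such as john.txt is looked up in the current working directory, so the file either has to be there or has to be given with its full path.

```python
def read_words(prompt='Please Enter Text Name: '):
    """Keep asking for a filename until one can be opened; return its words."""
    while True:
        filename = input(prompt)  # Python 3: input() returns a str
        try:
            # The with block closes the file even if reading fails.
            with open(filename, 'r') as f:
                return f.read().split()
        except OSError:
            print('Oops! That file does not exist! Try spelling it correctly.')
```

Keeping only the open/read inside the try block means that unrelated mistakes, such as a misspelled variable name, are not hidden by the except clause.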
