I have a file with a list of terms. I want to read it, modify each term, and write the new terms to a new file. The new terms should look like this: take the first two characters of the original term, put them in quotes, add a '=>', then the original term in quotes and a comma.
This is the code I'm using:
def newFile(newItem):
    original = line
    first = line[0:2]
    newItem = first+'=>'+original+','
    return newItem

input = open('/Users/george/Desktop/input.txt', 'r')
output = open('/Users/george/Desktop/output.txt', 'w')
collector = ''
for line in input:
    if len(line) != 0:
        collector = newFile(input)
        output.write(''.join(collector))
    if len(line) == 0:
        input.close()
        output.close()
For example:
If the terms in the input.txt file are these:
term 1
term 2
term 3
term 4
The output is this:
te=>term 1
,te=>term 2
,te=>term 3
,te=>term 4
,
How can I add '' around the first two letters and around the term? And why do the second, third and fourth terms have ,te instead of te like they should?
Instead of using collector and newFile(), you can build the output line in a new variable inside your loop:
modified_line = "'%s'=>'%s'," % (line[:2], line.strip())
and then write it out in the same loop:
...
if len(line) > 2:
    output.write('%s\n' % (modified_line))
Also:
If possible, do not hard-code file names in your program; use sys.argv, standard input/output or a config file. Of course, if you are sure of the input/output names, then use them.
In line[0:2] you can omit the 0 and use line[:2].
You should use try: (open the file, read it, etc.) and finally: (close the file).
You don't need to check if len(line) == 0; the for loop already handles that. Empty lines still arrive as a line containing only the line ending (CRLF), and the end of the input file is reached when the for loop ends.
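Putting the suggestions above together, a minimal sketch might look like the following. It is written for Python 3, and the function name convert_terms and its two path parameters are assumptions for illustration, not part of the original code. It uses a with block instead of try/finally to close the files.

```python
def convert_terms(in_path, out_path):
    """Write lines such as 'te'=>'term 1', for every non-empty input line."""
    # 'with' closes both files even if an error occurs part-way through.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            term = line.strip()  # drop the trailing newline and surrounding spaces
            if not term:
                continue  # skip blank lines
            dst.write("'%s'=>'%s',\n" % (term[:2], term))
```

To avoid hard-coded paths, it could be called as convert_terms(sys.argv[1], sys.argv[2]) after importing sys.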
Related
The file looks like a series of lines with IDs:
aaaa
aass
asdd
adfg
aaaa
I'd like to get, in a new file, each ID and its number of occurrences in the old file, in the form:
aaaa 2
asdd 1
aass 1
adfg 1
With the two elements separated by a tab.
The code I have prints what I want but doesn't write it to a new file:
with open("Only1ID.txt", "r") as file:
    file = [item.lower().replace("\n", "") for item in file.readlines()]
for item in sorted(set(file)):
    print item.title(), file.count(item)
As you use Python 2, the simplest approach to convert your console output to file output is by using the print chevron (>>) syntax which redirects the output to any file-like object:
with open("filename", "w") as f:  # open a file in write mode
    print >> f, "some data"  # print 'into the file'
Your code could look like this after simply adding another open to open the output file and adding the chevron to your print statement:
with open("Only1ID.txt", "r") as file, open("output.txt", "w") as out_file:
    file = [item.lower().replace("\n", "") for item in file.readlines()]
    for item in sorted(set(file)):
        print >> out_file, item.title(), file.count(item)
However, your code has a few other things that should be avoided or could be improved:
Do not use the same variable name file for both the file object returned by open and your processed list of strings. This is confusing, just use two different names.
You can directly iterate over the file object, which works like a generator that returns the file's lines as strings. Generators process requests for the next element just in time, that means it does not first load the whole file into your memory like file.readlines() and processes them afterwards, but only reads and stores one line at a time, whenever the next line is needed. That way you improve the code's performance and resource efficiency.
If you write a list comprehension but don't necessarily need its result as a list, because you simply want to iterate over it using a for loop, it's more efficient to use a generator expression (same effect as the file object's line generator described above). The only syntactical difference between a list comprehension and a generator expression is the brackets. Replace [...] with (...) and you have a generator. The only downside of a generator is that you can neither find out its length nor access items directly using an index. As you don't need any of these features, the generator is fine here.
There is a simpler way to remove trailing newline characters from a line: line.rstrip() removes all trailing whitespace. If you want to keep e.g. spaces, but only want the newline to be removed, pass that character as argument: line.rstrip("\n").
However, it could possibly be even easier and faster to just not add another implicit line break during the print call instead of removing it first to have it re-added later. You would suppress the line break of print in Python 2 by simply adding a comma at the end of the statement. Note that the line's own break then sits right after the item, before the count, so this only fits when the line is the last thing printed:
print >> out_file, item.title(), file.count(item),
There is a type Counter to count occurrences of elements in a collection, which is faster and easier than writing it yourself, because you don't need the additional count() call for every element. The Counter behaves mostly like a dictionary with your items as keys and their count as values. Simply import it from the collections module and use it like this:
from collections import Counter
c = Counter(lines)
for item in c:
    print item, c[item]
With all those suggestions (except the one not to remove the line breaks) applied and the variables renamed to something more clear, the optimized code looks like this:
from collections import Counter
with open("Only1ID.txt") as in_file, open("output.txt", "w") as out_file:
    counter = Counter(line.lower().rstrip("\n") for line in in_file)
    for item in sorted(counter):
        print >> out_file, item.title(), counter[item]
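For readers on Python 3, where the print chevron no longer exists, a roughly equivalent sketch is shown below. The function name write_counts and its parameters are assumptions for illustration. It also uses sep="\t" so the two elements are separated by a tab, as asked in the question.

```python
from collections import Counter


def write_counts(in_path, out_path):
    """Count each ID (case-insensitively) and write 'ID<TAB>count' lines."""
    with open(in_path) as in_file, open(out_path, "w") as out_file:
        # Generator expression: lines are read and normalized one at a time.
        counter = Counter(line.lower().rstrip("\n") for line in in_file)
        for item in sorted(counter):
            # print(..., file=...) replaces the Python 2 "print >>" form.
            print(item.title(), counter[item], sep="\t", file=out_file)
```

Iterating over counter.most_common() instead of sorted(counter) would order the output by descending count rather than alphabetically.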
I want to insert the return value of my Random() function into a txt file without overwriting it (mode 'a') and at a specific location, such as at the sixth character, but when I execute this, the Random value is inserted on the third line.
def Modif_Files(p_folder_path):
    Tab = []
    for v_root, v_dir, v_files in os.walk(p_folder_path):
        print v_files
        for v_file in v_files:
            file = os.path.join(p_folder_path, v_file)
            #with open(file, 'r') as files:
                #for lines in files.readlines():
                    #Tab.append([lines])
            with open(file, 'a') as file:
                file.write("\n add " + str(Random())) #Random = int
                #file.close

def Random():
    global last
    last = last + 3 + last * last * last * last % 256
    return last

def main ():
    Modif_Files(Modif_Path, 5) # Put path with a txt file inside

if __name__ == '__main__':
    main()
After going through a few other posts, it seems it is not possible to write in the middle or at the beginning of a file directly without overwriting. To write in the middle, you need to read everything after the position where you want to insert, write the new content, and then append the content you read back to the file.
Source: How do I modify a text file in Python?
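The read-then-rewrite approach described above can be sketched as follows (Python 3). The helper name insert_at and its parameters are hypothetical, and the sketch assumes the file is small enough to be read into memory in one go.

```python
def insert_at(path, offset, text):
    """Insert text at a character offset by rewriting the whole file."""
    with open(path, "r") as f:
        content = f.read()
    # Splice the new text between the part before and the part after the offset.
    updated = content[:offset] + text + content[offset:]
    with open(path, "w") as f:
        f.write(updated)
```

For example, insert_at("notes.txt", 5, " add 42") would place the text after the fifth character, where "notes.txt" is a placeholder file name.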
Okay, I found the solution: with open(file, 'r+') as file:
With r+ it works like a charm :)
The given answer is incorrect and/or lacking significant detail. At the time of this question maybe it wasn't, but currently writing to specific positions within a file using Python IS possible. I came across this question and answer in my search for this exact issue - an update could be useful for others.
See below for a resolution.
def main():
    file = open("test.txt", "rb")
    filePos = 0
    while True:
        # Read the file one byte at a time
        char = file.read(1)
        # When we find the char we want, break the loop and save the read/write head position.
        # Since we're in binary, we need to decode to get it to proper format for comparison (or encode the char)
        if char.decode('ascii') == "*":
            filePos = file.tell()
            break
        # If no more characters, we're at the end of the file. Close it and end the program.
        elif not char:
            file.close()
            return
    # Resolve open/unneeded file pointers.
    file.close()
    # Open the file in rb+ mode: read/write access without truncating the existing contents.
    fileWrite = open("test.txt", 'rb+')
    # Move the read/write head to the location we found our char at.
    fileWrite.seek(filePos - 1)
    # Overwrite our char.
    fileWrite.write(bytes("?", "ascii"))
    # Close the file
    fileWrite.close()

if __name__ == "__main__":
    main()
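A more compact variant of the same idea is sketched below. It is an assumption-laden illustration rather than the answer's own code: the helper name overwrite_first is hypothetical, the whole file is read into memory, and the replacement must be the same length as the target because writing at a position overwrites bytes rather than shifting them.

```python
def overwrite_first(path, target, replacement):
    """Replace the first occurrence of target in place; return its offset or -1."""
    if len(target) != len(replacement):
        raise ValueError("in-place overwrite needs equal-length byte strings")
    with open(path, "r+b") as f:
        data = f.read()            # assumes the file fits in memory
        pos = data.find(target)    # -1 when the target is absent
        if pos != -1:
            f.seek(pos)            # move the write head onto the target
            f.write(replacement)   # overwrites bytes; nothing is shifted
        return pos
```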
I am trying to run the following code:
fname = raw_input ('Enter file name:')
fh = open (fname)
count = 0
for line in fh:
    if not line.startswith ('X-DSPAM-Confidence:') : continue
    else:
        count = count + 1
new = fh #this new = fh is supposed to be fh stripped of the non- x-dspam lines
for line in new: # this seperates the lines in new and allows `finding the floats on each line`
    numpos = new.find ('0')
    endpos = new.find ('5', numpos)
    num = new[numpos:endpos + 1]
    float (num)
# should now have a list of floats
print num
The intention of this code is to prompt the user for a file name, open the file, read through the file, compile all the lines that start with X-DSPAM, and extract the float number on these lines. I am fairly new to coding so I realise I may have committed a number of errors, but currently when I try to run it, after putting in the file name I get the return:
I looked around and I have seen that mode 'r' refers to different file modes in python in relation to how the end of the line is handled. However the code I am trying to run is similar to other code I have formulated and it does not have any non-text files inside, the file being opened is a .txt file. Is it something to do with converting a list of strings line by line to a list of float numbers?
Any ideas on what I am doing wrong would be appreciated.
The default mode of handling a file is 'r' - which means 'read', which is what you want. It means the program is going to read the file (as opposed to 'w' - write, or 'a' - append, for example - which would allow you to overwrite the file or append to it, which you don't want in this case).
There are some bugs in your code, which I've tried to indicate in the edited code below.
You don't need to assign new = fh - you're not grabbing lines and passing them to a new file. Rather, you're checking each line against the 'XDSPAM' criteria and if it's a match, you can proceed to parse out the desired numbers. If not, you ignore it and go to the next line.
With that in mind, you can move all of the code from the for line in new to be part of the original if not ... else block.
How you find the end of the number is also a bit off. You set endpos by searching for an occurrence of the character '5', but what I think you want is a position 5 characters from the start position (numpos + 5).
(There are other ways to parse the line and pull the number, but I'm going to stick with your logic as indicated by your code, so nothing fancy here.)
You can convert to float in the same statement where you slice the number from the line (as below). It's acceptable to do:
num = line[numpos:endpos+1]
float_num = float(num)
but not necessary. In any event, you want to assign the conversion (float(num)) to a variable - just having float(num) doesn't allow you to pass the converted value to another statement (including print).
You say that you should have 'a list of floats'. The code as corrected below will display all the floats, but if you want an actual Python list, there are other steps involved. I don't think you wanted a Python list, but just in case:
numlist = [] # at the beginning, declare a new, empty list
...
# after converting to float, append number to list
numlist.append(num)
print numlist # at end of program, to print full list
In any event, this edited code works for me with an appropriate file of test data, and outputs the desired float numbers:
fname = raw_input ('Enter file name:')
fh = open (fname)
count = 0
for line in fh:
    if not line.startswith ('X-DSPAM-Confidence:') : continue
    else:
        # there's no need to create the 'new' variable
        # any lines that meet the criteria can be processed for numbers
        count = count + 1
        numpos = line.find ('0')
        # i think what you want here is to set an endpoint 5 positions to the right
        # but your code was looking for the position of a '5' in the line
        endpos = numpos + 5
        # you can convert to float and slice in the same statement
        num = float(line[numpos:endpos+1])
        print num
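As one of the other ways to parse the line mentioned above, the number can be taken as everything after the prefix instead of relying on character positions. The sketch below is written for Python 3; the function name extract_confidences is an assumption, and it presumes each matching line has the form "X-DSPAM-Confidence: 0.8475".

```python
def extract_confidences(lines):
    """Return the float after 'X-DSPAM-Confidence:' for every matching line."""
    prefix = "X-DSPAM-Confidence:"
    values = []
    for line in lines:
        if line.startswith(prefix):
            # Everything after the prefix is the number, plus optional whitespace.
            values.append(float(line[len(prefix):].strip()))
    return values
```

Since an open file can be iterated line by line, it could be called as extract_confidences(open(fname)), and len() of the result gives the count.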
So I am trying to separate a text file using line.split() and a for loop, but I am getting an index error. I have read around and understand why this error is given but I don't understand how index[0] and index[1] could be out of range (I bolded the line that is returning the error):
names = {}
file = open('sourcefile.txt', 'r')
text += file.readlines()
file.close()
for line in text:
    tmp = line.split()
    **names[tmp[1]] = tmp[0]**
sourcefile.txt looks like this:
1 (data and numbers)
2 (data and numbers)
3 (data and numbers)
4 (data and numbers)
If anyone can help I would appreciate it a lot.
Edit: I forgot to mention in the title that I am using Python.
I suspect that there's a problem reading the file and file.readlines() is returning empty (or whitespace-only) lines. For such a line, split() returns a list of length 0, making even index 0 out of range.
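If empty lines are indeed the cause, a guard on the number of fields avoids the IndexError. This is a minimal Python 3 sketch under that assumption; the function name build_names is hypothetical.

```python
def build_names(lines):
    """Map the second field of each line to the first, skipping short lines."""
    names = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or one-word line: fields[1] would raise IndexError
        names[fields[1]] = fields[0]
    return names
```

Printing repr(line) for each line that gets skipped is a quick way to see what the file actually contains.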
I'm working on a simple Python game where the computer tries to guess a number you think of. Every time it guesses the right answer, it saves the answer to a txt file. When the program is run again, it will guess the old answers first (if they're in the range the user specifies).
try:
    f = open("OldGuesses.txt", "a")
    r = open("OldGuesses.txt", "r")
except IOError as e:
    f = open("OldGuesses.txt", "w")
    r = open("OldGuesses.txt", "r")
data = r.read()
number5 = random.choice(data)
print number5
When I run that to pull the old answers, it grabs one item. Like say I have the numbers 200, 1242, and 1343, along with spaces to tell them apart, it will either pick a space, or a single digit. Any idea how to grab the full number (like 200) and/ or avoid picking spaces?
The r.read() call reads the entire contents of r and returns it as a single string. What you can do is use a list comprehension in combination with r.readlines(), like this:
data = [int(x) for x in r.readlines()]
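which breaks up the file into lines and converts each line to an integer. This assumes one number per line; if the numbers are separated by spaces on a single line, as described in the question, use r.read().split() instead of r.readlines().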
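For numbers separated by spaces, as in the question, a sketch along these lines may be closer to what is needed. It is written for Python 3, and the helper names load_guesses and pick_old_guess are assumptions for illustration.

```python
import random


def load_guesses(path):
    """Return all whitespace-separated integers stored in the file."""
    with open(path) as f:
        # split() with no arguments splits on spaces and newlines alike.
        return [int(token) for token in f.read().split()]


def pick_old_guess(path):
    """Return one stored number at random, or None if the file has none."""
    guesses = load_guesses(path)
    return random.choice(guesses) if guesses else None
```

Because random.choice now receives a list of integers instead of one long string, it returns a whole number such as 200 and never a single digit or a space.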