Hello, I have code that prints what I need in Python, but I'd like it to write that result to a new file - python-2.7

The file looks like a series of lines with IDs:
aaaa
aass
asdd
adfg
aaaa
I'd like to write each ID and its number of occurrences in the old file to a new file, in the form:
aaaa 2
asdd 1
aass 1
adfg 1
With the two elements separated by a tab.
The code I have prints what I want but doesn't write it to a new file:
with open("Only1ID.txt", "r") as file:
    file = [item.lower().replace("\n", "") for item in file.readlines()]
    for item in sorted(set(file)):
        print item.title(), file.count(item)

As you use Python 2, the simplest approach to convert your console output to file output is by using the print chevron (>>) syntax which redirects the output to any file-like object:
with open("filename", "w") as f:  # open a file in write mode
    print >> f, "some data"  # print 'into the file'
Your code could look like this after simply adding another open to open the output file and adding the chevron to your print statement:
with open("Only1ID.txt", "r") as file, open("output.txt", "w") as out_file:
    file = [item.lower().replace("\n", "") for item in file.readlines()]
    for item in sorted(set(file)):
        print >> out_file, item.title(), file.count(item)
However, your code has a few other things that you should avoid or could improve:
Do not use the same variable name file for both the file object returned by open and your processed list of strings. This is confusing, just use two different names.
You can iterate directly over the file object, which works like a generator that returns the file's lines as strings. Generators produce the next element just in time: instead of first loading the whole file into memory like file.readlines() and processing the lines afterwards, the file object reads and stores only one line at a time, whenever the next line is needed. That way you improve the code's performance and resource efficiency.
If you write a list comprehension but don't actually need its result as a list, because you simply want to iterate over it with a for loop, it's more efficient to use a generator expression (same effect as the file object's line generator described above). The only syntactical difference between a list comprehension and a generator expression is the brackets. Replace [...] with (...) and you have a generator. The only downside of a generator is that you can neither find out its length nor access items directly by index. As you don't need either of these features, the generator is fine here.
There is a simpler way to remove trailing newline characters from a line: line.rstrip() removes all trailing whitespace. If you want to keep e.g. spaces and only remove the newline, pass that character as an argument: line.rstrip("\n").
However, it can be even easier and faster not to add another implicit line break during the print call, instead of removing the original one first only to have it re-added later. In Python 2 you suppress the line break of print by adding a comma at the end of the statement. Keep in mind that the newline then stays part of the string, so here it would end up between the ID and its count:
print >> out_file, item.title(), file.count(item),
There is a type Counter to count occurrences of elements in a collection, which is faster and easier than writing it yourself, because you don't need the additional count() call for every element. The Counter behaves mostly like a dictionary with your items as keys and their count as values. Simply import it from the collections module and use it like this:
from collections import Counter
c = Counter(lines)
for item in c:
    print item, c[item]
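To make the list comprehension versus generator expression point concrete, here is a minimal sketch. The sample data is made up for illustration, and the code is written so that it runs on Python 2.7 as well as Python 3:

```python
from collections import Counter

# Made-up sample data standing in for the lines of Only1ID.txt.
raw_lines = ["aaaa\n", "aass\n", "asdd\n", "adfg\n", "AAAA\n"]

# List comprehension: builds the complete list in memory first.
as_list = [line.lower().rstrip("\n") for line in raw_lines]

# Generator expression: yields one cleaned line at a time, on demand.
as_generator = (line.lower().rstrip("\n") for line in raw_lines)

# Counter accepts any iterable, so both forms give the same counts.
counts_from_list = Counter(as_list)
counts_from_generator = Counter(as_generator)

print(counts_from_list == counts_from_generator)  # True
print(counts_from_generator["aaaa"])              # 2
```

Note that a generator can only be consumed once; after Counter has read it, as_generator is exhausted.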
With all those suggestions (except the one not to remove the line breaks) applied and the variables renamed to something more clear, the optimized code looks like this:
from collections import Counter
with open("Only1ID.txt") as in_file, open("output.txt", "w") as out_file:
    counter = Counter(line.lower().rstrip("\n") for line in in_file)
    for item in sorted(counter):
        print >> out_file, item.title(), counter[item]
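One detail worth noting: the print statement separates its arguments with a space, while the question asks for a tab. A possible way to get tab-separated rows is to build each row yourself and call write(). This is only a sketch; format_counts is a hypothetical helper name, and the code avoids the print statement so it runs on Python 2.7 and Python 3 alike:

```python
from collections import Counter


def format_counts(lines):
    """Return 'ID<TAB>count' rows sorted by ID (hypothetical helper)."""
    counter = Counter(line.lower().rstrip("\n") for line in lines)
    return ["%s\t%d" % (item, counter[item]) for item in sorted(counter)]


# Sample data taken from the question.
sample = ["aaaa\n", "aass\n", "asdd\n", "adfg\n", "aaaa\n"]
rows = format_counts(sample)

# To write the rows to a file instead of keeping them in a list:
# with open("output.txt", "w") as out_file:
#     out_file.write("\n".join(rows) + "\n")
```

Passing the open input file instead of sample works the same way, because the helper only iterates over its argument.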

Related

I want to search a file for three strings and print 'defect' only if all three strings are present

I have a txt file with three debug signatures present in it.
x = 'task cocaLc Requested reboot'
y = 'memPartFree'
z = 'memPartAlloc'
import re
f = open('testfile.txt','r')
searchstrings = ('task cocaLc Requested reboot', 'memPartFree', 'memPartAlloc')
for line in f():
    for word in searchstrings:
        if any (s in line for s in searchstrings):
            print 'defect'
I want to create a short script to scan through the file and print 'defect' only if all these three strings are present.
I tried several different approaches but was unable to meet the requirement.
First, there is a small error in the loop header of the example code (for line in f():). f is a file object, not a callable, so you shouldn't put parentheses after it.
If you have a file with the following in it:
task cocaLc Requested reboot
memPartFree
memPartAlloc
It will print out "defect" 9 times because you're checking once for each line, and once for each search string. So three lines, times three search strings is 9.
The any() function returns True whenever the current line contains at least one of the defined search strings. Thus, this code prints "defect" once for each matching line, multiplied by the number of search strings you've defined.
To resolve this, the program will need to know if/when any of the particular search strings have been detected. You might do something like this:
searchstrings = ['task cocaLc Requested reboot', 'memPartFree', 'memPartAlloc']
detections = [False, False, False]
with open('testfile.txt', 'r') as f:
    for line in f:
        for i in range(len(searchstrings)):  # loop through searchstrings using index numbers
            if searchstrings[i] in line:
                detections[i] = True  # no break here: one line may contain several search strings
if all(detections):  # if every search string was detected, every value in detections is True
    print "defect"
In this code, we loop through the lines and the search strings, and the detections list tells us which search strings have been found in the file. Thus, if all elements in that list are True, all of the search strings have been detected in the file.
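The same idea can be sketched with a set of strings that are still missing, which also allows stopping early once everything has been found. The function name contains_all is an assumption made for this illustration, and the sample lines are the ones from the example file above:

```python
def contains_all(lines, needles):
    """Return True if every string in needles occurs somewhere in lines."""
    missing = set(needles)
    for line in lines:
        # Remove every search string that occurs in this line.
        missing -= set(n for n in missing if n in line)
        if not missing:
            return True  # everything found, no need to read further
    return not missing


searchstrings = ['task cocaLc Requested reboot', 'memPartFree', 'memPartAlloc']
sample_lines = [
    'task cocaLc Requested reboot\n',
    'memPartFree\n',
    'memPartAlloc\n',
]

if contains_all(sample_lines, searchstrings):
    print('defect')
```

An open file object can be passed in place of sample_lines, since the function only iterates over it line by line.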

Facing issue with for loop

I am trying to get this function to read an input file and output the lines from the input file into a new file. PyCharm keeps saying 'item' is not being used or that it was already used in the first for loop. I don't see why 'item' is a problem. It also won't create the new file.
input_list = 'persist_output_input_file_test.txt'
def persist_output(input_list):
    input_file = open(input_list, 'rb')
    lines = input_file.readlines()
    input_file.close()
    for item in input_list:
        write_new_file = open('output_word.txt', 'wb')
        for item in lines:
            print>>input_list, item
    write_new_file.close()
You have a few things going wrong in your program.
input_list seems to be a string denoting the name of a file. Currently you are iterating over the characters in the string with for item in input_list.
You shadow the already created variable item in your second for loop. I recommend you change that.
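In Python 2, print text writes to the screen, and print >> target, text redirects the output, but target must be a file-like object with a write method. In your code the target is input_list, which is a string (the input file name), so the statement fails at run time; you probably meant write_new_file. In Python 3, print is a function: print(text), or print(text, file=target). Unlike C++'s std::cout << text << endl;, << and >> outside a print statement are bitwise operators in Python that shift bits to the left or to the right.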
There are a few issues in your implementation. Refer to the following code for what you intend to do:
def persist_output(input_list):
    input_file = open(input_list, 'rb')
    lines = input_file.readlines()
    input_file.close()
    write_new_file = open('output_word.txt', 'wb')
    for item in lines:
        print item
        write_new_file.write(item)
    write_new_file.close()  # close the output file so the data is flushed to disk
The issues with your earlier implementation are as follows:
In the first loop you are iterating in the input file name. If you intend to keep input_list a list of input files to be read, then you will also have to open them. Right now, the loop iterates through the characters in the input file name.
You are opening the output file inside a loop. Mode 'wb' truncates the file each time it is opened, so only what is written after the last open survives. You would have to move the file opening operation outside the loop (see the code snippet above) or change the mode to append. This can be done as follows:
write_new_file = open('output_word.txt', 'a')
The print statement is also wrong: print>>input_list sends the output to a string instead of a file object, which fails at run time.
f=open('yourfilename','r').read()
f1=f.split('\n')
p=open('outputfilename','w')
for i in range (len(f1)):
    p.write(str(f1[i])+'\n')
p.close()
Hope this helps.
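Putting the answers together, a line-by-line copy can be sketched with two with blocks, so both files are closed even if an error occurs. copy_lines is a hypothetical helper name, and the demo uses throwaway files in a temporary directory rather than the file names from the question:

```python
import os
import tempfile


def copy_lines(src_path, dst_path):
    """Copy src_path to dst_path line by line; return the number of lines."""
    count = 0
    # The output file is opened once, outside the loop over the lines.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(line)  # each line already ends with its newline
            count += 1
    return count


# Demo with throwaway files (paths are made up for this example).
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, "input.txt")
dst_path = os.path.join(work_dir, "output_word.txt")
with open(src_path, "w") as f:
    f.write("first\nsecond\nthird\n")

copied = copy_lines(src_path, dst_path)
print(copied)
```

Because write() is used instead of the print statement, the sketch runs unchanged on Python 2.7 and Python 3.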

how to write simultaneously in a file while the program is still running

In simple words, I have a file which contains duplicate numbers. I want to write the unique numbers from the 1st file into a 2nd file. I have opened the 1st file in 'r' mode and the 2nd file in 'a+' mode. But it looks like nothing is appended to the 2nd file while the program is running, which gives the wrong output. Can anyone help me fix this problem?
Thank you in advance.
This is my code
#!/usr/bin/env python
fp1 = open('tweet_mention_id.txt','r')
for ids in fp1:
    ids = ids.rstrip()
    ids = int(ids)
    print 'ids= ',ids
    print ids + 1
    fp2 = open('unique_mention_ids.txt','a+')
    for user in fp2:
        user = user.rstrip()
        user = int(user)
        print user + 1
        print 'user= ',user
        if ids != user:
            print 'is unique',ids
            fp2.write(str(ids) + '\n')
            break
        else:
            print 'is already present',ids
    fp2.close()
fp1.close()
If unique_mention_ids.txt is initially empty, then you will never enter your inner loop, and nothing will get written. You should use the inner loop to determine whether or not the id needs to be added, but then do the addition (if warranted) outside the inner loop.
Similar logic applies for a non-empty file, but for a different reason: when you open it for appending, the file pointer is at the end of the file, and trying to read behaves as if the file were empty. You can start at the beginning of the file by issuing a fp2.seek(0) statement before the inner loop.
Either way: as written, you write a given id from the first file as soon as it differs from a single entry in the second, as opposed to it not matching any entry (which, given the file name, sounds like what you want). Worse, in the second case above, you are writing to the file while iterating over it with the same handle, which makes the read position unreliable.
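A simpler route that avoids reading and writing the same file at once is to remember the ids already seen in a set and write the result afterwards. This is a sketch of a different technique than the nested loops in the question; unique_in_order is a hypothetical helper name and the sample numbers are made up:

```python
def unique_in_order(ids):
    """Return the ids without duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for value in ids:
        if value not in seen:  # set membership tests are fast
            seen.add(value)
            unique.append(value)
    return unique


sample_ids = [12, 7, 12, 3, 7, 12]  # made-up stand-in for tweet_mention_id.txt
result = unique_in_order(sample_ids)
print(result)  # [12, 7, 3]

# Writing the result out once, after all ids have been read:
# with open('unique_mention_ids.txt', 'w') as fp2:
#     for value in result:
#         fp2.write(str(value) + '\n')
```

With the real input, the ids could come from a generator such as (int(line) for line in fp1), assuming every line holds one integer.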

IndexError: list index out of range for list of lists in for loop

I've looked at the other questions posted on the site about index errors, but I'm still not understanding how to fix my own code. I'm a beginner when it comes to Python. Based on the user's input, I want to check if that input lies in the fourth position of each line in the list of lists.
Here's the code:
#create a list of lists from the missionPlan.txt
from __future__ import with_statement
listoflists = []
with open("missionPlan.txt", "r") as f:
    results = [elem for elem in f.read().split('\n') if elem]
    for result in results:
        listoflists.append(result.split())
#print(listoflists)
#print(listoflists[2][3])
choice = int(input('Which command would you like to alter: '))
i = 0
for rows in listoflists:
    while i < len(listoflists):
        if listoflists[i][3]==choice:
            print (listoflists[i][0])
        i += 1
This is the error I keep getting:
not getting inside the if statement
So, I think this is what you're trying to do - find any line in your "missionPlan.txt" where the 4th word (after splitting on whitespace) matches the number that was input, and print the first word of such lines.
If that is indeed accurate, then perhaps something along this line would be a better approach.
choice = int(input('Which command would you like to alter: '))
allrecords = []
with open("missionPlan.txt", "r") as f:
    for line in f:
        words = line.split()
        allrecords.append(words)
        try:
            if len(words) > 3 and int(words[3]) == choice:
                print(words[0])
        except ValueError:
            pass  # the fourth word is not a number, so it cannot match
Also, if, as your tags suggest, you are using Python 3.x, the from __future__ import with_statement line is unnecessary: the with statement has been built in since Python 2.6.
EDIT: added a couple of lines based on the comments below. Now, in addition to examining every line as it's read and printing the first field from every line whose fourth field matches the input, it gathers each line into the allrecords list, split into separate words as a list, corresponding to the original question's listoflists. This will enable further processing on the file later on in the code. Also fixed one glaring mistake: need to split line into words, not f...
Also, to answer your "I cant seem to get inside that if statement" observation - that's because you're comparing a string (listoflists[i][3]) with an integer (choice). The code above addresses both that comparison mismatch and the check for there actually being enough words in a line to do the comparison meaningfully...
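The string versus integer mismatch is easy to reproduce in isolation. The sketch below uses made-up rows in place of missionPlan.txt, and first_fields is a hypothetical helper that mirrors the length check and the int() conversion from the answer:

```python
def first_fields(rows, choice, column=3):
    """Return the first word of every row whose word at `column` equals choice."""
    matches = []
    for words in rows:
        if len(words) <= column:
            continue  # too short: indexing would raise IndexError
        try:
            if int(words[column]) == choice:
                matches.append(words[0])
        except ValueError:
            continue  # the word is not a number
    return matches


# Made-up rows, already split on whitespace.
rows = [
    ["takeoff", "a", "b", "1"],
    ["hover", "a", "b", "2"],
    ["short", "row"],
    ["land", "a", "b", "x"],
    ["climb", "a", "b", "2"],
]

print("2" == 2)               # False: a str never equals an int
print(first_fields(rows, 2))  # ['hover', 'climb']
```

The short row shows where the IndexError from the question title would come from: any line with fewer than four words has no index 3.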

For loop using enumerate through a list with an if statement to search lines for a particular string

I am going to compile a list of recurring strings (transaction IDs).
I am flummoxed. I've researched the correct method and feel like this code should work.
However, I'm doing something wrong in the second block.
This first block correctly compiles a list of the strings that I want.
I can't get this second block to work. If I simplify, I can print each value in the list
by using
for idx, val in enumerate(tidarray): print val
It seems like I should now be able to use that value to search each line for that string,
then print the line (actually I'll be using it in conjunction with another search term to
reduce the number of line reads, but this is my basic test before homing in further).
def main():
    pass
samlfile= "2013-08-18 06:24:27,410 tid:5af193fdc DEBUG org.sourceid.saml20.domain.AttributeMapping] Source attributes:{SAML_AUTHN_CTX=urn:oasis:names:tc:SAML:2.0:ac:classes"
tidarray = []
for line in samlfile:
    if "tid:" in line:
        str=line
        tid = re.search(r'(tid:.*?)(?= )', str)
        if tid.group() not in tidarray:
            tidarray.append(tid.group())
for line in samlfile:
    for idx, val in enumerate(tidarray):
        if val in line:
            print line
Can someone suggest a correction for the second block of code? I recognize that reading the file twice isn't the most elegant solution... My main goal here is to learn how to enumerate through the list and use each value in the subsequent code.
Iterating over a file twice
Basically what you do is:
for line in somefile: pass # first run
for line in somefile: pass # second run
The first run will complete just fine, the second run will not run at all.
This is because the file was read until the end and there's no more data to read lines from.
Call somefile.seek(0) to go to the beginning of the file:
for line in somefile: pass # first run
somefile.seek(0)
for line in somefile: pass # second run
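The effect is easy to observe with an in-memory file. This sketch uses io.StringIO as a stand-in for a real file (an assumption made so the example needs nothing on disk); a file returned by open() behaves the same way here:

```python
import io

# In-memory stand-in for a real file object.
somefile = io.StringIO(u"line one\nline two\n")

first_run = [line for line in somefile]   # reads both lines
second_run = [line for line in somefile]  # already at the end: nothing left

somefile.seek(0)                          # rewind to the beginning
third_run = [line for line in somefile]   # reads both lines again

print(len(first_run), len(second_run), len(third_run))
```

Alternatively, reading the lines into a list once and looping over that list twice avoids the rewind altogether, at the cost of holding the whole file in memory.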
Storing things uniquely
Basically, what you seem to want is a way to store the IDs from the file in a
data structure in which every ID appears only once.
If you want to store elements uniquely you use, for example, dictionaries (help(dict))
or sets (help(set)). Example with sets:
myset = set()
myset.add(2) # set([2])
myset.add(3) # set([2,3])
myset.add(2) # set([2,3])
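Applied to the transaction IDs, the set approach might look like the sketch below. The pattern tid:\S+ (the tid: prefix followed by non-whitespace characters), the helper name collect_tids, and all log lines except the shape of the first one are assumptions made for illustration:

```python
import re

# Assumed pattern: "tid:" followed by everything up to the next whitespace.
TID_PATTERN = re.compile(r"tid:\S+")


def collect_tids(lines):
    """Return the set of transaction IDs found in lines (hypothetical helper)."""
    tids = set()
    for line in lines:
        match = TID_PATTERN.search(line)
        if match:  # guard: search() returns None when nothing matches
            tids.add(match.group())
    return tids


# Made-up log lines modelled on the sample in the question.
log_lines = [
    "2013-08-18 06:24:27,410 tid:5af193fdc DEBUG first message",
    "2013-08-18 06:24:28,001 tid:5af193fdc DEBUG second message",
    "2013-08-18 06:24:29,512 tid:9bc04e1aa DEBUG third message",
    "a line without any transaction id",
]

tids = collect_tids(log_lines)
# Second pass: keep every line that mentions one of the collected IDs.
matching = [line for line in log_lines if any(tid in line for tid in tids)]
print(len(tids), len(matching))
```

Because log_lines is a list, it can be iterated twice without rewinding; with a real file object the second pass would need the seek(0) call described above.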