Can't generate proper file in python - python-2.7

I'm trying to generate a new file based on an existing one containing only lines with some predefined text. I have:
with open("steps_shown_at_least_once.log", "r") as f:
    for line in f:
        if line.find("Run program"):
            output = open('run_studio.txt', 'a')
            output.write(line)
            output.close()
For some reason this generates an identical file, even though the "Run program" text that I'm searching for does not appear in every row of the old file.

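line.find('Run program') returns the index of the substring, not a boolean.
Return Value
Index if found and -1 otherwise.
Found here: Python String find() Method
Since -1 is truthy and only 0 is falsy, the condition is true for every line except those that start with "Run program". Instead of if line.find("Run program"): write if "Run program" in line: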
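To make the difference concrete, here is a small sketch comparing the two tests. The sample lines and the helper names bool_of_find and contains are made up for illustration; they are not from the question.

```python
# Hypothetical sample lines, invented for illustration.
lines = [
    "Run program studio.exe\n",    # match at index 0
    "12:00 Run program tool\n",    # match at index 6
    "Open settings dialog\n",      # no match
]

def bool_of_find(line):
    # str.find returns an index, or -1 when the text is absent.
    # bool(-1) is True and bool(0) is False, so this test is misleading.
    return bool(line.find("Run program"))

def contains(line):
    # The membership test gives a real True/False answer.
    return "Run program" in line

find_results = [bool_of_find(line) for line in lines]
in_results = [contains(line) for line in lines]
kept = [line for line in lines if contains(line)]
```

With find, the only line rejected is the one that starts with the search text, which is the opposite of what was intended; the in test keeps exactly the two matching lines.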

Null Byte appending while reading the file through Python pandas

I have created a script that gives the matching rows between two files. After that, I return the output file to a function, which uses the file as input to create a pivot using pandas.
But something seems to be wrong. Below is the code snippet:
import pandas as pd

def CreateSummary(file):
    out_file = file
    file_df = pd.read_csv(out_file)  ## This function is appending NULL bytes at the end of the file
    #print file_df.head(2)
The above code gives me the error:
ValueError: No columns to parse from file
I tried another approach:
file_df = pd.read_csv(out_file, delim_whitespace=True, engine='python')
## This gives me the error
_csv.Error: line contains NULL byte
Any suggestions and criticism are highly appreciated.

Hello, I have code that prints what I need in Python, but I'd like it to write that result to a new file

The file looks like a series of lines with IDs:
aaaa
aass
asdd
adfg
aaaa
I'd like to write each ID and its number of occurrences in the old file to a new file, in the form:
aaaa 2
asdd 1
aass 1
adfg 1
With the two elements separated by a tab.
The code I have prints what I want but doesn't write to a new file:
with open("Only1ID.txt", "r") as file:
    file = [item.lower().replace("\n", "") for item in file.readlines()]
    for item in sorted(set(file)):
        print item.title(), file.count(item)
As you use Python 2, the simplest approach to convert your console output to file output is by using the print chevron (>>) syntax which redirects the output to any file-like object:
with open("filename", "w") as f:  # open a file in write mode
    print >> f, "some data"  # print 'into the file'
Your code could look like this after simply adding another open to open the output file and adding the chevron to your print statement:
with open("Only1ID.txt", "r") as file, open("output.txt", "w") as out_file:
    file = [item.lower().replace("\n", "") for item in file.readlines()]
    for item in sorted(set(file)):
        print >> out_file, item.title(), file.count(item)
However, your code has a few other things that could be improved:
Do not use the same variable name file for both the file object returned by open and your processed list of strings. This is confusing, just use two different names.
You can directly iterate over the file object, which works like a generator that returns the file's lines as strings. Generators process requests for the next element just in time, that means it does not first load the whole file into your memory like file.readlines() and processes them afterwards, but only reads and stores one line at a time, whenever the next line is needed. That way you improve the code's performance and resource efficiency.
If you write a list comprehension, but you don't necessarily need its result as a list because you simply want to iterate over it using a for loop, it's more efficient to use a generator expression (same effect as the file object's line generator described above). The only syntactical difference between a list comprehension and a generator expression is the brackets. Replace [...] with (...) and you have a generator. The downsides of a generator are that you can neither find out its length nor access items directly using an index, and that it can be iterated only once. Once the counting happens in a single pass (see the Counter suggestion), you don't need any of these features, so the generator is fine here.
There is a simpler way to remove trailing newline characters from a line: line.rstrip() removes all trailing whitespace. If you want to keep e.g. spaces, but only want the newline to be removed, pass that character as argument: line.rstrip("\n").
However, it can be even easier and faster to not add another implicit line break during the print call instead of removing the original one first and having it re-added later. This only pays off when the unstripped line is the last thing printed; here the count follows the ID, so the ID's own newline would land between the two. You suppress the line break of print in Python 2 by adding a comma at the end of the statement:
print >> out_file, item.title(), file.count(item),
There is a type Counter to count occurrences of elements in a collection, which is faster and easier than writing it yourself, because you don't need the additional count() call for every element. The Counter behaves mostly like a dictionary with your items as keys and their count as values. Simply import it from the collections module and use it like this:
from collections import Counter
# lines: any iterable of strings, e.g. the stripped lines of the file
c = Counter(lines)
for item in c:
    print item, c[item]
With all those suggestions (except the one not to remove the line breaks) applied and the variables renamed to something more clear, the optimized code looks like this:
from collections import Counter
with open("Only1ID.txt") as in_file, open("output.txt", "w") as out_file:
    counter = Counter(line.lower().rstrip("\n") for line in in_file)
    for item in sorted(counter):
        print >> out_file, item.title(), counter[item]

Facing issue with for loop

I am trying to get this function to read an input file and write its lines into a new file. PyCharm keeps saying 'item' is not being used or that it was already used in the first for loop. I don't see why 'item' is a problem. It also won't create the new file.
input_list = 'persist_output_input_file_test.txt'

def persist_output(input_list):
    input_file = open(input_list, 'rb')
    lines = input_file.readlines()
    input_file.close()
    for item in input_list:
        write_new_file = open('output_word.txt', 'wb')
        for item in lines:
            print>>input_list, item
    write_new_file.close()
You have a few things going wrong in your program.
input_list seems to be a string denoting the name of a file. Currently you are iterating over the characters in the string with for item in input_list.
You shadow the already created variable item in your second for loop. I recommend you change that.
In Python, depending on which version you use, the syntax for printing to the screen is print text (Python 2) or print(text) (Python 3). Unlike in C++'s std::cout << text << std::endl;, << and >> on their own are bitwise operators in Python that shift the bits either to the left or to the right. Python 2 does accept print >> f, text, but f must be an open file object; in your code input_list is a string (the file name), so that statement fails.
There are a few issues in your implementation. Refer to the following code for what you intend to do:
def persist_output(input_list):
    input_file = open(input_list, 'rb')
    lines = input_file.readlines()
    input_file.close()
    # open the output file once, before the loop
    write_new_file = open('output_word.txt', 'wb')
    for item in lines:
        print item
        write_new_file.write(item)
    write_new_file.close()
The issues with your earlier implementation are as follows:
In the first loop you are iterating in the input file name. If you intend to keep input_list a list of input files to be read, then you will also have to open them. Right now, the loop iterates through the characters in the input file name.
You are opening the output file inside a loop. In 'wb' mode each open truncates the file, so only the last write operation will survive. You would have to move the file opening operation outside the loop (see the code snippet above) or change the mode to append. This can be done as follows:
write_new_file = open('output_word.txt', 'a')
The print statement is used incorrectly: print >> needs an open file object as its target, but input_list is the file name string.
f=open('yourfilename','r').read()
f1=f.split('\n')
p=open('outputfilename','w')
for i in range(len(f1)):
    p.write(str(f1[i])+'\n')
p.close()
hope this helps.
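Putting the points from the answers above together, a minimal sketch could look like the following. It is written to run on Python 3 as well, so it uses write instead of the print statement. The two-argument signature, the returned line count, and the temporary demo files are assumptions added for illustration, not part of the original function.

```python
import os
import tempfile

def persist_output(input_path, output_path):
    """Copy every line of input_path into output_path; return the line count."""
    copied = 0
    # Both files are opened once, before the loop, and closed automatically.
    with open(input_path, "r") as src, open(output_path, "w") as dst:
        for line in src:       # iterate over the lines, not the file name
            dst.write(line)    # each line keeps its own trailing newline
            copied += 1
    return copied

# Hypothetical demo using throwaway files in a temporary directory.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.txt")
out_path = os.path.join(workdir, "output_word.txt")
with open(in_path, "w") as handle:
    handle.write("first\nsecond\nthird\n")

line_count = persist_output(in_path, out_path)
with open(out_path) as handle:
    copied_text = handle.read()
```

The with statement closes both files even if an error occurs partway through, which also avoids the forgotten close() calls seen in the snippets above.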

How to write simultaneously in a file while the program is still running

In simple words, I have a file which contains duplicate numbers. I want to write the unique numbers from the 1st file into a 2nd file. I have opened the 1st file in 'r' mode and the 2nd file in 'a+' mode. But it looks like nothing is appended to the 2nd file while the program is running, which gives the wrong output. Can anyone help me fix this problem?
Thank you in advance.
This is my code:
#!/usr/bin/env python
fp1 = open('tweet_mention_id.txt','r')
for ids in fp1:
    ids = ids.rstrip()
    ids = int(ids)
    print 'ids= ',ids
    print ids + 1
    fp2 = open('unique_mention_ids.txt','a+')
    for user in fp2:
        user = user.rstrip()
        user = int(user)
        print user + 1
        print 'user= ',user
        if ids != user:
            print 'is unique',ids
            fp2.write(str(ids) + '\n')
            break
        else:
            print 'is already present',ids
    fp2.close()
fp1.close()
If unique_mention_ids.txt is initially empty, then you will never enter your inner loop, and nothing will get written. You should use the inner loop to determine whether or not the id needs to be added, but then do the addition (if warranted) outside the inner loop.
Similar logic applies for a non-empty file, but for a different reason: when you open it for appending, the file pointer is at the end of the file, and trying to read behaves as if the file were empty. You can start at the beginning of the file by issuing a fp2.seek(0) statement before the inner loop.
Either way: as written, you will write a given id from the first file as soon as it fails to match one entry in the second, as opposed to it not matching any (which, given the file name, sounds like what you want). Worse, in the second case above, you would be writing to the file in the middle of reading it; append mode sends writes to the end of the file, but interleaving reads and writes on one file object without a seek in between is unreliable.
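One way to follow this advice is to decide membership first and write afterwards, keeping the known ids in a set so each new id is checked against all of them at once. This is only a sketch under that assumption; the function name append_unique_ids and the temporary demo files are hypothetical, and the code avoids the print statement so it also runs on Python 3.

```python
import os
import tempfile

def append_unique_ids(source_path, unique_path):
    """Append ids from source_path that unique_path does not yet contain."""
    seen = set()
    # Load the ids already stored, if the file exists.
    if os.path.exists(unique_path):
        with open(unique_path) as existing:
            seen = set(int(line) for line in existing if line.strip())
    added = []
    with open(source_path) as source, open(unique_path, "a") as target:
        for line in source:
            if not line.strip():
                continue             # ignore blank lines
            number = int(line)       # int() tolerates the trailing newline
            if number not in seen:   # compared against every known id at once
                seen.add(number)
                target.write("%d\n" % number)
                added.append(number)
    return added

# Hypothetical demo with throwaway files.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "tweet_mention_id.txt")
dst_path = os.path.join(workdir, "unique_mention_ids.txt")
with open(src_path, "w") as handle:
    handle.write("5\n3\n5\n8\n3\n")

first_run = append_unique_ids(src_path, dst_path)
second_run = append_unique_ids(src_path, dst_path)
with open(dst_path) as handle:
    stored = handle.read()
```

Because the second file is read completely before it is opened for appending, reads and writes never interleave on the same file object, and running the function twice adds nothing the second time.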

Length of Python dictionary created doesn't match length from input file

I'm currently trying to create a dictionary from the following input file:
1776344_at 1779734_at 0.755332745 1.009570769 -0.497209846
1776344_at 1771911_at 0.931592828 0.830039019 2.28101445
1776344_at 1777458_at 0.746306282 0.753624146 3.709120716
...
...
There are a total of 12552 lines in this file.
What I wanted to do is to create a dictionary where the first 2 columns are the keys and the rest are the values. This I've successfully done and it looks something like this:
1770449_s_at;1777263_at:0.825723773;1.188969175;-2.858979578
1772892_at;1772051_at:-0.743866602;-1.303847456;26.41464414
1777227_at;1779218_s_at:0.819554413;0.677758609;4.51390617
But here's THE THING: I ran my Python script in the MS-DOS command prompt (cmd), and the generated output not only has a different order from the input file (i.e. its 1st line is the input's 34th line), the whole file also has only 739 lines.
Can someone enlighten me on what's going on? Is it something to do with memory? Last I checked, I still had 305GB of disk space.
The script I wrote is as follows:
import sys
import os
input_file = sys.argv[1]
infile = open(input_file, 'r')
model_dict = {}
for line in infile:
    key = ';'.join(line.split('\t')[0:2]).rstrip(os.linesep)
    value = ';'.join(line.split('\t')[2:]).rstrip(os.linesep)
    print 'keys are:',key,'\n','values are:',value
    model_dict[key] = value
    print model_dict
    outfile = open('model_dict', 'w')
    for key,value in model_dict.items():
        print key,value
        outfile.write('%s:%s\n' % (key,value))
    outfile.close()
Based on the information given, and since each dictionary key is unique, I suspect the input file has lines that generate the same key. In that case the dictionary only holds the last value associated with that key.
Python dictionaries are unordered sets of key: value pairs. So when you print a dictionary's elements to the output file, don't expect the order to be preserved.
Another problem I see in your script is the loop that writes the output file: it shouldn't be "inside" the loop that reads from the input file.
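If repeated keys are indeed the cause, a sketch like the following would both preserve the input order and show which keys repeat. It swaps the plain dict for collections.OrderedDict and stores a list of values per key, which is a different design from the original script. The function name build_model and the sample rows are invented for illustration.

```python
from collections import OrderedDict

def build_model(lines):
    """Group value columns by the key formed from the first two columns."""
    model = OrderedDict()          # keeps keys in first-seen order
    for line in lines:
        fields = line.split()      # splits on any whitespace, drops the newline
        if len(fields) < 3:
            continue               # skip blank or malformed lines
        key = ";".join(fields[:2])
        value = ";".join(fields[2:])
        # Collect every value instead of overwriting earlier ones.
        model.setdefault(key, []).append(value)
    return model

# Invented sample with one repeated key, for illustration only.
sample = [
    "a_at b_at 0.1 0.2 0.3\n",
    "a_at c_at 0.4 0.5 0.6\n",
    "a_at b_at 0.7 0.8 0.9\n",
]
model = build_model(sample)
duplicates = [key for key, values in model.items() if len(values) > 1]
```

Comparing len(model) with the number of input lines, and inspecting duplicates, would confirm whether 12552 lines really collapse into 739 distinct keys. The output file should then be written once, after the reading loop has finished.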