Functions and Files - python-2.7

please help :)
I'm learning from pythonhardway
Exercise 20:Functions and Files
from sys import argv
script, input_file = argv
def print_all(f):
    print f.read()
def rewind(f):
    f.seek(0)
def print_a_line(line_count, f):
    print line_count, f.readline()
current_file = open(input_file)
print "First let's print the whole file:\n"
print_all(current_file)
print "Now let's rewind, kind of like a tape."
rewind(current_file)
print "Let's print three lines:"
current_line = 1
print_a_line(current_line, current_file)
current_line = current_line + 1
print_a_line(current_line, current_file)
current_line = current_line + 1
print_a_line(current_line, current_file)
If current_line = 2, how does it print the second line?
Also, in
def rewind(f):
    f.seek(0)
and rewind(current_file):
why do we put (f)?
Why not input_file?
I tried to explain what I think it is doing.
Sorry if I ask stupid questions :(

There are two issues at work here.
1 -- difference between name of file and file object you can operate on
2 -- difference between parameter names in functions and values used to call those functions
When you have a file named (e.g.) data.csv, you can refer to it using a string:
file_name = 'data.csv'
... but that's just a string of characters. To operate on the file with that name, you do:
fh = open(file_name)
You now have a file object (often called a file handle) that you can use with methods such as read, readline, write and seek. So the first part of your answer is: a file name is not the same thing as a file object. You call the rewind function with current_file (the file object), not input_file (the file name, which is a plain string and has no seek method).
The other issue, perhaps obvious, is that the parameter name you use when defining a function (like f in your example) is just a placeholder. When you call that function later, the value you pass in is what the function operates on.
So while the function is defined as rewind(f), calling it as rewind(current_file) performs the seek on current_file; inside the function, f is simply another name for that same object.
In short: you call seek inside the function body using f because that is the parameter name you chose for your reusable function. You could rename it to almost anything, as long as you change it in both the definition and the body, without affecting the callers. And you call rewind with current_file because that is the file object you opened to work on.
Does it make sense?
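The question also asks how the second call manages to print the second line. A minimal sketch of what is going on, written for Python 3 (the exercise uses Python 2 print statements) and using a hypothetical temporary file in place of the file named on the command line: a file object remembers its position, so each readline() continues where the previous one stopped, and seek(0) moves that position back to the start. The helper read_a_line is an illustrative stand-in for print_a_line that returns its values instead of printing them.

```python
import os
import tempfile

def rewind(f):
    # f is just the local name for whatever file object the caller passes in.
    f.seek(0)

def read_a_line(line_count, f):
    # readline() returns the line at the current position and advances past it.
    return line_count, f.readline()

# Hypothetical sample file standing in for input_file from the exercise.
fd, file_name = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("line one\nline two\nline three\n")

current_file = open(file_name)  # file_name is a str; current_file is a file object
first = read_a_line(1, current_file)
second = read_a_line(2, current_file)  # continues after the first line
rewind(current_file)                   # position goes back to the beginning
first_again = read_a_line(1, current_file)
current_file.close()
os.remove(file_name)

print(first, second, first_again)
```

The number passed as line_count is only a label that gets printed next to the line. It does not select a line; the file object's own position does.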

Related

Hello, I have code that prints what I need in Python, but I'd like it to write that result to a new file.

The file looks like a series of lines with IDs:
aaaa
aass
asdd
adfg
aaaa
I'd like to get, in a new file, each ID and its number of occurrences in the old file, in the form:
aaaa 2
asdd 1
aass 1
adfg 1
With the 2 elements separated by a tab.
The code I have prints what I want but doesn't write to a new file:
with open("Only1ID.txt", "r") as file:
    file = [item.lower().replace("\n", "") for item in file.readlines()]
    for item in sorted(set(file)):
        print item.title(), file.count(item)
As you use Python 2, the simplest approach to convert your console output to file output is by using the print chevron (>>) syntax which redirects the output to any file-like object:
with open("filename", "w") as f: # open a file in write mode
    print >> f, "some data" # print 'into the file'
Your code could look like this after simply adding another open to open the output file and adding the chevron to your print statement:
with open("Only1ID.txt", "r") as file, open("output.txt", "w") as out_file:
    file = [item.lower().replace("\n", "") for item in file.readlines()]
    for item in sorted(set(file)):
        print >> out_file, item.title(), file.count(item)
However, your code has a few other things that could be improved:
Do not use the same variable name file for both the file object returned by open and your processed list of strings. This is confusing, just use two different names.
You can iterate directly over the file object, which is an iterator that yields the file's lines as strings. It produces each line on demand: instead of loading the whole file into memory first, as file.readlines() does, it reads one line at a time (with internal buffering) whenever the next line is needed. That way you improve the code's memory efficiency, especially for large files.
If you write a list comprehension but don't actually need the result as a list, because you only want to iterate over it once in a for loop, it's more efficient to use a generator expression (the same lazy behaviour as the file object's line iteration described above). The only syntactic difference between a list comprehension and a generator expression is the brackets: replace [...] with (...) and you have a generator. The downsides of a generator are that you cannot ask for its length, cannot access items by index, and can only iterate over it once. As you don't need any of these features, the generator is fine here.
There is a simpler way to remove trailing newline characters from a line: line.rstrip() removes all trailing whitespaces. If you want to keep e.g. spaces, but only want the newline to be removed, pass that character as argument: line.rstrip("\n").
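However, when the line itself is the last thing you print, it can be even easier and faster to keep its newline and suppress the extra line break that print adds, instead of removing it first only to have it re-added later. In Python 2 you suppress the line break of print by adding a comma at the end of the statement. It does not fit this example well, because the count is printed after the ID and the retained newline would end up between the two:
print >> out_file, item.title(), file.count(item),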
There is a type Counter to count occurrences of elements in a collection, which is faster and easier than writing it yourself, because you don't need the additional count() call for every element. The Counter behaves mostly like a dictionary with your items as keys and their count as values. Simply import it from the collections module and use it like this:
from collections import Counter
c = Counter(lines)
for item in c:
    print item, c[item]
With all those suggestions (except the one not to remove the line breaks) applied and the variables renamed to something more clear, the optimized code looks like this:
from collections import Counter
with open("Only1ID.txt") as in_file, open("output.txt", "w") as out_file:
    counter = Counter(line.lower().rstrip("\n") for line in in_file)
    for item in sorted(counter):
        print >> out_file, item.title(), counter[item]
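For readers on Python 3, where the print chevron no longer exists, roughly the same idea can be sketched as below. The function name count_ids, the temporary directory and the sample IDs are placeholders for illustration, and the tab separator is an assumption based on the question's wording ("separated by tab").

```python
import os
import tempfile
from collections import Counter

def count_ids(in_path, out_path):
    """Write each distinct ID and its count, tab-separated, one pair per line."""
    with open(in_path) as in_file, open(out_path, "w") as out_file:
        # Generator expression: lines are normalised one at a time, blanks skipped.
        counter = Counter(line.strip().lower() for line in in_file if line.strip())
        for item in sorted(counter):
            out_file.write("{}\t{}\n".format(item, counter[item]))
    return counter

# Hypothetical sample data mirroring the IDs shown in the question.
work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, "Only1ID.txt")
out_path = os.path.join(work_dir, "output.txt")
with open(in_path, "w") as f:
    f.write("aaaa\naass\nasdd\nadfg\naaaa\n")

counts = count_ids(in_path, out_path)
with open(out_path) as f:
    result = f.read()
print(result)
```

This sketch writes the IDs in lower case and in alphabetical order. Apply .title() to each item if the capitalised form is wanted, or iterate over counter.most_common() to order by count.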

Facing issue with for loop

I am trying to get this function to read an input file and write the lines from the input file into a new file. PyCharm keeps saying 'item' is not being used, or that it was already used in the first for loop. I don't see why 'item' is a problem. It also won't create the new file.
input_list = 'persist_output_input_file_test.txt'
def persist_output(input_list):
    input_file = open(input_list, 'rb')
    lines = input_file.readlines()
    input_file.close()
    for item in input_list:
        write_new_file = open('output_word.txt', 'wb')
        for item in lines:
            print>>input_list, item
    write_new_file.close()
You have a few things going wrong in your program.
input_list seems to be a string denoting the name of a file. Currently you are iterating over the characters in the string with for item in input_list.
You shadow the already created variable item in your second for loop. I recommend you change that.
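In Python 2, print text writes to the screen and print >> file_object, text redirects the output to a file-like object (in Python 3 these are print(text) and print(text, file=file_object)). Your statement print>>input_list, item passes input_list, which is a string holding the file name, where an open file object such as write_new_file is required. Outside a print statement, << and >> are bitwise operators in Python that shift bits to the left or to the right, unlike C++'s std::cout << text << endl;.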
There are a few issues in your implementation. Refer to the following code for what you intend to do:
def persist_output(input_list):
    input_file = open(input_list, 'rb')
    lines = input_file.readlines()
    input_file.close()
    write_new_file = open('output_word.txt', 'wb')
    for item in lines:
        print item
        write_new_file.write(item)
    write_new_file.close()
The issues with your earlier implementation are as follows:
In the first loop you are iterating in the input file name. If you intend to keep input_list a list of input files to be read, then you will also have to open them. Right now, the loop iterates through the characters in the input file name.
You are opening the output file inside a loop. Because 'wb' truncates the file every time it is opened, only what is written after the last open survives. You have to move the file-opening operation outside the loop (see the code snippet above) or change the mode to append. This can be done as follows:
write_new_file = open('output_word.txt', 'a')
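The print statement is given the file name string (input_list) as its target instead of the opened output file (write_new_file), so it fails at runtime: a string has no write method.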
f = open('yourfilename', 'r').read()
f1 = f.split('\n')
p = open('outputfilename', 'w')
for i in range(len(f1)):
    p.write(str(f1[i]) + '\n')
p.close()
hope this helps.
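Pulling the points from these answers together, a minimal sketch (Python 3, with a hypothetical temporary directory and sample lines standing in for the real files) might look like the following. It opens the output once, outside the loop, and lets the with statement close both files.

```python
import os
import tempfile

def persist_output(input_path, output_path):
    """Copy every line of input_path into output_path and return the line count."""
    count = 0
    # Open the output once, outside the loop, so earlier lines are not truncated.
    with open(input_path) as input_file, open(output_path, "w") as output_file:
        for line in input_file:
            output_file.write(line)  # each line already ends with its newline
            count += 1
    return count

# Hypothetical sample input; the file names follow the ones used in the question.
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "persist_output_input_file_test.txt")
dst = os.path.join(work_dir, "output_word.txt")
with open(src, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

copied = persist_output(src, dst)
with open(dst) as f:
    copied_text = f.read()
print(copied, repr(copied_text))
```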

stuck on basic regular expression

Task: find all the numbers in a text file and compute their sum.
Link to file(if required) : http://python-data.dr-chuck.net/regex_sum_42.txt
name = raw_input("Enter your file: ")
if len(name) < 1: name = "sample.txt"
try:
    open(name)
except:
    print "Please enter a valid file name."
    exit()
import re
lst = list()
for line in name:
    line = line.strip() #strip() instead of rstrip() as there were space before line as well
    stuff = re.findall("[0-9]+", line)
    print stuff # i tried to trace back and realize it prints empty list so problem should be here
    stuff = int(stuff[0]) # i think this is wrong as well
    lst.append(stuff)
sum(lst)
print sum(lst)
Can someone tell me where I went wrong? Sorry for any formatting errors, and thanks for the help.
I have also tried:
\s[0-9]+\s
.[0-9]+.
You need to change your code to:
import re
lst = []
with open(name) as f:
    for line in f:
        lst.extend(int(x) for x in re.findall("[0-9]+", line))
print sum(lst)
See the IDEONE demo
The problem was that for line in name iterates over the characters of the file name string rather than over the lines of the file, so re.findall usually returned an empty list and stuff[0] raised an IndexError. Your code also kept only the first number on each line. Converting every match to int and extending the list (declared with lst = []) handles lines with no numbers as well as lines with several, and the list you get is flat.
Also, you need to actually read the file in. "The with statement handles opening and closing the file, including if an exception is raised in the inner block. The for line in f treats the file object f as an iterable, which automatically uses buffered IO and memory management so you don't have to worry about large files." (source)
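The same approach as a small self-contained sketch for Python 3. The function name sum_numbers and the sample lines are made up for illustration; in a real run you would pass the open file object instead of the list, since both are iterables of lines.

```python
import re

def sum_numbers(lines):
    """Return the sum of every run of digits found in an iterable of lines."""
    total = 0
    for line in lines:
        # findall returns a (possibly empty) list of digit strings for this line.
        total += sum(int(match) for match in re.findall(r"[0-9]+", line))
    return total

# Hypothetical sample text standing in for the lines of the input file.
sample = [
    "Why should you learn to write programs? 7746\n",
    "12 1929 8827\n",
    "no digits on this line\n",
]
total = sum_numbers(sample)
print(total)
```

Because an empty findall result simply contributes 0, there is no need to index into the list or to special-case lines without numbers.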

Automatic conversion from file to string after entering in a for loop?

v_file = open('numbers.txt','r')
print (type(v_file))
for v_i in v_file:
    print (v_i.strip('\n'))
    print (type(v_i))
Hey there... I'm just wondering how Python knows to change automatically from a file type to a string type in this piece of code after entering the for loop.
In "numbers.txt" i have let's say:
Peter, 0908212
Joe, 9283812
L.T: It just knows and that is it?
I'm a bit unclear on what you are trying to accomplish, but I'm gonna assume those numbers are in the file. That being said, try:
content = v_file.read()
for line in content.split('\n'):
    print line
    ## ... or whatever. Should return those numbers
Again, I'm assuming you are just iterating over an open file instance.
Hope that helps!
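To address the question as asked: nothing is converted. The file object stays a file object; the for statement asks it for an iterator and then repeatedly requests the next item, and a text-mode file object hands back each line as a string. A minimal Python 3 sketch that spells out those two steps by hand, using a hypothetical temporary file with the two sample lines from the question:

```python
import os
import tempfile

# Hypothetical stand-in for numbers.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("Peter, 0908212\nJoe, 9283812\n")

with open(path) as v_file:
    file_is_str = isinstance(v_file, str)  # the file object itself is not a string
    line_iterator = iter(v_file)           # what the for statement calls first
    first_line = next(line_iterator)       # what it calls on every pass of the loop
    # A regular loop over the same iterator picks up where next() left off.
    remaining = [line.strip("\n") for line in line_iterator]
os.remove(path)

print(file_is_str, repr(first_line), remaining)
```

So v_i in the question is a new string produced on each pass, while v_file keeps its file type throughout.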

how to write simultaneously in a file while the program is still running

In simple words, I have a file that contains duplicate numbers. I want to write the unique numbers from the 1st file into a 2nd file. I have opened the 1st file in 'r' mode and the 2nd file in 'a+' mode, but it looks like nothing is appended to the 2nd file while the program is running, which gives the wrong output. Can anyone help me fix this problem?
Thank you in advance.
This is my code
#!/usr/bin/env python
fp1 = open('tweet_mention_id.txt','r')
for ids in fp1:
    ids = ids.rstrip()
    ids = int(ids)
    print 'ids= ',ids
    print ids + 1
    fp2 = open('unique_mention_ids.txt','a+')
    for user in fp2:
        user = user.rstrip()
        user = int(user)
        print user + 1
        print 'user= ',user
        if ids != user:
            print 'is unique',ids
            fp2.write(str(ids) + '\n')
            break
        else:
            print 'is already present',ids
    fp2.close()
fp1.close()
If unique_mention_ids.txt is initially empty, then you will never enter your inner loop, and nothing will get written. You should use the inner loop to determine whether or not the id needs to be added, but then do the addition (if warranted) outside the inner loop.
Similar logic applies for a non-empty file, but for a different reason: when you open it for appending, the file pointer is at the end of the file, and trying to read behaves as if the file were empty. You can start at the beginning of the file by issuing a fp2.seek(0) statement before the inner loop.
Either way: as written, you write an id from the first file as soon as it differs from some entry in the second file, as opposed to when it matches none of them (which, given the file name, sounds like what you want). Also note that in append mode every write lands at the end of the file regardless of the current read position, and that you should seek between reading from and writing to the same file object.
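One way to sidestep reading and writing the same file altogether is to remember the ids already written in a set. The following is a minimal Python 3 sketch under the assumption that the output file can be rebuilt from scratch on each run (it is opened with "w" rather than 'a+'); the function name write_unique_ids and the sample ids are illustrative only.

```python
import os
import tempfile

def write_unique_ids(in_path, out_path):
    """Copy each id from in_path to out_path once, keeping first-seen order."""
    seen = set()
    with open(in_path) as fp1, open(out_path, "w") as fp2:
        for line in fp1:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            user_id = int(line)
            if user_id not in seen:  # membership test replaces the inner loop
                seen.add(user_id)
                fp2.write(str(user_id) + "\n")
    return seen

# Hypothetical sample data; the file names follow the ones in the question.
work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, "tweet_mention_id.txt")
out_path = os.path.join(work_dir, "unique_mention_ids.txt")
with open(in_path, "w") as f:
    f.write("5\n3\n5\n9\n3\n")

unique_ids = write_unique_ids(in_path, out_path)
with open(out_path) as f:
    unique_text = f.read()
print(repr(unique_text))
```

If the second file must keep ids from earlier runs, one option is to read it into the set first and then open it in append mode for the new ids.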