How do I print each line here in a for loop - python-2.7

thanks for the follow :)
hii... if u want to make a new friend just add me on facebook! :) xx
Just wanna say if you ever feel lonely or sad or bored, just come and talk to me. I'm free anytime :)
I hope she not a spy for someone. I hope she real on neautral side. Because just her who i trust. :-)
not always but sometimes maybe :)
“Funny how you get what you want and pray for when you want the same thing God wants. :)
Thank you :) can you follow me on Twitter so I can DM you?
RT dj got us a fallin in love and yeah earth number one müsic listen thank you king :-)
found a cheeky weekend for £80 return that's flights + hotel.. middle of april, im still looking pal :)
RT happy birthday mary ! Hope you have a good day :)
Thank god twitters not blocked on the school computers cause all my data is gone on my phone :(
enjoy tmrro. saw them earlier this wk here in tokyo :)
UPDATE:
OK, maybe my question was wrong. I have to do this:
Open the file and read from it.
Remove some links, names and other stuff from it (I have used regex, but I don't know if that's the right way to do it).
After I have the clean text (only tweets with a sad face or a happy face), I have to print each line out, because I have to loop over each one like this:
for line in tweets:
    if ':)' in line:
        cl.train(line, 'happy')
    elif ':(' in line:
        cl.train(line, 'sad')
My code so far is below, but it doesn't work yet.
import re
from pprint import pprint

tweets = []
tweets = open('englishtweet.txt').read()

regex_username = '@[^\s]*'  # regex to detect username in file
regex_url = 'http[^\s]*'    # regex to detect url in file
regex_names = '#[^\s]*'     # regex to detect # in file

for username in re.findall(regex_username, tweets):
    tweets = tweets.replace(username, '')
for url in re.findall(regex_url, tweets):
    tweets = tweets.replace(url, '')
for names in re.findall(regex_names, tweets):
    tweets = tweets.replace(names, '')

If you want to read the first line, use next:
with open("englishtweet.txt", "r") as infile:
    print next(infile).strip()
    # this prints the first line only, and consumes the first value from the
    # generator, so this:
    for line in infile:
        print line.strip()
    # will print every line BUT the first (since the first has been consumed)
I'm also using a context manager here, which automatically closes the file once you exit the with block, instead of you having to remember to call tweets.close(). It also handles errors: depending on what else you're doing with the file, you may throw a handled exception that never lets you reach the .close() statement.
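For comparison, here's roughly what that with block expands to (a sketch; the real protocol goes through __enter__/__exit__, but the effect on a file is the same):
infile = open("englishtweet.txt", "r")
try:
    print next(infile).strip()  # first line
    for line in infile:         # remaining lines
        print line.strip()
finally:
    infile.close()  # runs even if an exception was raised above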
If your file is very small, you could use .readlines:
with open("englishtweet.txt", "r") as infile:
    tweets = infile.readlines()
# tweets is now a list; each element is a separate line from the file
print tweets[0]  # so element 0 is the first line
for line in tweets[1:]:  # the rest of the lines
    print line.strip()
However, reading a whole file object into memory like that isn't really suggested, as with some files it can simply be a huge memory waster, especially if you only need the first line; there's no reason to read the whole thing into memory.
That said, since it looks like you may be using these lines for more than just one iteration, maybe readlines IS the best approach.

You almost have it. Just remove the .read() when you originally open the file. Then you can loop through the lines.
tweets = open('englishtweet.txt', 'r')
for line in tweets:
    print line
tweets.close()
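Putting that together with the regex cleaning from the question, a sketch might look like this (assuming cl is your classifier and that happy/sad tweets are marked with :) and :( as in your sample data):
import re

regex_username = '@[^\s]*'  # strip @usernames
regex_url = 'http[^\s]*'    # strip urls
regex_names = '#[^\s]*'     # strip #hashtags

with open('englishtweet.txt') as infile:
    for line in infile:
        # clean one line at a time instead of the whole file at once
        for pattern in (regex_username, regex_url, regex_names):
            line = re.sub(pattern, '', line)
        line = line.strip()
        if ':)' in line:
            cl.train(line, 'happy')
        elif ':(' in line:
            cl.train(line, 'sad')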

Related

Python iterate through txt file of dates and change date format

I have a txt file oldDates.txt. I want to loop through it, modify the date formats and write the newly formatted dates to a new txt file. My code so far:
from datetime import datetime

f = open('oldDates.txt', 'r')
oldDates = []
newDates = []

for line in f.readlines():
    oldDates.append(line)
    print(line)  # for testing

for oldDate in oldDates:
    dt = datetime.strptime(oldDate, '%d/%m/%Y').strftime('%d/%m/%Y')
    newDates.append(dt)

with open('newDates.txt', 'w') as w:
    for newDate in newDates:
        w.write(newDate + "\n")

f.close()
w.close()
However, this gives an error:
ValueError: unconverted data remains
I'm not sure where I'm going wrong here, and if there's a more efficient way of doing this then I'd be glad to hear about it. The date conversion seems to work fine from the test print.
There are blank lines in the file and I'm wondering if I need to handle these (I'm not sure how).
Any help much appreciated!
Now, I am no expert in Python, but do you verify that your input is in the correct format? If you apply a regexp to the input line you can easily catch blank lines and any incorrect values.
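For instance, a minimal sketch that strips each line before parsing (the trailing newline is what "unconverted data remains" is complaining about) and skips blanks:
from datetime import datetime

with open('oldDates.txt', 'r') as f, open('newDates.txt', 'w') as w:
    for line in f:
        line = line.strip()  # drop the trailing newline that strptime chokes on
        if not line:         # skip blank lines
            continue
        # same in/out format, as in the question; change strftime's format to actually reformat
        dt = datetime.strptime(line, '%d/%m/%Y').strftime('%d/%m/%Y')
        w.write(dt + "\n")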

Python: how to batch print x lines at a time in a for loop

I tried all sorts of for loops but just can't seem to figure out how to print "n" number of lines from a dictionary at a time. I am new to programming so please pardon my terminology and expressions...
Example source:
{'majorkey1': [{'name':'j','age':'3','height':'6feet'},
               {'name':'r','age':'4','height':'5feet'},
               {'name':'o','age':'5','height':'3feet'}],
 'majorkey2': [{'name':'n','age':'6','height':'4feet'},
               {'name':'s','age':'7','height':'7feet'},
               {'name':'q','age':'7','height':'8feet'}]}
This prints everything at once (undesired):
for majorkey in readerObj.keys():
    for idx, line in enumerate(readerObj.get(majorkey)):
        print line
{'name':'j','age':'3','height':'6feet'}
{'name':'r','age':'4','height':'5feet'}
{'name':'o','age':'5','height':'3feet'}
{'name':'n','age':'6','height':'4feet'}
{'name':'s','age':'7','height':'7feet'}
{'name':'q','age':'7','height':'8feet'}
I have gutted a lot of code to make this easier to read. The behaviour I would like is to print according to the number of lines specified; for now I will just use lines_to_execute = 2. I would like to keep the code as close as possible to minimize rewriting this block. Once this is working I will modify the code so that it performs other work a chunk at a time.
Code block I want to stay close to (I'll mix pseudocode in here as well):
for majorkey in readerObj.keys():
    lines_to_execute = 2
    start_idx_position = 0
    range_to_execute = lines_to_execute
    for idx[start_idx_position:range_to_execute], line in enumerate(readerObj.get(majorkey)):
        print line
        increment start_idx_position by lines_to_execute
        increment range_to_execute by lines_to_execute
        time.sleep(1)
For this example, if I want to print two lines or rows at a time, the output would look like the below. Order is not important, as long as the same two don't get printed more than once:
Desired output:
{'name':'j','age':'3','height':'6feet'}
{'name':'r','age':'4','height':'5feet'}
One second delay...
{'name':'o','age':'5','height':'3feet'}
{'name':'n','age':'6','height':'4feet'}
One second delay.
{'name':'s','age':'7','height':'7feet'}
{'name':'q','age':'7','height':'8feet'}
I hope this is enough information to go on.
from pprint import pprint
import time

for key in obj.keys():
    lines_to_execute = 2
    pprint(obj[key][:lines_to_execute])  # that's all you need
    time.sleep(1)
Keep it as simple as possible.
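Note that the snippet above only prints the first lines_to_execute rows of each key. If you want every row across all keys printed in chunks with a delay between chunks, as in the desired output, a sketch like this (assuming obj is the dictionary above) would do it:
from pprint import pprint
import time

lines_to_execute = 2
# flatten every row from every key into one list
rows = [row for key in obj for row in obj[key]]

# print the rows lines_to_execute at a time, pausing between chunks
for i in range(0, len(rows), lines_to_execute):
    for row in rows[i:i + lines_to_execute]:
        pprint(row)
    time.sleep(1)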

Automatic conversion from file to string after entering a for loop?

v_file = open('numbers.txt', 'r')
print (type(v_file))
for v_i in v_file:
    print (v_i.strip('\n'))
    print (type(v_i))
Hey there... I'm just wondering how Python knows to change automatically from a file type to a string type in this piece of code after entering the for loop.
In "numbers.txt" I have, let's say:
Peter, 0908212
Joe, 9283812
L.T: It just knows and that is it?
I'm a bit unclear on what you are trying to accomplish, but I'm gonna assume those numbers are in the file. That being said, try:
content = v_file.read()
for line in content.split('\n'):
    print line
    # ... or whatever. Should return those numbers
Again, I'm assuming you are just iterating over an open file instance.
Hope that helps!
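To answer the actual question: a file object is its own iterator, and the for loop simply calls next() on it behind the scenes; each call returns one line as a string. A small demonstration of the mechanism:
v_file = open('numbers.txt', 'r')
print type(v_file)   # <type 'file'>
line = next(v_file)  # this is what the for loop does on each pass
print type(line)     # <type 'str'> -- each line comes back as a string
v_file.close()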

Removing Duplicate Lines by Title Only

I am trying to modify a script so that it will remove duplicate lines from a text file using only the title portion of that line.
To clarify the text file lines look something like this:
Title|Image Url|Description|Page Url
At the moment the script does remove duplicates, but it does so by reading the entire line, not just the first part. All the lines in the file are not going to be 100% the same, but a few will be very similar.
I want to remove all of the lines that contain the same "title", regardless of what the rest of the line contains.
This is the script I am working with:
import sys
from collections import OrderedDict

infile = "testfile.txt"
outfile = "outfile.txt"

inf = open(infile, "r")
lines = inf.readlines()
inf.close()

newset = list(OrderedDict.fromkeys(lines))

outf = open(outfile, "w")
lstline = len(newset)
for i in range(0, lstline):
    ln = newset[i]
    outf.write(ln)
outf.close()
So far I have tried using .split() to split the lines in the list. I have also tried .readline(lines[0:25]) in hopes of using a character limit to achieve the desired results, but no luck so far. I also can't seem to find any documentation on my exact problem so I'm stuck.
I am using Windows 8 and Python 2.7.9 for this project if that helps.
I made a few changes to the program you had set up. First, I changed your file interactions to use "with" statements, since those are very convenient and automatically handle a lot of the functionality you had to write out. Second, I used a set instead of an OrderedDict, because you were basically just trying to emulate set functionality (exclusivity of elements) by using keys in an OrderedDict. If a title hasn't been seen yet, it is added to the set (so it can't be used again) and the line is written to the output file; if it has been seen, the loop keeps going. I hope this helps you!
with open("testfile.txt") as infile:
with open("outfile.txt",'w') as outfile:
titleset = set()
for line in infile:
title = line.split('|')[0]
if title not in titleset:
titleset.add(title)
outfile.write(line)
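As a side note, if a field could ever contain a | inside quotes, the csv module will handle the splitting more robustly than split('|'); a sketch of the same logic using it (assuming the pipe-separated layout above):
import csv

with open("testfile.txt", "rb") as infile, open("outfile.txt", "wb") as outfile:
    reader = csv.reader(infile, delimiter='|')
    writer = csv.writer(outfile, delimiter='|')
    seen = set()
    for row in reader:
        if row and row[0] not in seen:  # row[0] is the title
            seen.add(row[0])
            writer.writerow(row)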

How to improve the speed of the Python script

I'm very new to Python. I'm working in the area of hydrology and I want to learn Python to assist me with processing hydrological data.
At the moment I am writing a script to extract bits of information from a big data set. I have three csv files:
Complete_borelist.csv
Borelist_not_interested.csv
Elevation_info.csv
I want to create a file which has all the bores that are in Complete_borelist.csv but not in Borelist_not_interested.csv. I also want to grab some information from Complete_borelist.csv and Elevation_info.csv for those bores which satisfy the first criterion.
My script is as follows:
not_interested_list = []
outfile1 = open('output.csv', 'w')
outfile1.write('Station_ID,Name,Easting,Northing,Location_name,Elevation')
outfile1.write('\n')

with open('Borelist_not_interested.csv', 'r') as f1:
    for line in f1:
        if not line.startswith('Station'):  # ignore header
            line = line.rstrip()
            words = line.split(',')
            station = words[0]
            not_interested_list.append(station)

with open('Complete_borelist.csv', 'r') as f2:
    next(f2)  # ignore header
    for line in f2:
        line = line.rstrip()
        words = line.split(',')
        station = words[0]
        if not station in not_interested_list:
            loc_name = words[1]
            easting = words[4]
            northing = words[5]
            outfile1.write(station+','+easting+','+northing+','+loc_name+',')
            with open('Elevation_info.csv', 'r') as f3:
                next(f3)  # ignore header
                for line in f3:
                    line = line.rstrip()
                    data = line.split(',')
                    bore_id = data[0]
                    if bore_id == station:
                        elevation = data[4]
                        outfile1.write(elevation)
            outfile1.write('\n')

outfile1.close()
I have two issues with the script:
The first is that Elevation_info.csv doesn't have information for all the bores in Complete_borelist.csv. When my loop gets to a station for which it can't find an elevation record, the script doesn't write "null" but continues writing the information for the next station on the same line. Can anyone help me to fix this please?
The second is that my complete borelist has over 200,000 rows and my script runs through them very slowly. Does anyone have any suggestions to make it run faster?
Very much appreciated and sorry if my question is too long.
Performance-wise, this has a couple of problems. The first one is that you are opening and re-reading the elevation info for every line of the complete file. Read the elevation info into a dictionary keyed on the bore_id before you open the complete file. Then you can test the dictionary very quickly to see if a station is in it, instead of re-reading the file.
The second performance issue is that you don't stop searching the elevation file once you find a match. The dictionary idea solves that too, but otherwise a break once you have a match would help a little.
For the null-printing problem, you just need to outfile1.write("\n") if the bore_id is not in the dictionary. An else statement on the dictionary test does that. In the current code, an else closing the inner for loop would do it, or even changing the indentation of that last write("\n").
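To illustrate, a sketch of the dictionary approach (keeping the column positions and output order from your script; a set for the not-interested stations speeds up that membership test as well):
# build the elevation lookup once, before the main loop
elevations = {}
with open('Elevation_info.csv', 'r') as f3:
    next(f3)  # ignore header
    for line in f3:
        data = line.rstrip().split(',')
        elevations[data[0]] = data[4]  # bore_id -> elevation

# a set gives fast membership tests for the not-interested stations
not_interested = set()
with open('Borelist_not_interested.csv', 'r') as f1:
    for line in f1:
        if not line.startswith('Station'):  # ignore header
            not_interested.add(line.rstrip().split(',')[0])

with open('Complete_borelist.csv', 'r') as f2, open('output.csv', 'w') as outfile1:
    outfile1.write('Station_ID,Name,Easting,Northing,Location_name,Elevation\n')
    next(f2)  # ignore header
    for line in f2:
        words = line.rstrip().split(',')
        station = words[0]
        if station not in not_interested:
            # empty elevation field when there is no record for this station
            elevation = elevations.get(station, '')
            outfile1.write(station + ',' + words[4] + ',' + words[5] + ',' + words[1] + ',' + elevation + '\n')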