Simple Python 'if' statement

I'm trying to filter a large tab-delimited file and print out just the lines with a score of >0.999 in one of the columns, but for some reason the script's output still prints every line. Any insights as to why my "if score > 0.999:" isn't working as intended?
import sys
import string
import re

def split_lines(lines):
    for line in lines:
        if line.find('#') > -1:
            print line
        else:
            #pass
            #fields = re.split('\t',line)
            fields = line.split('\t')
            score = fields[3]
            if score > 0.999:
                print score
            #else:
            #    pass

data = sys.stdin.read()
lines = data.split('\n')
split_lines(lines)

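You need to convert the string score to a number (a Decimal or a float) before comparing. fields[3] is a string, and Python 2 does not raise an error when you compare a string with a float: values of different types are ordered by type, and numbers always sort before strings, so score > 0.999 is True for every line. (Python 3 raises a TypeError instead.)
if float(score) > 0.999: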
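A minimal Python 3 sketch of the whole filter, assuming (as in the question) that the score is the fourth tab-separated column and that lines containing # should be passed through unchanged. It converts the field with float() and skips blank or non-numeric lines instead of crashing; high_score_lines is a hypothetical name, not part of any library.

```python
def high_score_lines(lines, column=3, threshold=0.999):
    """Yield comment lines, plus lines whose `column` field parses as a float > threshold."""
    for line in lines:
        if '#' in line:
            # Mirror the original script: pass comment/header lines through unchanged
            yield line
            continue
        fields = line.rstrip('\n').split('\t')
        if len(fields) <= column:
            continue  # skip blank or short lines (e.g. the trailing '' from split('\n'))
        try:
            score = float(fields[column])  # convert the text field to a number
        except ValueError:
            continue  # skip values that are not numbers
        if score > threshold:
            yield line
```

To use it as a stdin filter, import sys and loop with for line in high_score_lines(sys.stdin): sys.stdout.write(line). Swap float for decimal.Decimal if you need exact decimal comparisons near the threshold.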


How to get PyPDF2 to extract text from multiple sequential pages - in range?

I'm trying to get PyPDF2 to extract specific text throughout a document using the code below. It pulls exactly what I need and eliminates the duplicates, but instead of getting me a list from each page, it seems to show only the text from the last page. What am I doing wrong?
#import PyPDF2 and set extracted text as the page_content variable
import PyPDF2
pdf_file = open('enme2.pdf','rb')
read_pdf = PyPDF2.PdfFileReader(pdf_file)
number_of_pages = read_pdf.getNumPages()
#for loop to get number of pages and extract text from each page
for page_number in range(number_of_pages):
    page = read_pdf.getPage(page_number)
    page_content = page.extractText()
#initialize the user_input variable
user_input = ""
#function to get the AFE numbers from the pdf document
def get_afenumbers(Y):
    #initialize the afe and afelist variables
    afe = "A"
    afelist = ""
    x = ""
    #while loop to get only 6 digits after the "A"
    while True:
        if user_input.upper().startswith("Y") == True:
            #Return a list of AFE's
            import re
            afe = re.findall('[A][0-9]{6}', page_content)
            set(afe)
            print(set(afe))
            break
        else:
            afe = "No AFE numbers found..."
        if user_input.upper().startswith("N") == True:
            print("HAVE A GREAT DAY - GOODBYE!!!")
            break
#Build a while loop for initial question prompt (when Y or N is not True):
while user_input != "Y" and user_input != "N":
    user_input = input('List AFE numbers? Y or N: ').upper()
    if user_input not in ["Y","N"]:
        print('"',user_input,'"','is an invalid input')
get_afenumbers(user_input)
#FIGURE OUT HOW TO EXTRACT FROM ALL PAGES AND NOT JUST ONE
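I'm quite new to this; I just learned about regex from a response to my question earlier today. Thanks for any help.
Each pass through the for loop assigns a new value to page_content, so when the loop ends only the last page's text is left. If you change the loop slightly to accumulate the text, it seems to work fine: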
page_content = ""  # define the variable before the loop
for page_number in range(number_of_pages):
    page = read_pdf.getPage(page_number)
    page_content += page.extractText()  # append each page's text to the total
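If you want a separate result for each page rather than one combined string, one option is to keep each page's text as its own list entry. The Python 3 sketch below covers only the regex step, so it runs without a PDF; it reuses the question's pattern (a capital A followed by six digits), and afe_numbers_by_page and all_afe_numbers are hypothetical helper names. With PyPDF2 the input list could be built from the same calls the question uses, e.g. [read_pdf.getPage(i).extractText() for i in range(number_of_pages)]; newer pypdf releases spell these PdfReader, reader.pages and page.extract_text().

```python
import re

# Same pattern as the question: a capital "A" followed by exactly six digits.
# It is not anchored, so it will also match the first six digits of a longer run.
AFE_PATTERN = re.compile(r'A[0-9]{6}')

def afe_numbers_by_page(page_texts):
    """Return one sorted, de-duplicated list of AFE numbers per page."""
    return [sorted(set(AFE_PATTERN.findall(text))) for text in page_texts]

def all_afe_numbers(page_texts):
    """Return every distinct AFE number found on any page, sorted."""
    found = set()
    for text in page_texts:
        found.update(AFE_PATTERN.findall(text))
    return sorted(found)
```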

Compare value in a column in one row of csv file to value in next row using python

I have been reading up on csv.reader's next() but did not see a way to compare the values in a column from one row to the next. For instance, if my data looked like this in the Maps.csv file:
County1 C:/maps/map1.pdf
County1 C:/maps/map2.pdf
County2 C:/maps/map1.pdf
County2 C:/maps/map3.pdf
County3 C:/maps/map3.pdf
County4 C:/maps/map2.pdf
County4 C:/maps/map4.pdf
If line two's county equals line one's county, do something.
The following code compares whole rows; I want to compare only the county values of the current and previous rows.
import csv
f = open("Maps.csv", "r+")
ff = csv.reader(f)
pre_line = ff.next()
while(True):
    try:
        cur_line = ff.next()
        if pre_line == cur_line:
            print "Matches"
        pre_line = cur_line
    except:
        break
I know I can grab the current value (see below) but do not know how to grab the previous value. Is this possible? If so, could someone please tell me how? I'm on day three of trying to write my script to append PDF files listed in a CSV file and am about to toss my coffee cup at my monitor. I am breaking this down into smaller parts and using simpler data as a pilot; my real file is much larger. I was advised to focus on just one issue at a time when posting to this forum, and this is my latest issue. No matter what tack I take, I can't seem to read the data the way I want. Arrrggghhhhh.
CurColor = row[color]
Using Python 2.7
You already know how to look up the previous row. Why not get the column you need from that row?
import csv
f = open("Maps.csv", "r+")
ff = csv.reader(f)
pre_line = ff.next()
while(True):
    try:
        cur_line = ff.next()
        if pre_line[0] == cur_line[0]:  # <-- compare first column
            print "Matches"
        pre_line = cur_line
    except:
        break
or more simply:
pre_line = ff.next()
for cur_line in ff:
    if pre_line[0] == cur_line[0]:  # <-- compare first column
        print "Matches"
    pre_line = cur_line
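In Python 3, reader objects no longer have a .next() method; you call the built-in next(reader) instead. A minimal Python 3 sketch of the same previous/current comparison, written as a hypothetical generator, matching_neighbours, that accepts any iterable of rows so it can be tried without a file. The usage assumes the sample Maps.csv is separated by single spaces, hence delimiter=" ".

```python
import csv
import io

def matching_neighbours(rows, column=0):
    """Yield (previous_row, current_row) pairs whose values in `column` are equal."""
    rows = iter(rows)
    previous = next(rows, None)  # a default avoids StopIteration on empty input
    if previous is None:
        return
    for current in rows:
        if previous[column] == current[column]:
            yield previous, current
        previous = current

# In-memory copy of the sample data from the question (single-space separated).
SAMPLE = """County1 C:/maps/map1.pdf
County1 C:/maps/map2.pdf
County2 C:/maps/map1.pdf
County2 C:/maps/map3.pdf
County3 C:/maps/map3.pdf
County4 C:/maps/map2.pdf
County4 C:/maps/map4.pdf
"""

if __name__ == "__main__":
    reader = csv.reader(io.StringIO(SAMPLE), delimiter=" ")
    for prev, cur in matching_neighbours(reader):
        print("Matches:", prev[1], cur[1])
```

For the real file, something like with open("Maps.csv", newline="") as f: followed by matching_neighbours(csv.reader(f, delimiter=" ")) should behave the same way.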
import csv
f = open("Maps.csv", "r+")
# Use a delimiter to split each line into separate elements.
# In my example I used a comma. Your CSV may have a different delimiter;
# make sure the delimiter is a single-character string, though,
# so no multiple spaces between "County1 C:/maps/map1.pdf".
# It should be something like "County1,C:/maps/map1.pdf".
ff = csv.reader(f, delimiter=',')
COUNTY_INDEX = 0
# Each time ff.next() is called, it returns a list such as ['County1', 'C:/maps/map1.pdf'].
# Since you want to compare the value at the first index, you need to reference it like so;
# the line below sets pre_line = 'County1'.
pre_line = ff.next()[COUNTY_INDEX]
while(True):
    try:
        # The current line will be 'County1', 'County2', etc., depending on which line is read.
        cur_line = ff.next()[COUNTY_INDEX]
        if pre_line == cur_line:
            print "Matches"
        pre_line = cur_line
    except:
        break
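Since the end goal mentioned in the question is to append the PDFs for each county, another option is itertools.groupby, which performs the "same county as the previous row?" check for you and collects each run of rows. A Python 3 sketch, assuming the rows are already ordered by county (as in the sample) and that the path is the second column; paths_by_county is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter

COUNTY_INDEX = 0
PATH_INDEX = 1

def paths_by_county(rows):
    """Group adjacent rows that share a county into (county, [paths]) pairs.

    groupby() only merges *consecutive* rows with the same key, which matches the
    current-vs-previous comparison; it assumes the file is ordered by county.
    """
    return [(county, [row[PATH_INDEX] for row in group])
            for county, group in groupby(rows, key=itemgetter(COUNTY_INDEX))]
```

The rows can come straight from csv.reader(f, delimiter=" "); each resulting path list is then the set of PDFs to append for that county.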

How to import ".xls" in python

I need to import an ".xls" file in Python.
Can you please give me an idea of how to do this? I need to import the Excel file itself, not a ".csv" file, because the CSV file drops the leading zeros in numbers.
Here’s the code example:
import xlrd

def open_file(path):
    #Open and read an Excel file
    book = xlrd.open_workbook(path)
    print book.nsheets # print number of sheets
    print book.sheet_names() # print sheet names
    first_sheet = book.sheet_by_index(0) # get the first worksheet
    print first_sheet.row_values(0) # read a row
    cell = first_sheet.cell(0,0) # read a cell
    print cell
    print cell.value
    # read a row slice
    print first_sheet.row_slice(rowx=0,
                                start_colx=0,
                                end_colx=2)

if __name__ == "__main__":
    path = "test.xls"
    open_file(path)
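On the leading-zeros concern: xlrd returns text cells as strings, so a value entered as text (such as 00123) keeps its zeros, but it returns every numeric cell as a float. If the zeros come only from a number format, they are display formatting and the cell value is just 123.0. The Python 3 sketch below is a hypothetical helper, cell_to_text, that turns whole-number floats back into digit strings and, if you already know a column's fixed width (an assumption about your data), pads them with zeros. Note that xlrd 2.x reads only legacy .xls files, which fits this question.

```python
def cell_to_text(value, width=None):
    """Convert an xlrd cell value to text without a trailing '.0'.

    width is a hypothetical option: pass it only if you know the column holds
    fixed-width codes, and whole numbers will be left-padded with zeros.
    """
    if isinstance(value, float) and value.is_integer():
        text = str(int(value))  # 123.0 -> '123'
        return text.zfill(width) if width else text
    return str(value)  # text cells such as '00123' pass through unchanged
```

For example, [cell_to_text(v) for v in first_sheet.row_values(0)] converts a whole row, and cell_to_text(cell.value, width=5) would turn 123.0 into '00123'.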

Changing all occurrences of similar word in csv python

I want to replace one specific word, 'my', with 'your', but my code seems to change only one occurrence.
import csv
path1 = "/home/bankdata/levelout.csv"
path2 = "/home/bankdata/leveloutmodify.csv"
in_file = open(path1,"rb")
reader = csv.reader(in_file)
out_file = open(path2,"wb")
writer = csv.writer(out_file)
with open(path1, 'r') as csv_file:
    csvreader = csv.reader(csv_file)
    col_count = 0
    for row in csvreader:
        while row[col_count] == 'my':
            print 'my is used'
            row[col_count] = 'your'
            #writer.writerow(row[col_count])
            writer.writerow(row)
        col_count +=1
Let's say the sentence is
'my book is gone and my bag is missing'
The output is
your book is gone and my bag is missing
Second, I want the output to appear without commas separating the words:
print row
The output is
your,book,is,gone,and,my,bag,is,missing,
For the second problem, I'm still trying to find the right approach, as it keeps giving me the same comma-separated output:
with open(path1) as infile, open(path2, "w") as outfile:
    for row in infile:
        outfile.write(row.replace(",", ""))
        print row
It gives me the result:
your,book,is,gone,and,my,bag,is,missing
I send this sentence to my Nao robot, and the robot pronounces it awkwardly because there are commas between the words.
I solved it by:
with open(path1) as infile, open(path2, "w") as outfile:
    for row in infile:
        outfile.write(row.replace(",", ""))
with open(path2) as out:
    for row in out:
        print row
It gives me what I want:
your book is gone and your bag is missing too
However, is there a better way to do it?
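A Python 3 sketch of one way to handle both problems at once: check every cell of each row rather than a single index (so every occurrence is replaced), then join the cells with a single space instead of writing them back out as CSV. Replacing the commas with nothing would run the words together, whereas joining with a space keeps them separate. replace_word_in_rows is a hypothetical helper, and it assumes each word is its own CSV cell, as the your,book,is,... output above suggests.

```python
import csv
import io

def replace_word_in_rows(rows, old, new):
    """Return one space-separated sentence per CSV row, with every `old` cell replaced."""
    sentences = []
    for row in rows:
        # Check every cell, not just one index, so all occurrences are replaced.
        words = [new if cell == old else cell for cell in row]
        sentences.append(" ".join(words))
    return sentences

if __name__ == "__main__":
    # In-memory stand-in for levelout.csv, using the sentence from the question.
    sample = io.StringIO("my,book,is,gone,and,my,bag,is,missing\n")
    for sentence in replace_word_in_rows(csv.reader(sample), "my", "your"):
        print(sentence)  # your book is gone and your bag is missing
```

For the real file, open it with open(path1, newline="") as the csv module documentation recommends for Python 3, then write each sentence plus "\n" to path2 or pass it straight to the robot. The comparison is case-sensitive, so a capitalized "My" at the start of a sentence would be left alone.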

How to remove unwanted items from a parse file

from googlefinance import getQuotes
import json
import time as t
import re
List = ["A","AA","AAB"]
Time=t.localtime() # Sets variable Time to retrieve date/time info
Date2= ('%d-%d-%d %dh:%dm:%dsec'%(Time[0],Time[1],Time[2],Time[3],Time[4],Time[5])) #formats time stamp
while True:
    for i in List:
        try: #allows elements to be called and if an error does the next step
            Data = json.dumps(getQuotes(i.lower()),indent=1) #retrieves Data from google finance
            regex = ('"LastTradePrice": "(.+?)",') #sets parse
            pattern = re.compile(regex) #compiles parse
            price = re.findall(pattern,Data) #retrieves parse
            print(i)
            print(price)
        except: #sets Error coding
            Error = (i + ' Failed to load on: ' + Date2)
            print (Error)
It will display the quote as: ['(number)'].
I would like it to only display the number, which means removing the brackets and quotes.
Any help would be great.
Changing:
print(price)
into:
print(price[0])
prints this:
A
42.14
AA
10.13
AAB
0.110
Try using the type() function to check the data type; in your case, type(price).
If the data type is list, use print(price[0]).
You will get the output (number). For the brackets, you need to check the Google data and the regex.
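The brackets and quotes appear because re.findall() always returns a list, even for a single match, and printing a list shows the repr of each element. price[0] works, but it raises IndexError when nothing matched. A Python 3 sketch of two alternatives: re.search(), which returns one match object or None, and, because the data are Python objects before json.dumps() is called, reading the price from the dicts directly. The shape of the quote data (a list of dicts with a "LastTradePrice" string) is an assumption inferred from the question's regex, not checked against the googlefinance package; price_from_text and price_from_quotes are hypothetical names.

```python
import json
import re

# Same pattern as the question; note it requires a comma after the value,
# so it fails if "LastTradePrice" happens to be the last key in the object.
PRICE_RE = re.compile(r'"LastTradePrice": "(.+?)",')

def price_from_text(data):
    """Return the first captured price as a plain string, or None if absent."""
    match = PRICE_RE.search(data)  # search() gives one match; findall() gives a list
    return match.group(1) if match else None

def price_from_quotes(quotes):
    """Read the price straight from already-parsed quote dicts (assumed shape)."""
    for quote in quotes:
        if "LastTradePrice" in quote:
            return quote["LastTradePrice"]
    return None

if __name__ == "__main__":
    sample = [{"StockSymbol": "A", "LastTradePrice": "42.14", "Index": "NYSE"}]  # made-up sample
    print(price_from_text(json.dumps(sample, indent=1)))  # 42.14
    print(price_from_quotes(sample))                      # 42.14
```

Skipping the json.dumps() and regex round trip, as price_from_quotes does, avoids depending on how the JSON text happens to be formatted.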