I have a text file, and somewhere in that file I have a line with the following content
xxxx xxxx y = 4.63456
where xxxx represents the part of the line I am not interested in. My goal is to extract that y = 4.63456 value and write it to a new text file. Here is what I have so far.
import os
import re
my_absolute_path = os.path.abspath(os.path.dirname(__file__))
with open('testfile', 'r') as helloFile, open('newfile32','a') as out_file:
    for line in helloFile:
        numbertocheck= []
        if 'x' in line and ' = ' in line and numbertocheck==type(float) in line:
            out_file.write(line)
The code creates the file but the file is empty. Is this the right way of checking for the conditions in the if statement? FWIW, if I remove the two conditions at the end and write if 'x' in line: the code works fine but writes out the entire line.
numbertocheck==type(float) is probably your culprit. When you get to that point in the code, numbertocheck is an empty list and type(float) is the class type, so the comparison is always False and the if block never runs. You can check this by running type(numbertocheck) and type(float).
It's unclear what you're wanting to do with the empty list numbertocheck and why you need it to be float. If you offer more info we can probably better guide you.
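As a small illustration of why that condition can never be true, here is a sketch that evaluates the same expression on the sample line from the question. The variable names float_type and condition are only there for the demonstration.

```python
numbertocheck = []
line = "xxxx xxxx y = 4.63456"

# type(float) is the metaclass `type`, not float itself.
float_type = type(float)

# A chained comparison `a == b in c` means `(a == b) and (b in c)`.
# The first half compares an empty list with `type`, which is False,
# so the whole condition short-circuits to False.
condition = numbertocheck == type(float) in line

print(float_type, condition)
```

Because the first comparison is already False, Python never gets as far as evaluating the `in line` part.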
EDIT based on comment:
Let's make the assumption that your desire with numbertocheck is to check whether the 4.63456 value within xxxx xxxx y = 4.63456 is a valid float or not.
You need to extract the float value from the line string. Regex is suited for this, but the exact answer depends on what you know or don't know about possible values in a given line.
If you're certain the number will always have a decimal point in it,
import re
numbertocheck = re.findall(r"\d+\.\d+", line)
will extract the "float" value. If you can't guarantee a decimal point, use re.findall(r"[-+]?\d*\.\d+|\d+", line) instead.
This may return multiple numbers if they exist in your line. If it's possible for there to be numbers elsewhere in your line (such as in the xxxx xxxx part), and if you only care about the number at the end of your line, then this might work.
for line in helloFile:
    numbers_in_line = re.findall(r"[-+]?\d*\.\d+|\d+", line)
    # Make sure a number was found
    if len(numbers_in_line) == 0:
        continue
    else:
        try:
            last_num = float(numbers_in_line[-1])
        except ValueError:
            continue
    # By this point, you've confirmed if a number was found and if it's a float
    if 'x' in line and ' = ' in line:
        out_file.write(line)
Note that this will allow numbers without decimal points to count as a valid number; you can modify it if needed.
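If the goal is to write only the y = 4.63456 part rather than the whole line, one possible approach is to capture the name and the number at the end of the line. This is a rough sketch, not the only way to do it: the pattern ASSIGNMENT and the helpers extract_assignment and copy_assignments are hypothetical names, and it assumes the value is always the last thing on the line and is written as "name = number".

```python
import re

# Hypothetical pattern: a name, " = ", then an optionally signed number
# that sits at the end of the line (trailing whitespace allowed).
ASSIGNMENT = re.compile(r"(\w+) = ([-+]?(?:\d*\.\d+|\d+))\s*$")


def extract_assignment(line):
    """Return 'name = value' from the end of a line, or None if absent."""
    match = ASSIGNMENT.search(line)
    if match is None:
        return None
    name, number = match.groups()
    float(number)  # raises ValueError if the text is not a valid float
    return f"{name} = {number}"


def copy_assignments(in_path, out_path):
    """Append one extracted assignment per matching line to out_path."""
    with open(in_path, "r") as src, open(out_path, "a") as dst:
        for line in src:
            found = extract_assignment(line)
            if found is not None:
                dst.write(found + "\n")
```

With the file names from the question, the call would be copy_assignments('testfile', 'newfile32').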
Related
I have a text file that I would like to search to see how many times certain words occur in it. I'm getting the wrong count for the words.
File is here
code:
import re
with open('SysLog.txt', 'rt') as myfile:
    for line in myfile:
        m = re.search('guest', line, re.M|re.I)
        if m is not None:
            m.group(0)
            print( "Found it.")
            print('Found',len(m.group()), m.group(),'s')
        break
    for line in myfile:
        n = re.search('Worm', line)
        if n is not None:
            n.group(0)
            print("\n\tNext Match.")
            print('Found', len(n.group()), n.group(), 's')
        break
    for line in myfile:
        o = re.search('anonymous', line)
        if o is not None:
            o.group(0)
            print("\n\tNext Match.")
            print('Found', len(o.group()), o.group(), 's')
        break
There is no need to use a regex; you can use str.count() to make the process much simpler:
with open('SysLog.txt', 'rt') as myfile:
    text = myfile.read()
    for word in ('guest', 'Worm', 'anonymous'):
        print("\n\tNext Match.")
        print('Found', text.count(word), word, 's')
To test this, I downloaded the file and ran the code above, and got the output:
Next Match.
Found 4 guest s
Next Match.
Found 91 Worm s
Next Match.
Found 18 anonymous s
which is correct if you do a find on the document in a text editor!
*As a sidenote, I'm not sure why you want to print a tab (\t) before 'Next Match' each time as it just looks weird in the output but it doesn't matter :)
There are multiple problems with your code:
re.search will only give you the first match, if any; this does not have to be a problem, though, as it seems like the word is only expected to appear once per line; otherwise, use re.findall
the line n.group(0) does not do anything without an assignment
len(n.group()) does not give you the number of matches, but the length of the matched string
you break after the first line in the file
myfile is an iterator, so once the first for line in myfile loop has finished, the other two won't have any lines left to loop (it will never finish because of the break anyway, though)
as already noted, you do not need regular expression at all
One (among many) possible ways of doing this would be this (not tested):
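counts = {"Worm": 0, "guest": 0, "anonymous": 0}
with open('SysLog.txt', 'rt') as myfile:
    for line in myfile:
        for word in counts:
            # "in" is case-sensitive, so the keys must match the file's spelling
            if word in line:
                counts[word] += 1
print(counts)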
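Both approaches above match substrings and are case-sensitive, so "guest" would also be found inside "guests". If whole-word, case-insensitive counts are wanted instead, a sketch along these lines may help. The helper count_words and the sample text are made up for illustration and are not taken from the SysLog.txt file.

```python
import re
from collections import Counter


def count_words(text, words):
    """Count whole-word, case-insensitive occurrences of each word in text."""
    wanted = {word.lower() for word in words}
    # Split the text into runs of word characters, ignoring case.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(token for token in tokens if token in wanted)
    return {word: counts[word.lower()] for word in words}


sample = "Guest login\nworm detected, Worm removed\nguests are not counted\n"
print(count_words(sample, ["guest", "Worm", "anonymous"]))
# {'guest': 1, 'Worm': 2, 'anonymous': 0}
```

Whether substring or whole-word matching is the right choice depends on what the log entries look like.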
I am hoping to receive some feedback on some code I have written in Python 3 - I am attempting to write a program that reads an input file which has page numbers in it. The page numbers are formatted as: "[13]" (this means you are on page 13). My code right now is:
pattern='\[\d\]'
for line in f:
    if pattern in line:
        re.sub('\[\d\]',' ')
        re.compile(line)
        output.write(line.replace('\[\d\]', ''))
I have also tried:
for line in f:
    if pattern in line:
        re.replace('\[\d\]','')
        re.compile(line)
        output_file.write(line)
When I run these programs, a blank file is created, rather than a file containing the original text minus the page numbers. Thank you in advance for any advice!
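Your if statement won't work because it isn't doing a regex match; it's looking for the literal string \[\d\] in line. Use re.search() to test for the pattern anywhere in the line (re.match() only matches at the start of the line), pass the line to re.sub() as its third argument, and use \d+ so that multi-digit page numbers such as [13] match.
for line in f:
    # determine if the pattern is found anywhere in the line
    if re.search(r'\[\d+\]', line):
        line = re.sub(r'\[\d+\]', ' ', line)
    # write every line, whether or not it was modified
    output_file.write(line)
Additionally, you're using re.compile() incorrectly. Its purpose is to pre-compile your pattern into a pattern object. This can improve performance if you use the pattern a lot, because the expression is compiled once rather than looked up again on each pass through the loop.
pattern = re.compile(r'\[\d+\]')
if pattern.search(line):
    # ...
Lastly, you're getting a blank file because the if condition is never true, so nothing is ever written. output_file.write() is the right method here: it writes the string at the current position in the file, and each line read from f already ends with its newline. File objects have no writeline() method (only write() and writelines()).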
You don't write unmodified lines to your output.
Try something like this
for line in f:
    if re.search(pattern, line):
        line = re.sub(pattern, '', line) # remove page number stuff
    output_file.write(line) # note that it's not part of the if block above
That's why your output file is empty.
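Putting the two answers together, a minimal sketch of the whole task might look like this. The names PAGE_NUMBER, strip_page_numbers and clean_file are hypothetical, and the pattern assumes page markers are always digits inside square brackets.

```python
import re

# Assumed page-marker format: one or more digits in square brackets, e.g. "[13]".
PAGE_NUMBER = re.compile(r"\[\d+\]")


def strip_page_numbers(lines):
    """Yield every line with page markers removed; other lines pass through unchanged."""
    for line in lines:
        # sub() returns the line untouched when nothing matches,
        # so no separate "if" test is needed.
        yield PAGE_NUMBER.sub("", line)


def clean_file(in_path, out_path):
    """Copy in_path to out_path without the page markers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_page_numbers(src))
```

Since every line is written, modified or not, the output file can only be empty if the input file is.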
I have opened a text file in Python which has thousands of lines. I need to search each line to see if it contains 1 of many different specified values. I then need to return the specific value and the corresponding line that contains that value.
q1 = open('/home/lost/StockRec/StockIndex/edgar.full-index.2015.QTR1.master.idx', 'r')
list = ['1341234', '12341234', '4563456', '12341234', '6896786', '2727638']
for line in q1:
    for listValue in list:
        if listValue in line:
            print(listValue, line)
I know this code is wrong. I need to search each line in q1 for each of the specific values in the list. I need to then print the specific list value and the line containing that value.
Iterating over a file object already yields one line at a time, split on newline characters. If the loop hands you the whole file at once, the file is probably not separated by newlines, and you will have to split it yourself after reading it. In that case everything matches because q1 yields only one "line".
Look for some identifying information in your file, such as newline characters ('\n') or a specific character that starts each line.
Once you have opened the file, read it and split it (a file object has no split() method, so call read() first):
lines = q1.read().split('your identifying character here')
That gives you a list of lines, and you can then run the loops you have already written over lines instead of q1.
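For reference, here is one way the search itself could be written so that it returns the (value, line) pairs instead of printing them. The function find_matches is a hypothetical helper, and it assumes plain substring matching is acceptable, which means a short value can also match inside a longer number.

```python
def find_matches(lines, values):
    """Return (value, line) pairs for every line that contains one of the values."""
    # dict.fromkeys drops duplicate values while keeping their original order.
    unique_values = list(dict.fromkeys(values))
    matches = []
    for line in lines:
        for value in unique_values:
            if value in line:
                matches.append((value, line.rstrip("\n")))
    return matches


# Usage with a file would look like this (path is up to you):
# with open(path, "r") as q1:
#     for value, line in find_matches(q1, wanted_values):
#         print(value, line)
```

Naming the list something other than list also avoids shadowing the built-in type.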
#returns same result i.e. only the first line as many times as 'draws'
infile = open("results_from_url.txt",'r')
file =infile.read() # essential to get correct formatting
for line in islice(file, 0, draws): # allows you to limit number of draws
    for line in re.split(r"Wins",file)[1].split('\n'):
        mains.append(line[23:38]) # slices first five numbers from line
        stars.append(line[39:44]) # slices last two numbers from line
infile.close()
I am trying to use the above code to iterate through a list of numbers to extract the bits of interest. In this attempt to learn how to use regular expressions in Python 3, I am using lottery results opened from the internet. All this does is to read one line and return it as many times as I instruct in the value of 'draws'. Could someone tell me what I have done incorrectly, please. Does re 'terminate' somehow? The strange thing is if I copy the file into a string and run this routine, it works. I am at a loss - problem 'reading' a file or in my use of the regular expression?
I can't tell you why your code doesn't work, because I cannot reproduce the result you're getting. I'm also not sure what the purpose of
for line in islice(file, 0, draws):
is, because you never use the line variable after that, you immediately overwrite it with
for line in re.split(r"Wins",file)[1].split('\n'):
Plus, you could have used file.split('Wins') instead of re.split(r"Wins",file), so you aren't really using regex at all.
Regex is a tool to find data of a certain format. Why do you use it to split the input text, when you could use it to find the data you're looking for?
What is it you're looking for? A sequence of seven numbers, separated by commas. Translated into regex:
(?:\d+,){7}
However, we want to group the first 5 numbers - the "mains" - and the last 2 numbers - the "stars". So we'll add two named capture groups, named "mains" and "stars":
(?P<mains>(?:\d+,){5})(?P<stars>(?:\d+,){2})
This pattern will find all numbers you're looking for.
import re

# read the whole file and close it again straight away
with open("infile.txt", 'r') as infile:
    data = infile.read()
mains = []
stars = []
pattern = r'(?P<mains>(?:\d+,){5})(?P<stars>(?:\d+,){2})'
iterator = re.finditer(pattern, data)
for count in range(int(input('Enter number of draws to examine: '))):
    try:
        match = next(iterator)
    except StopIteration:
        print('no more matches')
        break
    mains.append(match.group('mains'))
    stars.append(match.group('stars'))
print(mains, stars)
This will print something like ['01,03,31,42,46,'] ['04,11,']. You may want to remove the commas and convert the numbers to ints, but in essence, this is how you would use regex.
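To go one step further and turn the captured groups into integers, something like the following sketch could be used. The names DRAW and parse_draws are hypothetical, and it assumes the same comma-terminated layout as the pattern above.

```python
import re

# Same pattern as above: five "mains" followed by two "stars", each ending in a comma.
DRAW = re.compile(r'(?P<mains>(?:\d+,){5})(?P<stars>(?:\d+,){2})')


def parse_draws(data, limit=None):
    """Return a list of (mains, stars) tuples of ints, at most `limit` draws."""
    draws = []
    for match in DRAW.finditer(data):
        if limit is not None and len(draws) >= limit:
            break
        # split(',') leaves an empty string after the trailing comma; skip it.
        mains = [int(n) for n in match.group('mains').split(',') if n]
        stars = [int(n) for n in match.group('stars').split(',') if n]
        draws.append((mains, stars))
    return draws


print(parse_draws("01,03,31,42,46,04,11,"))
# [([1, 3, 31, 42, 46], [4, 11])]
```

The limit argument plays the role of the draws value in the question.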
I'm trying to find any occurrences of a character repeating more than 2 times in a user entered string. I have this, but it doesn't go into the if statement.
password = asDFwe23df333
s = re.compile('((\w)\2{2,})')
m = s.search(password)
if m:
    print ("Password cannot contain 3 or more of the same characters in a row\n")
    sys.exit(0)
You need to prefix your regex with the letter 'r', like so:
s = re.compile(r'((\w)\2{2,})')
If you don't do that, you'll have to double up on all your backslashes, since Python normally treats backslashes as an escape character in its normal strings. Since that makes regexes even harder to read than they normally are, most regexes in Python include that prefix.
Also, in your included code your password isn't in quotes, but I'm assuming it has quotes in your code.
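To make the difference concrete, here is a short sketch comparing the two spellings. In a normal string literal '\2' is an octal escape for a single control character, so the regex engine never sees a backreference. The names plain, raw, broken and has_triple_repeat are just for this illustration.

```python
import re

password = "asDFwe23df333"

plain = '\2'   # one character: the control character with code point 2
raw = r'\2'    # two characters: a backslash followed by "2"
print(len(plain), len(raw))

# What the non-raw pattern from the question effectively asks for:
# a word character followed by two or more copies of that control character.
broken = re.compile('((\\w)' + plain + '{2,})')


def has_triple_repeat(text):
    """Return True if any word character appears three or more times in a row."""
    return re.search(r'(\w)\1{2,}', text) is not None


print(broken.search(password), has_triple_repeat(password))
```

The broken pattern finds nothing in the sample password, while the raw-string version reports the run of 3s.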
Can't you simply go through the whole string and, every time you find a character equal to the previous one, increment a counter until it reaches 3? The counter starts at 1 for the first character of a run, and whenever the character differs from the previous one you reset it to 1.
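A minimal sketch of that loop, assuming a run length of at least 2 is being checked (the function name has_run and the limit parameter are made up for this example):

```python
def has_run(text, limit=3):
    """Return True if some character appears `limit` or more times in a row.

    Assumes limit >= 2; the run length starts at 1 for the first character.
    """
    run = 1
    # Pair each character with the one before it.
    for previous, current in zip(text, text[1:]):
        run = run + 1 if current == previous else 1
        if run >= limit:
            return True
    return False


print(has_run("asDFwe23df333"))
```

Unlike the \w-based regex, this version treats every character the same way, including spaces and punctuation.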
EDIT:
Or, you can use:
s = 'aaabbb'
re.findall(r'((\w)\2{2,})', s)
And check if the list returned by the second line has any elements.
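As a quick check of what that call returns: because the pattern has two groups, findall gives back a list of tuples, one per run, and an empty list when nothing repeats. The variable names runs and has_repeat are only for illustration.

```python
import re

s = 'aaabbb'
runs = re.findall(r'((\w)\2{2,})', s)
print(runs)
# [('aaa', 'a'), ('bbb', 'b')]

# An empty list is falsy, so bool() is enough to test for a match.
has_repeat = bool(runs)
no_runs = re.findall(r'((\w)\2{2,})', 'aabb')
```

The first element of each tuple is the whole run and the second is the repeated character.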