I am trying to write a simple script where a user can enter what he/she wants to search for in a specified txt file. If the word they are searching for is found, print it to a new text file. This is what I have so far.
import re
import os
os.chdir("C:\Python 2016 Training")
patterns = open("rtr.txt", "r")
what_directory_am_i_in = os.getcwd()
print what_directory_am_i_in
search = raw_input("What you looking for? ")
for line in patterns:
re.findall("(.*)search(.*)", line)
fo = open("test", "wb")
fo.write(line)
fo.close
This successfully creates a file called test, but its contents are nothing close to the word that was entered into the search variable.
Any advice appreciated.
First of all, you have not actually read the file:
patterns = open("rtr.txt", "r")
This is a file object, not the contents of the file. To read the file contents you need to use
patterns.readlines()
Secondly, re.findall returns a list of matched strings, so you would want to store that. Your regex is also not correct, as pointed out by Hani. It should be
matched = re.findall("(.*)" + search + "(.*)", line)
or rather, if you want the complete line,
matched = re.findall(".*" + search + ".*", line)
or simply
matched = line if search in line else None
Thirdly, you don't need to keep opening your output file inside the for loop. You are overwriting your file every time through the loop, so it will capture only the last result. Also remember to call the close method on the files.
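Putting those three points together, a corrected sketch might look like this (keeping your file names, and assuming you want each whole matching line written out):
import os

os.chdir(r"C:\Python 2016 Training")
search = raw_input("What you looking for? ")

# open the input once and the output once; "with" closes both for us
with open("rtr.txt", "r") as patterns, open("test", "w") as fo:
    for line in patterns:
        if search in line:   # plain substring test; no regex needed for this
            fo.write(line)   # write the whole matching line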
Hope this helps
You are searching here for all lines that have the literal word "search" in them.
You need to get the lines that contain the text you entered in the shell,
so change this line
re.findall("(.*)search(.*)", line)
to
re.findall("(.*)"+search+"(.*)", line)
import re
output = open("teste-out.txt","w")
input = open("teste.txt")
for line in input:
    output.write(re.sub(r"\n\r03110", r"|03110", line))
input.close()
output.close()
Why isn't this code working? Can anyone help me fix it? I want to read from a txt file and, if a line starts with 03110, merge only that line with the previous line and add a | before the merge.
I've tried \n03110, \r03110 and other options, but none of them work. In Notepad++ I can do this by searching for \R++03110 and replacing with |03110 using regular expressions, but I want a Python solution to optimize the job.
Input
01000|0107160
02000|1446
03100|01|316,00
03110|||316,00|0|0|7|
03100|29|135,00
03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489
02000|4720,905|1967,05|0
03100|31|705,26
03100|32|6073,00
03110|||6073,00|0|0|0,00|8
99999|23
Output
01000|0107160
02000|1446
03100|01|316,00|03110|||316,00|0|0|7|
03100|29|135,00|03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489
02000|4720,905|1967,05|0
03100|31|705,26
03100|32|6073,00|03110|||6073,00|0|0|0,00|8
99999|23
I'm using Python on Windows.
2nd EDIT: sorry - I guess I didn't read carefully enough...
Well, merging lines based on the beginning of the second line is also possible, but perhaps not as beautifully clean:
with open('teste.txt') as fin, open('teste-out.txt', 'w') as fout:
    fout.write(next(fin)[:-1])
    for line in fin:
        if line.startswith('03110'):
            fout.write(f'|{line[:-1]}')
        else:
            fout.write(f'\n{line[:-1]}')
    fout.write('\n')
EDIT: solution working with files:
with open('teste.txt') as fin, open('teste-out.txt', 'w') as fout:
    for line in fin:
        if line.startswith('03100'):
            fout.write(line[:-1] + '|' + next(fin))
        else:
            fout.write(line)
Just for interest's sake - this is not a job for re, imho:
s_in = '''01000|0107160
02000|1446
03100|01|316,00
03110|||316,00|0|0|7|
03100|29|135,00
03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489'''
from io import StringIO
with StringIO(s_in) as fin:
    for line in fin:
        if line.startswith('03100'):
            print(line[:-1] + '|' + next(fin), end='')
        else:
            print(line, end='')
results in the requested
01000|0107160
02000|1446
03100|01|316,00|03110|||316,00|0|0|7|
03100|29|135,00|03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489
For those who like sed, this is a very short solution (not that efficient, though, as it reads all lines before printing anything):
< input_file sed ':a;N;$!ba;s/\n03110/|03110/g'
The following sed script is a more efficient solution:
#!/usr/bin/sed -f
:h
N
s/\n03110/|03110/
t h
h
s/\n.*//
p
g
D
For the casual reader who really likes sed like I do, here's a short explanation:
the 4 lines from :h to t h are essentially a "do-while" loop in which we append a new line to the pattern space (N), and we keep doing so (t h is a "goto"), as long as the substitution command (s) is successful in changing the embedded newline \n to a |;
as soon as the s command is unsuccessful, we "save" the multiline pattern space by copying it into the hold space (h), safely delete the \n and whatever is after it (s/\n.*//), and finally print what remains (p), which is the set of lines we've been successfully joining;
it's now time to get back the last line we appended, which did not start with 03110: we get (g) the multiline back from the hold space, delete \n together with whatever precedes it, and go back to the top without printing (D).
we are back to the top of the script with a line which is not printed yet, just like we started.
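For comparison, here is the same buffering idea (keep joining as long as the following lines start with 03110) as a plain Python sketch, using the file names from the question:
with open('teste.txt') as fin, open('teste-out.txt', 'w') as fout:
    buffered = None
    for line in fin:
        line = line.rstrip('\n')
        if buffered is not None and line.startswith('03110'):
            buffered += '|' + line           # join onto the previous line
        else:
            if buffered is not None:
                fout.write(buffered + '\n')  # flush the finished line
            buffered = line
    if buffered is not None:
        fout.write(buffered + '\n')          # flush the last buffered line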
I am supposed to write a program which will read a text file containing some words with some common linguistic features, apply some regular expressions to all of the words, and write one file which will have the changed words.
For now let's say my text file named abcd.txt has these words
king
sing
ping
cling
booked
looked
cooked
packed
My first question starts here: in my simple text file, how should I write these words to get the results mentioned above? Shall I write them line-separated or comma-separated?
This is the code provided by user palvarez.
import re
with open("new_abcd", "w+") as new, open("abcd") as original:
    for word in original:
        new_word = re.sub("ing$", "xyz", word)
        new.write(new_word)
Can I add something like -
with open("new_abcd", "w+") as file, open("abcd") as original:
    for word in original:
        new_aword = re.sub("ed$", "abcd", word)
        new.write(new_aword)
in the same code file? I want something like -
kabc
sabc
pabc
clabc
bookxyz
lookxyz
cookxyz
packxyz
PS - I don't know whether mentioning this is necessary or not, but I am supposed to do this for a Unicode-supported script, Devanagari. I didn't use it here in my examples because many of us here can't read the script. Additionally, that script uses some diacritics, e.g. 'का' has one consonant character 'क' and one vowel symbol 'ा' which together make 'का'. In my regular expression I need to condition the diacritics.
I think the approach you have, with one word per line, is better since you don't have to trouble yourself with delimiters and stripping.
With a file like this:
king
sing
ping
cling
booked
looked
cooked
packed
And code like this, using re.sub to replace a pattern:
import re
with open("new_abcd.txt", "w") as new, open("abcd.txt") as original:
    for word in original:
        new_word = re.sub("ing$", "xyz", word)
        new_word = re.sub("ed$", "abcd", new_word)
        new.write(new_word)
It creates a resulting file:
kxyz
sxyz
pxyz
clxyz
bookabcd
lookabcd
cookabcd
packabcd
I tried it out with the diacritic you gave us and it seems to work fine:
print(re.sub("ा$", "ing", "का"))
>>> कing
EDIT: added multiple replacements. You can put your replacements into a list and iterate over it, applying re.sub for each, as follows.
import re
# List where first is pattern and second is replacement string
replacements = [("ing$", "xyz"), ("ed$", "abcd")]
with open("new_abcd.txt", "w") as new, open("abcd.txt") as original:
    for word in original:
        new_word = word
        for pattern, replacement in replacements:
            new_word = re.sub(pattern, replacement, word)
            if new_word != word:
                break
        new.write(new_word)
This limits it to one modification per word; only the first replacement that modifies the word is applied.
For starters, it is recommended to use the with context manager to open your file; this way you do not need to explicitly close the file once you are done with it.
Another advantage is that you can then process the file line by line, which is very useful if you are working with larger sets of data. Whether you write the words on a single line or in CSV format then depends on the requirements of your output and how you want to process it further.
As an example, to read from a file and, say, substitute a substring, you can use re.sub:
import re
with open('abcd.txt', 'r') as f:
    for line in f:
        # do something here
        print(re.sub("ing$",'ring',line.strip()))
>>
kring
sring
pring
clring
Another nifty trick is to manage both the input and output files in the same with statement:
import re
with open('abcd.txt', 'r') as f, open('out_abcd.txt', 'w') as o:
    for line in f:
        # notice that we add '\n' to write each output to a newline
        o.write(re.sub("ing$",'ring',line.strip())+'\n')
This creates an output file with your new contents in a very memory-efficient way.
If you'd like to write to a csv file or any other specific format, I highly suggest you spend some time understanding Python's input and output functions. If text linguistics is what you are going for, then understanding the encoding of different languages and further studying Python's regex operations will also help.
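For instance, here is a minimal sketch that writes the transformed words out as a single CSV row (the output name out_abcd.csv is just an example):
import csv
import re

with open('abcd.txt') as f, open('out_abcd.csv', 'w', newline='') as o:
    writer = csv.writer(o)
    words = [re.sub("ing$", "ring", line.strip()) for line in f]
    writer.writerow(words)  # all words on one comma-separated line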
I am hoping to receive some feedback on some code I have written in Python 3 - I am attempting to write a program that reads an input file which has page numbers in it. The page numbers are formatted as: "[13]" (this means you are on page 13). My code right now is:
pattern='\[\d\]'
for line in f:
    if pattern in line:
        re.sub('\[\d\]',' ')
        re.compile(line)
        output.write(line.replace('\[\d\]', ''))
I have also tried:
for line in f:
    if pattern in line:
        re.replace('\[\d\]','')
        re.compile(line)
        output_file.write(line)
When I run these programs, a blank file is created, rather than a file containing the original text minus the page numbers. Thank you in advance for any advice!
Your if statement won't work because it's not doing a regex match; it's looking for the literal string \[\d\] in line.
for line in f:
    # determine if the pattern is found in the line
    if re.match(r'\[\d\]', line):
        subbed_line = re.sub(r'\[\d\]', ' ', line)
        output_file.write(subbed_line)
Additionally, you're using re.compile() incorrectly. The purpose of it is to pre-compile your pattern into a pattern object. This improves performance if you use the pattern a lot, because you only evaluate the expression once rather than re-evaluating it each time through the loop.
pattern = re.compile(r'\[\d\]')
if pattern.match(line):
    # ...
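To make that concrete, here is a sketch of the whole loop with a compiled pattern; it uses search/sub so the page marker can sit anywhere in the line and writes non-matching lines through unchanged (both assumptions about what you want), and the file names are placeholders:
import re

pattern = re.compile(r'\[\d\]')

with open('input.txt') as f, open('output.txt', 'w') as output_file:
    for line in f:
        if pattern.search(line):          # the page marker can be anywhere in the line
            line = pattern.sub('', line)  # strip the page number
        output_file.write(line)           # unmodified lines are written through as-is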
Lastly, you're getting a blank file because nothing is ever written: the literal if test in your code never succeeds, so output_file.write() is never reached. Once the regex match above is in place, each substituted line will be written to the file as expected.
You don't write unmodified lines to your output.
Try something like this
for line in f:
    if re.search(pattern, line):
        line = re.sub(pattern, '', line)  # remove page number stuff
    output_file.write(line)  # note that it's not part of the if block above
That's why your output file is empty.
I have a .txt file full of lines like:
username:password:*email*:email_address, where *email* is a fixed word throughout the file.
I wanted to delete everything after the first : and before the second :, in other words, delete the passwords and have only username:*email*:email_address.
Can anyone help me? Thanks
You could write a Python script like this:
with open('file.txt') as f:
    for line in f.readlines():
        fields = line.split(':')
        del fields[1]
        print ':'.join(fields),
and redirect the output to a new file.
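If you'd rather have the script write the new file itself instead of redirecting its output, here is a sketch along the same lines (the output file name is just an example):
with open('file.txt') as f, open('file_clean.txt', 'w') as out:
    for line in f:
        fields = line.split(':')
        del fields[1]              # drop the password field
        out.write(':'.join(fields))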
I'm trying to replace text in a .txt file using a .py script. Here's what I have so far:
docname=raw_input("Enter a document name: ")
fo=open(docname, 'r+')
string=fo.read()
replace=raw_input("Enter what you want to replace: ")
replacewith=raw_input("Enter what you want to replace it with: ")
out=string.replace(replace,replacewith)
fo.write(out);
fo.close()
print "Check the document!"
closeInput = raw_input("Press ENTER to exit")
I have a txt file called "test.txt" (in the same directory as the .py script). When I enter "test.txt", it asks what I want to replace, as expected. When I fill that out, it asks what I want to replace it with.
I fill that out, and the program closes. No "Check the document!" or anything. And worst of all, it doesn't replace the first string with the second.
Please help!
You have two possibilities:
1. if you want to open the file just once, you should reset the position of the stream with fo.seek(0) before writing;
2. you can close and reopen the file with fo = open(docname, 'w').
The first option has one problem: if the replacement text is shorter than the original text, some text will be left over at the end. To illustrate what I'm talking about: if you have the text '12345' and want to replace '12' with 'a', then the resulting file would contain 'a3455'.
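To make the first option concrete, here is a sketch of your script with seek(0) added before the write and truncate() added to drop any leftover text (keeping your variable names):
docname = raw_input("Enter a document name: ")
fo = open(docname, 'r+')
string = fo.read()

replace = raw_input("Enter what you want to replace: ")
replacewith = raw_input("Enter what you want to replace it with: ")
out = string.replace(replace, replacewith)

fo.seek(0)      # go back to the start of the file before writing
fo.write(out)
fo.truncate()   # drop any leftover text from the old, longer contents
fo.close()

print "Check the document!"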