I know that many questions already exist here about finding and replacing text in a file using Python 2. However, being very new to Python, I did not understand the syntax, and my purpose may also be different.
I am looking for something as simple as these lines of Linux shell script:
sed -i 's/find/replace/' *.txt
sed -i 's/find2/replace2/' *.txt
Can code like this work to replace multiline text?
with open('file.txt', 'w') as out_file:
    out_file.write(replace_all('old text 1', 'new text 1'))
    out_file.write(replace_all('old text 2', 'new text 2'))
Also, there seems to be a problem with an extra newline being added, which I do not want. Any ideas or help?
So, with Python, the easiest thing to do is read all the text from the file into a string. Then perform any necessary replacements using that string. Then write the entire thing back out to the same file:
filename = 'test.txt'
with open(filename, 'r') as f:
    text = f.read()
text = text.replace('Hello', 'Goodbye')
text = text.replace('name', 'nom')
with open(filename, 'w') as f:
    f.write(text)
The replace method works on any string and replaces any (case-sensitive) match of the first argument with the second. You're reading and writing to the same file, just in two different steps.
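If you also want the sed-like behaviour from your question of touching every .txt file in the directory, here is a minimal sketch using glob; the find/replace pairs are just placeholders:
import glob
replacements = [('find', 'replace'), ('find2', 'replace2')]  # placeholder pairs
for filename in glob.glob('*.txt'):
    with open(filename, 'r') as f:
        text = f.read()
    # apply every replacement to the whole file contents
    for old, new in replacements:
        text = text.replace(old, new)
    with open(filename, 'w') as f:
        f.write(text)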
Here is a quick sample. If you want more powerful search/replace, you can use a regex (the re module) instead of str.replace:
import fileinput
for line in fileinput.input(inplace=True):
    # strip only the trailing newline so print doesn't add a second one
    newline = line.replace('old text', 'new text').rstrip('\n')
    print newline
Put the above code in a file, say sample.py, and, assuming python is on your path, run it as:
python sample.py inputfile
That will replace 'old text' with 'new text' in inputfile. Of course, you can pass multiple files as arguments as well. See https://docs.python.org/2/library/fileinput.html
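For the more powerful regex variant mentioned above, a minimal sketch swaps str.replace for re.sub; the pattern here is only an example:
import re
import fileinput
for line in fileinput.input(inplace=True):
    # 'old\s+text' matches 'old' and 'text' separated by any whitespace
    print re.sub(r'old\s+text', 'new text', line.rstrip('\n'))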
I am using sed for string replacement in a config file.
The user has to input a string salt, and I then write this value into the config file.
Sample config file myconfig.conf:
CONFIG_SALT_VALUE=SOME_DUMMY_VALUE
I use this command to replace the dummy value with the salt value entered by the user:
sed -i "s/^CONFIG_SALT_VALUE.*/CONFIG_SALT_VALUE=$salt/g" ./myconfig.conf
Issue: the value of $salt can contain any character, so if $salt contains a / (like 12d/dfs), the above sed command breaks.
I can change the delimiter to !, but then if $salt contains amgh!fhf, the sed command breaks again.
How should I approach this problem?
You can use almost any character as the sed delimiter. However, as you mention in your question, switching delimiters every time is fragile.
Maybe it is useful to use awk instead, doing a little bit of parsing of the line:
awk 'BEGIN{repl=ARGV[1]; ARGV[1]=""; FS=OFS="="}
$1 == "CONFIG_SALT_VALUE" {$2=repl}
1' "$salt" file
As one liner:
awk 'BEGIN{repl=ARGV[1]; ARGV[1]=""; FS=OFS="="} $1 == "CONFIG_SALT_VALUE" {$2=repl}1' "$salt" file
This sets = as the field separator. Then, it checks whether a line contains CONFIG_SALT_VALUE as the parameter name. When this happens, it replaces the value with the one given.
To prevent backslashes in $salt values like foo\\bar from being interpreted, as that other guy commented on my original answer, we use this trick:
awk 'BEGIN{repl=ARGV[1]; ARGV[1]=""} ...' "$var" file
This uses the answer in How to use variable including special symbol in awk? where Ed Morton says that
The way to pass a shell variable to awk without backslashes being interpreted is to pass it in the arg list instead of populating an awk variable outside of the script.
and then
You need to set ARGV[1]="" after populating the awk variable to avoid the shell variable value also being treated as a file name. Unlike any other way of passing in a variable, ALL characters used in a variable this way are treated literally with no "special" meaning.
This does not do in-place editing, but you can redirect to another file and then replace the original:
awk '...' file > tmp_file && mv tmp_file file
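If you would rather sidestep shell quoting and escaping entirely, here is a minimal Python sketch of the same replacement; the script name and hard-coded config path are assumptions. Since str.startswith and plain concatenation treat everything literally, no character in salt is special:
import sys

salt = sys.argv[1]  # the user-supplied value, passed as the first argument
with open('myconfig.conf') as f:
    lines = f.readlines()
with open('myconfig.conf', 'w') as f:
    for line in lines:
        # rewrite only the CONFIG_SALT_VALUE line, keep everything else
        if line.startswith('CONFIG_SALT_VALUE'):
            line = 'CONFIG_SALT_VALUE=' + salt + '\n'
        f.write(line)
Run it as python replace_salt.py "$salt" (replace_salt.py being whatever you name the file).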
I want to extract (parse) a text file that has particular words. For my requirement, whatever rows have the words "cluster", "week" and "8.2" should be written to the output file.
Sample text in the file:
2013032308470272~800000102507~Cluster-Mode~WEEK~8.1.2~V6240
2013032308470272~800000102507~Cluster-Mode~monthly~8.1.2~V6240
2013032308470272~800000102507~Cluster-Mode~WEEK~8.2.2~V6240
2013032308470272~800000102507~Cluster-Mode~yearly~8.1.2~V6240
Desired output into another text file, using the above-mentioned filters:
2013032308470272~800000102507~Cluster-Mode~WEEK~8.2.2~V6240
I have written code using the awk command; however, the output file contains rows that are outside the scope of the filters.
Code used to extract the text:
awk '/Cluster/ && /WEEK/ && /8.2/ { print $NF > "/u/nbsvc/Data/Lookup/derived_asup_2010404_201409_2.txt" }' /u/nbsvc/Data/Lookup/cmode_asup_lookup.txt
Obtained output:
2013032308470272~800000102507~Cluster-Mode~WEEK~8.1.2~V6240
2013032308470272~800000102507~Cluster-Mode~WEEK~8.2.2~V6240
Note: the first line of obtained output is not needed in the desired output. How can I change my script to only get the line that I want?
Your /8.2/ regex can match unintended text, since an unescaped . matches any character, and /Cluster/ and /WEEK/ can match anywhere in the line rather than only in the fields you intend. To remove any ambiguity and false matches on partial fields or the wrong field, THIS is the command you need to run:
$ awk -F'~' '$3~/^Cluster/ && $4=="WEEK" && $5~/^8\.2/' file
2013032308470272~800000102507~Cluster-Mode~WEEK~8.2.2~V6240
I don't think that awk is needed at all here. Just use grep to match the line that you're interested in:
grep 'Cluster.*WEEK.*8\.2' file > output_file
The .* matches zero or more of any character, and > is used to redirect the output to a new file. I have escaped the . in "8.2" so that it is interpreted literally rather than matching any character (although for this sample data it happens to work either way).
There is actually a little more to my requirement: I need to read this text file, split each line, push the values into an array, and then check whether the values match my pattern. If they match, I write the line to an output text file; otherwise I simply ignore it. I did it like this:
awk 'BEGIN{IGNORECASE=1} {split($0, a, "~"); if (a[1] ~ /201404/ && a[3] ~ /Cluster/ && a[4] ~ /WEEK/ && a[5] ~ /8.2/) print}' /inputfolder_path/lookup_filename.txt > /outputfolder_path/derived_output_filename.txt
This works exactly for my requirement. Just thought to share it with everyone, as it may help someone; a Python equivalent is sketched below.
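For comparison, here is a rough Python sketch of the same split-and-test logic, using the same hard-coded paths; lower() stands in for gawk's IGNORECASE. Note that '8.2' in a[4] is a literal substring test, so unlike the /8.2/ regex it cannot accidentally match 8.1.2:
with open('/inputfolder_path/lookup_filename.txt') as f:
    with open('/outputfolder_path/derived_output_filename.txt', 'w') as out:
        for line in f:
            a = line.rstrip('\n').split('~')
            # same field tests as the awk version above
            if (len(a) >= 5 and '201404' in a[0] and 'cluster' in a[2].lower()
                    and 'week' in a[3].lower() and '8.2' in a[4]):
                out.write(line)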
Thanks,
Siva
I'm working on a large text file and I'd like to remove all lines that don't contain the text "event":"click"}]
I've tried to do some regex within Sublime Text 3 and can't get it to stick.
I have not used Sublime, but you could select all lines not containing the text "event":"click"}] with the regex:
^((?!"event":"click"\}\]).)*$
I think you could then replace them with nothing (an empty string).
Use this one to print the result to stdout:
sed -n '/"event":"click"\}\]$/p' your_large_file
Use this one to keep only lines that end with "event":"click"}]; a your_large_file.old backup will be generated:
sed -i.old -n '/"event":"click"\}\]$/p' your_large_file
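The same filter as a minimal Python sketch; the file names are just examples:
with open('your_large_file') as f:
    with open('your_large_file.filtered', 'w') as out:
        for line in f:
            # keep only lines ending with the literal token
            if line.rstrip('\n').endswith('"event":"click"}]'):
                out.write(line)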
In a file a.txt containing "abc", I want to replace "abc" with a string of length 20000. In Stata, I could use the filefilter command (which converts ASCII text or binary patterns in a file) to do it. How can I do it in SAS?
a.txt contents:
{\rtf1
{\fonttbl{\f1\fmodern\fcharset134;}}
{\info}
\sectd\pgwsxn11907\pghsxn16840\marglsxn1418\margrsxn1418
\margtsxn1440\margbsxn1440\sectdefaultcl
\headery851{\header\pard\qr\fs18\par}
\footery992{\footer\pard\qc\f0\fs18\chpgn\par}
{\pard\qc\sb30\sa30\fs21 \par
\trowd\trautofit1\trgaph0\trleft-75\intbl\trqc
\clbrdrt\brdrs\brdrw30\clbrdrb\brdrs\brdrw10\clvertalc\cellx6993\clbrdrt
\brdrs\brdrw30\clbrdrb\brdrs\brdrw10\clvertalc\cellx13986\clbrdrt\brdrs\brdrw30
\clbrdrb\brdrs\brdrw10\clvertalc\cellx20979
\qc\fs21 x\cell\qc\fs21 y\cell\qc\fs21 z\cell\row
\trowd\trautofit1\trgaph0\trleft-75\trqc
\clvertalc\cellx6993\clvertalc\cellx13986
\clvertalc\cellx20979
\qc\fs21 a\cell\qc\fs21 b\cell\qc\fs21 abc\cell\row
\trowd\trautofit1\trgaph0\trleft-75\intbl\trqc
\clbrdrb\brdrs\brdrw30\clvertalc\cellx6993\clbrdrb\brdrs\brdrw30
\clvertalc\cellx13986\clbrdrb\brdrs\brdrw30\clvertalc\cellx20979
\qc\fs21 d\cell\qc\fs21 e\cell\qc\fs21 f\cell\row
}}
There are suggestions for some tools here that will work on Windows.
Once you have a tool like FART or sed installed, you can use the X statement in SAS to shell out to the command line, along these lines:
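A hypothetical sketch (the replacement string is a placeholder for your 20000-character value):
/* shell out to sed from SAS; -i edits a.txt in place */
x "sed -i 's/abc/REPLACEMENT_STRING/' a.txt";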
I have a huge comma-delimited file and I need to filter out all lines that do not contain an email address (detected by the @ character).
Right now what I have is this regex to find all lines containing the @ sign:
.*,.*,.*@.*,.*$
Basically there are 4 values per line, and the 3rd value has the email address.
The "replace with:" value would be empty.
You have about 10 different ways to do this in TextMate and even more from the command line. Here are some of the easier ways...
From TextMate:
Command-control-T, start typing some part of the command "Copy Non-Matching Lines into New Document", and use @ (nothing else) for the pattern.
Same as above, except the command you're looking for is "Distill Document / Selection"
Find and select an @ symbol. Then do the same as above, but search for the command "Strip Lines Matching Selection/Clipboard". You may not have it, as I may have developed this one myself.
From the command line:
Type one of the following commands, replacing FILE with the filename, including the filepath if it's not in your current working directory. The filtered content can be found in FILE-new.
Using egrep: egrep -v '@' FILE > FILE-new
Using sed: sed -e "/@/d" FILE > FILE-new
For both of the above, use diff to see what you accomplished: diff FILE{,-new}
That should probably do, I'm guessing...
Try replacing ^[^@]*$ with nothing. Alternatively, grep the file with your regex and redirect the result into a new file.
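If a script is acceptable, here is a minimal Python sketch of the question's filter (keep only lines whose third comma-separated value contains an @); the file names are examples:
with open('FILE') as f:
    with open('FILE-new', 'w') as out:
        for line in f:
            values = line.rstrip('\n').split(',')
            # keep lines whose 3rd of 4 values contains an email address
            if len(values) >= 3 and '@' in values[2]:
                out.write(line)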