Replace strings with double quotes in a XML file

Replace strings with double quotes in a XML file - regex

I have a huge XML file with longer lines (5000-10000 characters per line) with following text:
Pattern="abc"
and I want to replace it with
Pattern="def"
As the line sizes are huge, I have no choice but to use awk. Please suggest how this can be achieved. I tried with the below but it is not working:
CMD="{sub(\"Pattern=\"abc\"\",\"Pattern=\"def\"\"); print}"
echo "$CMD"
awk "$CMD" "Some File Name.xml"
Any help is highly appreciated.

one suggestion with awk
BEGIN {FS="\""; OFS=""}
/Pattern="abc"/{$2="\"def\""}1

I don't understand why you said "As the line sizes are huge, I have no choice but to use awk". AFAIK sed is no more limited on line length than awk is and since this is a simple substitution on a single line, sed is the better choice of tool:
$ cat file
Pattern="abc"
$ sed -r 's/(Pattern=")[^"]+/\1def/' file
Pattern="def"
If the pattern occurs multiple times on the line, add a "g" to the end of the line.
Since you mention in your comment being stuck with a sed that can't handle long lines, let's assume you can't install GNU tools so you'll need a non-GNU awk solution like this:
$ awk '{sub(/Pattern="[^"]+/,"Pattern=\"def")}1' file
Pattern="def"
If you LITERALLY mean you only want to replace Pattern="abc" then just do:
$ awk '{sub(/Pattern="abc"/,"Pattern=\"def\"")}1' file
Pattern="def"

If You have bash you can try this:
Create file with long lines (>10_000 chars):
for((i=0;i<2500;++i));{ s="x$s";}
l="${s}Pattern=\"abc\"$s"
for i in {1..5}; { echo "$l$l";} >infile
The script:
while read x; do echo "${x//Pattern=\"abc\"/Pattern=\"def\"}";done <infile
This replaces all occurrences of Pattern="abc" to Pattern="def" in each line.

Related

Get specific Text between Specific Tags

At the top of my HTML files, I have...
<H2>City</H2>
<P>Liverpool</P>
or
<H2>City</H2>
<P>Dublin</P>
I want to output the text between the tags straight after <H2>City</H2> instances. So in the examples above which are separate files, I want to print out Liverpool and in the second example, Dublin.
Looking at this thread, I try:
sed -e 's/City\(.*\)\/P/\1/'
which I hope would get me half way there... but that just prints out the entire file. Any ideas?

awk to the rescue! You need multi-char RS support though (gawk has it)
$ awk -F'[<>]' -v RS='<H2>City</H2>' 'NF{print $3}' file
another approach can be
$ awk 'c&&c--{sub(/<[^>]*>/,""); print} /<H2>City<\/H2>/{c=1}' file
find the next record after City and trim the angle brackets...

Try using the following regex :
(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)
see regex demo / explanation
sed
sed -e 's/(?s)(?<=City<\/H2>\n<P>).*?(?=<\/P>)/'

I checked and the \s seem not work for spaces. You should use the newline character \n:
sed -e 's/<H2>City<\/H2>\n<P>\(.*\)<\/P>/\1/'
There is no need of use lookbehind (like above), that is an overkill.

With sed, you can use the n command to read next line after your pattern. Then just remove the tag to output your content:
sed -n '/<H2>City<\/H2>/n;s/ *<\/*P> *//gp;' file

I think this should work in your mac:
echo -e "<H2>City</H2>\n<P>Dublin</P>" |awk -F"[<>]" '/City/{getline;print $3}'
Dublin

Deleting lines matching a pattern from a Unix file

I have a file containing strings of the following format:
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
05|DELETE|REDEFINES|VARIABLE.
05|KEEP2|REDEFINES|VARIABLE2
|PIC|9(5).
I want to be able to use something like sed or awk to delete lines containing the word REDEFINES but NOT if the word PIC is also in there or if there is no full stop at the end of a line as this means the string has been split over 2 lines. So out of the 4 lines (3 strings) stated above I would only want to delete 05|DELETE|REDEFINES|VARIABLE.
I thought you might be able to use some kind of negation or lookahead but these don't seem to be available or I can't get them to work
Using awk this deletes anything containing REDEFINES in the String following the pattern in the example above:
awk '!/[[:print:]]*\REDEFINES[[:print:]]*\./'
Similarly using sed:
sed '/[[:print:]]*|REDEFINES[[:print:]]*\./d'
I just can't work out how to extend it to do what I need. Is this possible in sed or awk or do I need another tool?
Any help greatly appreciated.

Using awk
awk -v RS= '!/REDEFINES/ || /PIC/' file
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
05|KEEP2|REDEFINES|VARIABLE2
|PIC|9(5).
Using sed (with older input data):
sed -i.bak '/REDEFINES/{/PIC/!d;}' file
05|KEEP|REDEFINES|NO_TYPE|PIC|9.

You can try the below command. Print the line if it contains PIC or if it does not contain REDEFINES. It is maintainable as it is not so tricky and could be understood without much of an effort.
cat input.txt | awk '{if ($0 ~ /PIC/ || $0 !~ /REDEFINES/){print $0}}'

Why don't you just use grep? Using negations on your question, here is what I understood:
keep the lines terminated with a full-stop, containing both REDEFINES and PIC.
So grep seems easy:
$ grep -E 'REDEFINES.*\.$' file | grep PIC
05|KEEP|REDEFINES|NO_TYPE|PIC|9.
Hope this helps.

This might work for you (GNU sed):
sed -r '/REDEFINES/{/PIC|[^.]$/!d}' file
or perhaps more easily:
sed '/PIC/b;/REDEFINES.*\.$/d' file
or if you prefer:
sed '/PIC/!{/REDEFINES.*\.$/d}' file

how do i replace the first 100 characters of all lines in a file using awk

How do I replace the first 100 characters of all lines in a file using awk? There is no field delimiter in this file. All fields are fixed width. And given the variation in the data, I cannot use a search and replace.

How about sed? To replace the first 100 characters with say A:
$ sed -r 's/.{100}/A/' file
If you're happy with the results rewrite the file using -i:
$ sed -ri 's/.{100}/A/' file

awk '{print "replacing text..." substr($0,100)}'

Use pure shell.
#!/usr/bin/env bash
# read each line into shell variable REPLY
while read -r ; do
echo "REPLACE text ... ${REPLY:100}"
done <file
Explanation
REPLY is shell variable, refer http://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html. Set to the line of input read by the read builtin command when no arguments are supplied
${REPLY:100} - get the string after 100 characters.

how to select lines containing several words using sed?

I am learning using sed in unix.
I have a file with many lines and I wanna delete all lines except lines containing strings(e.g) alex, eva and tom.
I think I can use
sed '/alex|eva|tom/!d' filename
However I find it doesn't work, it cannot match the line. It just match "alex|eva|tom"...
Only
sed '/alex/!d' filename
works.
Anyone know how to select lines containing more than 1 words using sed?
plus, with parenthesis like "sed '/(alex)|(eva)|(tom)/!d' file" doesn't work, and I wanna the line containing all three words.

sed is an excellent tool for simple substitutions on a single line, for anything else just use awk:
awk '/alex/ && /eva/ && /tom/' file

delete all lines except lines containing strings(e.g) alex, eva and tom
As worded you're asking to preserve lines containing all those words but your samples preserve lines containing any. Just in case "all" wasn't a misspeak: Regular expressions can't express any-order searches, fortunately sed lets you run multiple matches:
sed -n '/alex/{/eva/{/tom/p}}'
or you could just delete them serially:
sed '/alex/!d; /eva/!d; /tom/!d'
The above works on GNU/anything systems, with BSD-based userlands you'll have to insert a bunch of newlines or pass them as separate expressions:
sed -n '/alex/ {
/eva/ {
/tom/ p
}
}'
or
sed -e '/alex/!d' -e '/eva/!d' -e '/tom/!d'

You can use:
sed -r '/alex|eva|tom/!d' filename
OR on Mac:
sed -E '/alex|eva|tom/!d' filename
Use -i.bak for inline editing so:
sed -i.bak -r '/alex|eva|tom/!d' filename

You should be using \| instead of |.
Edit: Looks like this is true for some variants of sed but not others.

This might work for you (GNU sed):
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{3}//p' file
This method would allow a range of values to be present i.e. you wanted 2 or more of the list then use:
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{2,3}//p' file

Bash - how to put each line within quotation

I want to put each line within quotation marks, such as:
abcdefg
hijklmn
opqrst
convert to:
"abcdefg"
"hijklmn"
"opqrst"
How to do this in Bash shell script?

Using awk
awk '{ print "\""$0"\""}' inputfile
Using pure bash
while read FOO; do
echo -e "\"$FOO\""
done < inputfile
where inputfile would be a file containing the lines without quotes.
If your file has empty lines, awk is definitely the way to go:
awk 'NF { print "\""$0"\""}' inputfile
NF tells awk to only execute the print command when the Number of Fields is more than zero (line is not empty).

I use the following command:
xargs -I{lin} echo \"{lin}\" < your_filename
The xargs take standard input (redirected from your file) and pass one line a time to {lin} placeholder, and then execute the command at next, in this case a echo with escaped double quotes.
You can use the -i option of xargs to omit the name of the placeholder, like this:
xargs -i echo \"{}\" < your_filename
In both cases, your IFS must be at default value or with '\n' at least.

This sed should work for ignoring empty lines as well:
sed -i.bak 's/^..*$/"&"/' inFile
or
sed 's/^.\{1,\}$/"&"/' inFile

Use sed:
sed -e 's/^\|$/"/g' file
More effort needed if the file contains empty lines.

I think the sed and awk are the best solution but if you want to use just shell here is small script for you.
#!/bin/bash
chr="\""
file="file.txt"
cp $file $file."_backup"
while read -r line
do
echo "${chr}$line${chr}"
done <$file > newfile
mv newfile $file

paste -d\" /dev/null your-file /dev/null
(not the nicest looking, but probably the fastest)
Now, if the input may contain quotes, you may need to escape them with backslashes (and then escape backslashes as well) like:
sed 's/["\]/\\&/g; s/.*/"&"/' your-file

This answer worked for me in mac terminal.
$ awk '{ printf "\"%s\",\n", $0 }' your_file_name
It should be noted that the text in double quotes and commas was printed out in terminal, the file itself was unaffected.

I used sed with two expressions to replace start and end of line, since in my particular use case I wanted to place HTML tags around only lines that contained particular words.
So I searched for the lines containing words contained in the bla variable within the text file inputfile and replaced the beginnign with <P> and the end with </P> (well actually I did some longer HTML tagging in the real thing, but this will serve fine as example)
Similar to:
$ bla=foo
$ sed -e "/${bla}/s#^#<P>#" -e "/${bla}/s#\$#</P>#" inputfile
<P>foo</P>
bar
$

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Replace strings with double quotes in a XML file - regex

one suggestion with awk BEGIN {FS="\""; OFS=""} /Pattern="abc"/{$2="\"def\""}1

Related

Get specific Text between Specific Tags

Deleting lines matching a pattern from a Unix file

how do i replace the first 100 characters of all lines in a file using awk

how to select lines containing several words using sed?

Bash - how to put each line within quotation

Categories

Resources