Using sed with a newline in a regex - regex

I'm bashing my head against the wall with this one. How do I do a regex replacement with sed on text that contains a newline?
I need to replace the value of the "version" XML element shown below. There are multiple version elements so I want to replace the one that comes after the "name" element.
<name>MyName</name>
<version>old</version>
Here's my command:
sed -i -E "s#(\s*<name>$NAME</name>\n\s*<version>)$VERSION_OLD(</version>)#\1$VERSION_NEW\2#g" $myfile.txt
Now as far as I know there is a way to make sed work with a newline character, but I can't figure it out. I've already used sed in my script so ideally I'd prefer to re-use it instead of say perl.

When you see your name element, you will need to use the N command to read the next line:
file:
<bar>MyName</bar>
<version>old</version>
<name>MyName</name>
<version>old</version>
<foo>MyName</foo>
<version>old</version>
With GNU sed:
sed '/<name>/{N;s/old/newer/}' file
Output:
<bar>MyName</bar>
<version>old</version>
<name>MyName</name>
<version>newer</version>
<foo>MyName</foo>
<version>old</version>

If you're using GNU sed, you can use its extended addressing syntax:
sed '/<name>/,+1{/<version>/s/old/newer/}' file
Breaking this down, it says: for a line matching <name> and the following line (+1), then if the line matches <version>, substitute old with newer.
I'm assuming here that your file is generated, and will always have the name and version elements each on a single line, and adjacent. If you need to handle more free-form XML, then you should really consider an XPath-based tool rather than sed.
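Applied to the variables from the question, the same idea might look like the sketch below. It assumes the name and version elements sit on adjacent lines and that $NAME, $VERSION_OLD and $VERSION_NEW contain no regex metacharacters, no / and no #. The variable values and the sample file are made up for illustration.

```shell
# Hypothetical values standing in for the variables in the question.
NAME='MyName'
VERSION_OLD='old'
VERSION_NEW='new'

# Illustrative sample file (not from the question).
sample=$(mktemp)
printf '%s\n' \
  '<bar>MyName</bar>' \
  '<version>old</version>' \
  '<name>MyName</name>' \
  '<version>old</version>' > "$sample"

# On the matching <name> line, n prints it and loads the next line,
# so the substitution only ever touches the line right after <name>.
result=$(sed "/<name>$NAME<\/name>/{
n
s#\(<version>\)$VERSION_OLD\(</version>\)#\1$VERSION_NEW\2#
}" "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Using n rather than N keeps each line in its own pattern space, so the regex never has to match a newline at all.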

Related

regex command line with single-line flag

I need to use a regex in a bash script to substitute text in a file that might span multiple lines.
In other regex engines I know, I would pass the s flag, but I'm having a hard time doing the same from bash.
As far as I know, sed doesn't support this feature.
Perl obviously does, but I cannot make it work in a one-liner:
perl -i -pe 's/<match.+match>//s' $file
example text:
DONT_MATCH
<match some text here
and here
match>
DONT_MATCH
By default, . doesn't match a line feed. The s flag simply makes . match any character.
You are reading the file a line at a time, so you can't possibly match something that spans multiple lines. Use -0777 to treat the entire input as one line.
perl -i -0777pe's/<match.+match>//s' "$file"
This might work for you (GNU sed):
sed '/^<match/{:a;/match>$/!{N;ba};s/.*//}' file
Gather up a collection of lines from one beginning <match to one ending match> and replace them by nothing.
N.B. This will act on all such collections throughout the file and the end-of-file condition will not affect the outcome. To only act on the first, use:
sed '/^<match/{:a;/match>$/!{N;ba};s/.*//;:b;n;bb}' file
To only act on the second such collection use:
sed -E '/^<match/{:a;/match>$/!{N;ba};x;s/^/x/;/^(x{2})$/{x;s/.*//;x};x}' file
The regex /^(x{2})$/ can be tailored to do more intricate matching e.g. /^(x|x{3,6})$/ would match the first and third to sixth collections.
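Written out one command per line, the gather loop does not depend on GNU's one-line label syntax. The sketch below runs it on the example text from the question. As a deliberate variation, it deletes the collection with d instead of blanking it with s/.*//, so no empty line is left behind. The temporary file is illustrative only.

```shell
sample=$(mktemp)
printf '%s\n' \
  'DONT_MATCH' \
  '<match some text here' \
  'and here' \
  'match>' \
  'DONT_MATCH' > "$sample"

# From a line starting with <match, keep appending lines (N) until the
# pattern space ends in match>, then delete the whole collection.
result=$(sed '/^<match/{
:a
/match>$/!{
N
ba
}
d
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```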
With GNU sed:
$ sed -z 's/<match.*match>\n//g' file
DONT_MATCH
DONT_MATCH
With any sed:
$ sed 'H;1h;$!d;x; s/<match.*match>\n//g' file
DONT_MATCH
DONT_MATCH
Both the above approaches read the whole file into memory. If you have a big file (e.g. gigabytes), you might want a different approach.
Details
With GNU sed, the -z option reads in files with NUL as the record separator. For text files, which never contain NUL, this has the effect of reading the whole file in.
For ordinary sed, the whole file can be read in with the following steps:
H - Append current line to hold space
1h - If this is the first line, overwrite the hold space
with it
$!d - If this is not the last line, delete pattern space
and jump to the next line.
x - Exchange hold and pattern space to put whole file in
pattern space
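Put together, the four steps let a single s command see the embedded newlines of the whole input. The sketch below uses them to join every line with a comma, something a line-at-a-time script cannot do. The three-line input is made up for illustration.

```shell
# H;1h;$!d;x collects the whole input in the pattern space; the final
# s command can then match the newlines between the original lines.
result=$(printf '%s\n' one two three | sed 'H;1h;$!d;x;s/\n/,/g')
printf '%s\n' "$result"
```

The last line of input is the only one that reaches x, so the substitution runs exactly once, on the complete text.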

Find and replace part of a string using bash in an XML file

I am new to bash scripting and was looking into what kind of command will help me replace a specific string in an XML file.
The string looks like
uri='file:/var/lib/abc/cde.repo/r/c/e/v/1.1/abc-1.1.jar'
Should be replaced with
uri='file:/lib/abc-1.1.jar'
The strings vary as the jars vary too. The first part of the string, "file:/var/lib/abc/cde.repo/r/", is constant across all strings. The second half varies.
This needs to be done across the entire file. Please note that replacing one string is easier than doing it for each and every string that varies. I am looking for a solution that does it in one single command.
I know we can use sed but how? Any pointers are highly appreciated
Thanks
With sed:
sed "s~uri='file:/var/lib/abc/cde\.repo/r/c/e/v/1\.1/abc-1\.1\.jar'~uri='file:/lib/abc-1.1.jar'~g"
Basically it is:
sed "s~pattern~replacement~g"
where s is the substitute command and the trailing g means globally. I'm using ~ as the delimiter character, which avoids having to escape every / in the paths. (thanks @Jotne)
Update: In order to make the regex more flexible, you may try this:
sed 's~file.*/\(.*\.jar\)\(.*\)~file:/lib/\1\2~' a.txt
It searches for file: ... .jar links, grabs the name of the jar file and builds the new links.
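A tighter variant could anchor on the constant prefix from the question, so other file: URIs are left alone and several URIs on one line are handled. This is only a sketch: the sample elements <a> and <b> and the second jar name are invented for illustration, and it assumes the URIs are delimited by single quotes as in the question.

```shell
sample=$(mktemp)
printf '%s\n' \
  "<a uri='file:/var/lib/abc/cde.repo/r/c/e/v/1.1/abc-1.1.jar'/>" \
  "<b uri='file:/var/lib/abc/cde.repo/r/x/y/2.0/xyz-2.0.jar'/>" > "$sample"

# \([^/']*/\)* skips any number of directory levels below the constant
# prefix; \2 is the bare jar file name that follows them.
result=$(sed "s~file:/var/lib/abc/cde\.repo/r/\([^/']*/\)*\([^/']*\.jar\)~file:/lib/\2~g" "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Excluding ' and / from the bracket expressions keeps each match inside a single quoted URI, which is what makes the g flag safe here.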
Using awk you can do:
awk -F/ '/file:\/var\/lib\/abc\/cde.repo\/r/ {print $1,$3,$NF}' OFS=/ file
uri='file:/lib/abc-1.1.jar'
Static URL, but changing file name.
You do not need to use sed or even awk. You could simply use basename:
prefix='file:/lib/'
uri='file:/var/lib/abc/cde.repo/r/c/e/v/1.1/abc-1.1.jar'
result="${prefix}$(basename "${uri}")"
echo "${result}"
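If spawning basename for every string is a concern, POSIX parameter expansion can strip the directory part inside the shell itself. A minimal sketch with the same two variables:

```shell
prefix='file:/lib/'
uri='file:/var/lib/abc/cde.repo/r/c/e/v/1.1/abc-1.1.jar'

# ${uri##*/} removes the longest leading match of "*/",
# leaving only the text after the last slash.
result="${prefix}${uri##*/}"
echo "${result}"
```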
This worked for me :
sudo sed -i -e "s/stringToChange/TheNewString/g" test.xml

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
I would like to keep using sed for this. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get the counter to increase. Any ideas?
More generally, my question is the following: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line. If the pattern is present, it increments or primes the counter in the hold space and then appends that counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further cases of the pattern and the procedure is repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
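exec 0<input_file
while IFS= read -r line; do
  # Rebuild the suffix on every pass so it picks up the current counter value.
  __tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
  printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
  # Advance the counter only when the line actually contained an image.
  case $line in *'[[Image('*) __images_counter=$((__images_counter + 1)) ;; esac
done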
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
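Since the question itself mentions AWK as a fallback, a counter kept in an awk variable is one way to number every match, including several on one line. The sketch below emits the simple \includegraphics{...}\label{img-N} form from the question rather than the full figure environment. The sample input lines are made up, and n is a hypothetical counter name.

```shell
sample=$(mktemp)
printf '%s\n' \
  '[[Image(mypicture.png)]]' \
  'text [[Image(a.png)]] and [[Image(b.png)]]' > "$sample"

result=$(awk '{
  out = ""
  # Consume the line one match at a time so each image gets its own number.
  while (match($0, /\[\[Image\([^)]*\)\]\]/)) {
    # Skip the 8 characters of "[[Image(" and drop the 3 of ")]]".
    name = substr($0, RSTART + 8, RLENGTH - 11)
    n++
    out = out substr($0, 1, RSTART - 1) "\\includegraphics{" name "}\\label{img-" n "}"
    $0 = substr($0, RSTART + RLENGTH)
  }
  print out $0
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Because the replacement text is built by ordinary awk expressions, any function of the captured name could be substituted at the point where the output string is assembled.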

Why does sed /^$/d delete only blank lines but /^$/p print all lines?

I'm able to use sed /^$/d <file> to delete all the blank lines in the file, but what if I want to print all the blank lines only? The command sed /^$/p <file> prints all the lines in file.
The reason I want to do this is that we use an EDA program (Expedition) that uses regex to run rules on the names of nets. I'm trying to find a way to search for all nets that don't have names assigned. I thought using ^$ would work, but it just ends up finding all nets, which is what /^$/p is doing too. So is there a different way to do this?
Unless otherwise specified sed will print the pattern space when it has finished processing it. If you look carefully at your output you'll notice that you get 2 blank lines for every one in the file. You'll have to use the -n command line switch to stop sed from printing.
sed -n '/^$/p' infile
Should work as you want.
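The difference is easy to see by counting output lines. In this sketch the five-line input (two of its lines blank) is made up for illustration:

```shell
input=$(printf 'a\n\nb\n\nc\n')   # five lines, two of them blank

# Default: every line is auto-printed, and p prints each blank line again.
with_autoprint=$(printf '%s\n' "$input" | sed '/^$/p' | wc -l)

# With -n, only the explicit p produces output.
blank_only=$(printf '%s\n' "$input" | sed -n '/^$/p' | wc -l)

echo "without -n: $with_autoprint lines, with -n: $blank_only lines"
```

The first count is the five input lines plus one duplicate per blank line; the second is just the two blank lines.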
You can also use grep as:
grep '^$' infile
Sed prints every line by default, so on its own the p command only duplicates the lines it matches. To make it useful, you need to give sed the -n switch. Indeed, the following appears to do what you want:
sed -n '/^$/p'
Think of it another way: don't p, but !d.
You may try:
sed '/^$/!d' yourFile

Suppress the match itself in grep

Suppose I have lots of files in the form of
First Line Name
Second Line Surname Adress
Third Line etc
etc
Now I'm using grep to match the first line, but I'm actually doing this to find the second line. The second line is not a pattern that can be matched (it just depends on the first line). My regex pattern works and the command I'm using is
grep -rHIin pattern . -A 1 -m 1
Now the -A option prints the line after a match. The -m option stops after 1 match (because there are other lines that match my pattern, but I'm only interested in the first match, anyway...)
This actually works but the output is like that:
./example_file:1: First Line Name
./example_file-2- Second Line Surname Adress
I've read the manual but couldn't find any clue or info about that. Now here is the question.
How can I suppress the match itself ? The output should be in the form of:
./example_file-2- Second Line Surname Adress
sed to the rescue:
sed -n '2,${p;n;}'
The particular sed command here starts with line 2 of its input and prints every other line. Pipe the output of grep into that and you'll only get the even-numbered lines out of the grep output.
An explanation of the sed command itself:
2,$ - the range of lines from line 2 to the last line of the file
{p;n;} - print the current line, then ignore the next line (this then gets repeated)
(In this special case of all even lines, an alternative way of writing this would be sed -n 'n;p;' since we don't actually need to special-case any leading lines. If you wanted to skip the first 5 lines of the file, this wouldn't be possible, you'd have to use the 6,$ syntax.)
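When grep reports matches from several files, GNU grep separates the context groups with a line containing only --, which would throw off a plain every-other-line filter. A sketch that drops those separators first is shown below. It assumes GNU grep and that every match line is followed by exactly one context line; the two sample files are made up.

```shell
dir=$(mktemp -d)
printf '%s\n' 'First Line A' 'Second Line A' 'Third Line A' > "$dir/f1"
printf '%s\n' 'First Line B' 'Second Line B' 'Third Line B' > "$dir/f2"

# Drop "--" separators, then for each remaining pair skip the match
# line (n) and print the context line that follows it (p).
# sort only makes the file order deterministic for this demo.
result=$(cd "$dir" && grep -rHn -A 1 -m 1 'First' . | sed -n '/^--$/d;n;p' | sort)

printf '%s\n' "$result"
rm -rf "$dir"
```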
You can use sed to print the line after each match:
sed -n '/<pattern>/{n;p}' <file>
To get recursion and the file names, you will need something like:
find . -type f -exec sed -n '/<pattern>/{n;s|^|{}:|;p}' {} \;
If you have already read a book on grep, you could also read a manual on awk, another common Unix tool.
In awk, your task can be solved with nice, simple code. (As for me, I always have to refresh my knowledge of awk's syntax by going to the manual (info awk) when I want to use it.)
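One possible shape for such an awk solution is sketched here. It prints, for each file, only the line after the first match, in a grep-like file-line- format. The state variable, the sample files and the pattern First are all assumptions made for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'First Line A' 'Second Line A' 'Third Line A' > "$dir/f1"
printf '%s\n' 'First Line B' 'Second Line B' 'Third Line B' > "$dir/f2"

# state: 0 = still looking, 1 = previous line was the first match, 2 = done.
# FNR == 1 resets the state at the start of every file. The print rule comes
# before the match rule so the matching line itself is never printed.
result=$(cd "$dir" && find . -type f -exec awk '
  FNR == 1              { state = 0 }
  state == 1            { print FILENAME "-" FNR "- " $0; state = 2 }
  state == 0 && /First/ { state = 1 }
' {} + | sort)

printf '%s\n' "$result"
rm -rf "$dir"
```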
Or, you could come up with a solution combining find (to iterate over your files) and grep (to select the lines) and head/tail (to discard for each individual file the lines you don't want). The complication with find is to be able to work with each file individually, discarding a line per file.
You could pipe the results through grep -v pattern