Linux SED RegEx replace, but keep wildcards

If I have a string that contains this somewhere (Foo could be anything):
<tag>Foo</tag>
How would I, using SED and RegEx, replace it with this:
[tag]Foo[/tag]
My failed attempt:
echo "<tag>Foo</tag>" | sed "s/<tag>\(.*\)<\\/tag>/[tag]\1[\\/tag]"

Your regex is missing the terminating /
$ echo "<tag>Foo</tag>" | sed "s/<tag>\(.*\)<\\/tag>/[tag]\1[\\/tag]/"
[tag]Foo[/tag]

With this you can replace all types of tags and don't have to be tag specific.
$ echo "<tag>Foo</tag>" | sed "s/[^<]*<\([^>]*\)>\([^<]*\)<\([^>]*\)>/[\1]\2[\3]/"
hope this helps.
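Building on the two answers above, here is a minimal sketch for lines that may contain surrounding text or several tag pairs. The function name angle_to_square is made up for illustration, and the pattern assumes tags without attributes and without nesting. A back-reference (\1) in the pattern ties each closing tag to its opening tag, and | is used as the s delimiter so that / needs no escaping.

```shell
# Hypothetical helper (not from the answers above): rewrite <name>text</name>
# as [name]text[/name] for every simple pair on each input line.
# Assumes no attributes and no nested tags.
angle_to_square() {
  sed 's|<\([^<>/][^<>/]*\)>\([^<]*\)</\1>|[\1]\2[/\1]|g'
}

echo "<tag>Foo</tag>" | angle_to_square
echo "see <b>x</b> and <i>y</i>" | angle_to_square
```

Because the pattern no longer starts with an ungrouped [^<]*, text before the first tag is kept rather than dropped.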

Related

Why is sed not extracting value?

When I run my regex with sed
echo "abc-def-stg" | sed -e '/(\w*$)/g'
on regexr.com it works with no problems, but when I try to extract the value stg using sed it does not work.
Can anyone explain why?
sed is used to replace strings. You are trying to extract.
Use (as John1024 said)
echo "abc-def-stg" | sed 's/.*-//'
It will remove all up to and including the last hyphen. Or
echo "abc-def-stg" | grep -oE '[^-]+$'
It will extract all characters other than a hyphen at the end of the string.
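If you want to stay with sed for extraction, a common idiom is -n together with the p flag, so that only lines where the substitution succeeded are printed. A minimal sketch, where the function name last_field is hypothetical:

```shell
# Hypothetical helper: print the text after the last hyphen of each line.
# -n suppresses automatic printing; the p flag prints only lines that matched,
# so lines without a hyphen produce no output.
last_field() {
  sed -n 's/.*-\([^-]*\)$/\1/p'
}

printf '%s\n' "abc-def-stg" "nohyphen" | last_field
```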

Whats wrong with below regex

What is wrong with below regex in unix ?
echo AB345678 | sed -n '/^\([a-zA-Z]\{2\}[0-9]\{6\}|[0-9]\{8\}\)$/p'
echo 12345678 | sed -n '/^\([a-zA-Z]\{2\}[0-9]\{6\}|[0-9]\{8\}\)$/p'
I am not getting any output. Why does the string I echo not match my regex?
The alternation operator in the BRE regex syntax must be written as an escaped pipe, \| (just as grouping is written \( and \)); GNU sed supports this as an extension:
echo "AB345678" | sed -n '/^\([a-zA-Z]\{2\}[0-9]\{6\}\|[0-9]\{8\}\)$/p'
In a more complicated expression you can add '-r' to sed options instead of escaping sensitive characters.
From sed manual:
-r, --regexp-extended
use extended regular expressions in the script.
Answer:
echo AB345678 | sed -nr '/^([a-zA-Z]{2}[0-9]{6}|[0-9]{8})$/p'
echo 12345678 | sed -nr '/^([a-zA-Z]{2}[0-9]{6}|[0-9]{8})$/p'
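To check both alternatives in one go, here is a small sketch using extended syntax. The function name match_id and the two non-matching sample inputs are made up for illustration; -E is used because both GNU and BSD sed accept that spelling.

```shell
# Hypothetical helper: print only lines that are either two letters followed
# by six digits, or exactly eight digits. With -E, ( ) { } and | are operators
# without backslashes.
match_id() {
  sed -n -E '/^([a-zA-Z]{2}[0-9]{6}|[0-9]{8})$/p'
}

printf '%s\n' AB345678 12345678 A1234567 ABC45678 | match_id
```

Only the first two sample lines should be printed; the last two fit neither alternative.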

Sed replace domain in URL

I have these strings http://sub.domain.com/myuri/default.aspx, https://sub.domain.com/myuri/default.aspx and https://domain.com
Is it possible to use sed to replace only the domain part?
For example, this URL:
http://sub.domain.com/myuri/default.aspx
Would become:
http://anotherdomain.com/myuri/default.aspx
Please note that the protocol may differ between https and http.
I did search but could not find something similar.
sed does not support non-greedy quantifiers such as .*?, so one option is to use perl instead:
perl -pe '/(http|https):\/\/(.*?)(\/|$)/ && s/$2/anotherdomain/g'
Edit:
awk also does the job well and it's even simpler actually:
awk -F/ 'gsub($3,"anotherdomain",$0)' <<< "$urls"
Example:
#!/bin/bash
urls=$(cat << 'EOF'
https://sub.domain.com/myuri/default.aspx
http://sub.domain.com/myuri/default.aspx
http://blabla
EOF
)
perl -pe '/(http|https):\/\/(.*?)(\/|$)/ && s/$2/anotherdomain/g' <<< "$urls"
Output:
bash test.sh
https://anotherdomain/myuri/default.aspx
http://anotherdomain/myuri/default.aspx
http://anotherdomain
If I follow your question, then yes sed 's/sub\.domain\.com/anotherdomain\.com/1' -
echo "http://sub.domain.com/myuri/default.aspx" | \
sed 's/sub\.domain\.com/anotherdomain\.com/1'
Output is
http://anotherdomain.com/myuri/default.aspx
And with,
echo "https://sub.domain.com/myuri/default.aspx" | \
sed 's/sub\.domain\.com/anotherdomain\.com/1'
Output is
https://anotherdomain.com/myuri/default.aspx
You can use sed like this:
sed -r 's|(https?://)[^/]+([^[:blank:]]*)|\1anotherdomain.com\2|g' file
http://anotherdomain.com/myuri/default.aspx
https://anotherdomain.com/myuri/default.aspx
https://anotherdomain.com
PS: Use sed -E on OSX.
Based on @hek2mgl's solution:
SERVER=www.example.com
sed "s=\(https\?://\)[^/]\+=\1$SERVER=" \
<<< 'https://anotherdomain.com/myuri/default.aspx'
It will output:
https://www.example.com/myuri/default.aspx
Modifications from hek2mgl's sed line:
a little shorter (no need to catch the part after domain name to paste it as is in replacement)
deals with both http:// and https:// syntax
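The variable-based command above can be wrapped in a small function. This is only a sketch: the name replace_host is hypothetical, and it assumes the new host contains no #, & or backslash, since sed would treat those specially in the replacement. Note that [^/]+ also swallows a port or user info if the URL has one.

```shell
# Hypothetical helper: replace the host of each http(s) URL on stdin with $1.
# Assumes one URL per line and that $1 contains no '#', '&' or backslash.
replace_host() {
  new_host=$1
  sed -E "s#^(https?://)[^/]+#\1${new_host}#"
}

printf '%s\n' \
  "http://sub.domain.com/myuri/default.aspx" \
  "https://sub.domain.com/myuri/default.aspx" \
  "https://domain.com" | replace_host anotherdomain.com
```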
You can use sed:
SERVER=www.example.com
sed "s~https\?://\([^/]\+\)\(.*\)~http://$SERVER\2~" <<< "http://newsub.domain.com/myuri/default

Sed substitute input by first matching argument

I'm trying to get some sed command to work without success...
echo -e "This.Is.a.Test.V03.r501.dump" | sed "s/^\(\w+(\.\w+)*\)\.V[0-9]{2}.*$/\1/g"
Basically, I want to match and return This.Is.a.Test while this \.V[0-9]{2} is fixed, but instead it returns the whole input string.
Any help is appreciated, thanks in advance!
\w matches letters, digits and the underscore; to capture only letters, replace \w with [[:alpha:]]. In addition, in sed's default BRE syntax + and {2} must be written \+ and \{2\}. The following works with GNU sed:
echo -e "This.Is.a.Test.V03.r501.dump" |
sed "s/^\([[:alpha:].]\+\)\.V[0-9]\{2\}.*$/\1/g"
This.Is.a.Test
Try this.
echo -e "This.Is.a.Test.V03.r501.dump" | sed -e "s/\(.*\)\.V[0-9]*.*/\1/"
Another way with sed
sed -r 's/^(([^.]+\.){3})([^.]+).*/\1\3/'
Are you looking for this?
One way is to use awk
$ echo "This.Is.a.Test.V03.r501.dump" | awk -F'.' 'BEGIN{OFS=FS}{NF=4}1'
This.Is.a.Test
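Another way to look at it: since the .V plus two digits marker is the fixed part, you can delete the marker and everything after it instead of capturing what comes before. A minimal sketch with extended syntax; the function name strip_version is made up, and it assumes the marker is either at the end of the line or followed by a dot.

```shell
# Hypothetical helper: drop ".V" + two digits and anything that follows.
# Lines without such a marker are passed through unchanged.
strip_version() {
  sed -E 's/\.V[0-9]{2}(\..*)?$//'
}

echo "This.Is.a.Test.V03.r501.dump" | strip_version
```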

Using sed and regex to capture last part of url

I'm trying to make sed match the last part of a url and output just that. For example:
echo "http://randomurl/suburl/file.mp3" | sed (expression)
should give the output:
file.mp3
So far I've tried sed 's|\([^/]+mp3\)$|\1|g' but it just outputs the whole url. Maybe there's something I'm not seeing here but anyways, help would be much appreciated!
this works:
echo "http://randomurl/suburl/file.mp3" | sed 's#.*/##'
basename is your good friend.
> basename "http://randomurl/suburl/file.mp3"
=> file.mp3
This should do the job:
$ echo "http://randomurl/suburl/file.mp3" | sed -r 's|.*/(.*)$|\1|'
file.mp3
where:
| has been used instead of / to separate the arguments of the s command.
Everything is matched and replaced with whatever is found after the last /.
Edit: You could also use bash parameter expansion:
$ url="http://randomurl/suburl/file.mp3"
$ echo "${url##*/}"
file.mp3
echo 'http://randomurl/suburl/file.mp3' | grep -oP '[^/\n]+$'
Here's another solution using grep.
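As for why the attempt in the question prints the whole URL: in sed's default BRE syntax a bare + is a literal plus sign, so the pattern never matches, and even with \+ the s command would only replace file.mp3 with itself and leave the rest of the line alone. The pattern has to consume the part you want to drop. A minimal sketch of the sed and pure-shell variants side by side; both function names are made up for illustration.

```shell
# sed version: the greedy .*/ consumes everything up to the last slash,
# and the captured group is all that is written back.
last_segment_sed() {
  sed -E 's|^.*/([^/]+)$|\1|'
}

# Shell version: ${1##*/} removes the longest prefix matching */.
last_segment_sh() {
  printf '%s\n' "${1##*/}"
}

echo "http://randomurl/suburl/file.mp3" | last_segment_sed
last_segment_sh "http://randomurl/suburl/file.mp3"
```

The shell version avoids starting an extra process, which may matter inside a loop.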