Sed replace domain in URL - regex

I have these strings http://sub.domain.com/myuri/default.aspx, https://sub.domain.com/myuri/default.aspx and https://domain.com
Is it possible to use sed to replace only the domain part?
For example, this URL:
http://sub.domain.com/myuri/default.aspx
Would become:
http://anotherdomain.com/myuri/default.aspx
Please note that the protocol may differ between https and http.
I did search but could not find something similar.

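sed has no non-greedy quantifiers, so a pattern like .*? is not available there (a negated class such as [^/]+ is the usual workaround). With perl you can write it directly: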
perl -pe '/(http|https):\/\/(.*?)(\/|$)/ && s/$2/anotherdomain/g'
Edit:
awk also does the job well and it's even simpler actually:
awk -F/ 'gsub($3,"anotherdomain",$0)' <<< "$urls"
Example:
#!/bin/bash
urls=$(cat << 'EOF'
https://sub.domain.com/myuri/default.aspx
http://sub.domain.com/myuri/default.aspx
http://blabla
EOF
)
perl -pe '/(http|https):\/\/(.*?)(\/|$)/ && s/$2/anotherdomain/g' <<< "$urls"
Output:
bash test.sh
https://anotherdomain/myuri/default.aspx
http://anotherdomain/myuri/default.aspx
http://anotherdomain
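The awk idea above passes the old host to gsub as a regular expression, so its dots match any character. A minimal sketch of a field-based variant, assuming every URL has the form scheme://host/... and that the new host (the hypothetical variable new) is anotherdomain.com, assigns the third /-separated field directly instead:

```shell
urls='https://sub.domain.com/myuri/default.aspx
http://sub.domain.com/myuri/default.aspx
https://domain.com'

# Split and rejoin on "/": field 1 is "http:" or "https:", field 2 is empty,
# field 3 is the host. "new" is an illustrative variable name.
result=$(printf '%s\n' "$urls" |
    awk -v new='anotherdomain.com' '
        BEGIN { FS = OFS = "/" }
        $1 ~ /^https?:$/ && $2 == "" && NF >= 3 { $3 = new }
        { print }
    ')
printf '%s\n' "$result"
```

Lines that do not start with http:// or https:// are printed unchanged.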

If I follow your question, then yes, with sed 's/sub\.domain\.com/anotherdomain\.com/1':
echo "http://sub.domain.com/myuri/default.aspx" | \
sed 's/sub\.domain\.com/anotherdomain\.com/1'
Output is
http://anotherdomain.com/myuri/default.aspx
And with,
echo "https://sub.domain.com/myuri/default.aspx" | \
sed 's/sub\.domain\.com/anotherdomain\.com/1'
Output is
https://anotherdomain.com/myuri/default.aspx

You can use sed like this:
sed -r 's|(https?://)[^/]+([^[:blank:]]*)|\1anotherdomain.com\2|g' file
http://anotherdomain.com/myuri/default.aspx
https://anotherdomain.com/myuri/default.aspx
https://anotherdomain.com
PS: Use sed -E on OSX.

Based on @hek2mgl's solution:
SERVER=www.example.com
sed "s=\(https\?://\)[^/]\+=\1$SERVER=" \
<<< 'https://anotherdomain.com/myuri/default.aspx'
It will output:
https://www.example.com/myuri/default.aspx
Modifications from hek2mgl's sed line:
a little shorter (no need to catch the part after domain name to paste it as is in replacement)
deals with both http:// and https:// syntax
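Because $SERVER is expanded inside the sed script, a value containing the delimiter, & or a backslash would change the meaning of the replacement. The following is a sketch only: escape_replacement is a hypothetical helper, not part of sed, and it assumes | is used as the delimiter and the value contains no newline.

```shell
# Hypothetical helper: backslash-escape the characters that are special in
# the replacement part of s|...|...| (backslash, "&" and the "|" delimiter).
escape_replacement() {
    printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

SERVER='www.example.com'
safe=$(escape_replacement "$SERVER")

# -E (extended regex) is accepted by both GNU and BSD sed.
result=$(printf '%s\n' 'https://anotherdomain.com/myuri/default.aspx' |
    sed -E "s|(https?://)[^/]+|\\1${safe}|")
printf '%s\n' "$result"
```

For a plain host name like the one above the escaping changes nothing; it only matters for unusual values.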

You can use sed:
SERVER=www.example.com
sed "s~\(https\?://\)[^/]\+\(.*\)~\1$SERVER\2~" <<< "http://newsub.domain.com/myuri/default"

Related

What characters do I need to escape with sed to make this regex work

(?<![0-9])0+(?=[0-9]+)
I need to remove unnecessary leading zeros in malformed octets of IP addresses.
I want to do something like this but it is not working.
cat Qualys-Active-IPs.csv | awk -F';' {'print $1'} | sed 's/(?<![0-9])0+(?\=[0-9]+)//g'
The solution is:
sed -r 's/^0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)$/\1.\2.\3.\4/'
You may try this code:
sed -r 's/^0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)-0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+),...,(.*)$/\9:\1.\2.\3.\4-\5.\6.\7.\8/'
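sed supports neither lookbehind nor lookahead, so no amount of escaping makes the original pattern work. A rough sketch of how the same idea can be emulated follows: capture the character before the zeros and the digit after them, then put both back. The sample addresses and the function name strip_zeros are made up for illustration.

```shell
# Anchored form from the answer above, applied to a made-up address.
printf '%s\n' '010.001.000.255' |
    sed -E 's/^0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)$/\1.\2.\3.\4/'

# Unanchored emulation of (?<![0-9])0+(?=[0-9]+): group 1 is the line start
# or a non-digit, group 2 is the first digit after the zeros being dropped.
strip_zeros() {
    sed -E 's/(^|[^0-9])0+([0-9])/\1\2/g'
}

fixed=$(printf '%s\n' 'host;192.168.001.010' | strip_zeros)
printf '%s\n' "$fixed"
```

The unanchored form touches every zero-padded number on the line, so it is only suitable when the input holds nothing but addresses and separators.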

Swap columns in bash using SED without using loop

I'm new to sed and am trying to learn some patterns with it.
I have a filename.txt that has the following entry:
ppp/jjj qqq/kkk rrr/lll
My goal is to swap the word before the slash and the word after the slash in each of the three word1/word2 columns:
jjj/ppp kkk/qqq lll/rrr
I tried using sed -re 's!(.*)(/)(.*)!\1\2\!' filename.txt, but it didn't work. Any idea how I can go about it?
$ echo "ppp/jjj qqq/kkk rrr/lll" | sed -e 's/$/ /' -e 's!\([^/]*\)/\([^ ]*\) !\2/\1 !g'
jjj/ppp kkk/qqq lll/rrr
Doing the replacement on the perl command line is a lot more straightforward:
perl -pe 's/(\w+)\/(\w+)/$2\/$1/g' file
jjj/ppp kkk/qqq lll/rrr
$ sed 's#\([^ ]*\)/\([^ ]*\)#\2/\1#g' file
jjj/ppp kkk/qqq lll/rrr

How to cut a string from a string

My script gets this string for example:
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Let's say I don't know how long the string is before /importance.
I want a new variable that will keep only the /importance/lib1/lib2/lib3/file from the full string.
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
Here is the command in my code:
find <main_path> -name file | sed 's/.*importance//'
I am not familiar with the regex, so I need your help please :)
Sorry, my friends, I got my question wrong:
I don't need the output /importance/lib1/lib2/lib3/file but /importance/lib1/lib2/lib3 with no /file in the output.
Can you help me?
I would use awk:
$ echo "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file" | awk -F"/importance/" '{print FS$2}'
/importance/lib1/lib2/lib3/file
Which is the same as:
$ awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
That is, we set the field separator to /importance/, so that the first field is what comes before it and the 2nd one is what comes after. To print /importance/ itself, we use FS!
All together, and to save it into a variable, use:
var=$(find <main_path> -name file | awk -F"/importance/" '{print FS$2}')
Update
I don't need the output /importance/lib1/lib2/lib3/file but
/importance/lib1/lib2/lib3 with no /file in the output.
Then you can use something like dirname to get the path without the name itself:
$ dirname $(awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file")
/importance/lib1/lib2/lib3
Instead of substituting all until importance with nothing, replace with /importance:
~$ echo $var
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
~$ sed 's:.*importance:/importance:' <<< $var
/importance/lib1/lib2/lib3/file
As noted by @lurker, if importance can also appear inside other directory names, you could add the surrounding slashes to be safe:
~$ sed 's:.*/importance/:/importance/:' <<< "/dir1/dirimportance/importancedir/..../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
With GNU sed:
echo '/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file' | sed -E 's#.*(/importance.*)#\1#'
Output:
/importance/lib1/lib2/lib3/file
pure bash
kent$ a="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
kent$ echo ${a/*\/importance/\/importance}
/importance/lib1/lib2/lib3/file
external tool: grep
kent$ grep -o '/importance/.*' <<<$a
/importance/lib1/lib2/lib3/file
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
You were very close. All you had to do was substitute /importance back in:
sed 's|.*/importance|/importance|'
However, I would use Bash's built-in parameter expansion. It avoids starting an external process, so it is faster.
The expansion ${foo##pattern} takes the shell variable foo and removes the longest match of the glob pattern from its left side. Since that also strips the importance directory itself, prepend it again:
file_name="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
file_name="/importance${file_name##*/importance}"
Removing the /file at the end, as you ask:
echo '<path>' | sed -r 's#.*(/importance.*)/[^/]*#\1#'
Input /dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Returns: /importance/lib1/lib2/lib3
See this "Match groups" tutorial.
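Both steps (keep everything from /importance onward, then drop the trailing /file) can also be done with parameter expansion alone. This is a minimal sketch, assuming the path contains /importance/ at least once; the variable names path, kept and dir are illustrative.

```shell
path='/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file'

# ## removes the longest prefix matching the glob, i.e. everything up to and
# including the last "/importance/"; the marker is then put back in front.
kept="/importance/${path##*/importance/}"

# % removes the shortest suffix matching "/*", i.e. the final path component.
dir=${kept%/*}

printf '%s\n' "$kept" "$dir"
```

If the path does not contain /importance/, the first expansion leaves it unchanged, so a real script would want to check for that case first.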

Rewrite URL using sed while maintaining filename

I would like to find all instances of a URL in a file and replace them with a different link structure.
For example, convert http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png to /images/Security_Panda.png.
I am able to identify the link using a regular expression such as:
^(http:)|([/|.|\w|\s])*\.(?:jpg|gif|png)
but need to rewrite using sed so that the file name is maintained. I understand that I will need to use s/${PATTERN}/${REPLACEMENT}/g.
I tried sed -i 's#(http:)|([/|.|\w|\s])*\.(?:jpg|gif|png)#/dir/$1#g' test without success. Any thoughts on how to improve the approach?
In basic sed, you need to escape the () symbols like \(..\) to mean a capturing group.
sed 's~http://[.a-zA-Z0-9_/-]*\/\(\w\+\.\(jpg\|gif\|png\)\)~/images/\1~g' file
Example:
$ echo 'http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png' | sed 's~http://[.a-zA-Z0-9_/-]*\/\(\w\+\.\(jpg\|gif\|png\)\)~/images/\1~g'
/images/Security_Panda.png
You can use:
sed 's~^.*/\([^/]\{1,\}\)$~/images/\1~' file
/images/Security_Panda.png
Testing:
s='http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png'
sed 's~^.*/\([^/]\{1,\}\)$~/images/\1~' <<< "$s"
/images/Security_Panda.png
An easier way, if you are open to a different approach, is plain bash parameter expansion:
#!/usr/bin/env bash
URL="http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png"
echo "/images/${URL##*/}"
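Another way, on the command line (note that the . before $ consumes one trailing character, which is the trailing space in the example input below):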
sed 's#^http:.*/\(.*\).$#/images/\1#g'
Example
echo "http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png "|sed 's#^http:.*/\(.*\).$#/images/\1#g'
results
/images/Security_Panda.png
An awk version:
awk -F\/ '/(jpg|gif|png) *$/ {print "/images/"$NF}' file
/images/Security_Panda.png
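Most of the commands above assume the URL is the only thing on the line. When the links sit inside other text, one possible approach is to stop the match at whitespace or a double quote. The sketch below uses extended regex (sed -E); the HTML snippet is made-up sample input, and the assumption that URLs end at whitespace or a quote may not hold for every file.

```shell
html='<img src="http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png"> and <a href="https://www.domain.com/a/b/logo.gif">'

# Group 1 is the last path component, which must end in .jpg, .gif or .png;
# everything from the scheme up to the final "/" is discarded.
rewritten=$(printf '%s\n' "$html" |
    sed -E 's~https?://[^[:space:]"]*/([^/[:space:]"]+\.(jpg|gif|png))~/images/\1~g')
printf '%s\n' "$rewritten"
```

Adding -i and a file name, as in the question, would apply the same substitution in place; it is worth running it without -i first to inspect the result.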

Using sed and regex to capture last part of url

I'm trying to make sed match the last part of a url and output just that. For example:
echo "http://randomurl/suburl/file.mp3" | sed (expression)
should give the output:
file.mp3
So far I've tried sed 's|\([^/]+mp3\)$|\1|g' but it just outputs the whole url. Maybe there's something I'm not seeing here but anyways, help would be much appreciated!
this works:
echo "http://randomurl/suburl/file.mp3" | sed 's#.*/##'
basename is your good friend.
> basename "http://randomurl/suburl/file.mp3"
=> file.mp3
This should do the job:
$ echo "http://randomurl/suburl/file.mp3" | sed -r 's|.*/(.*)$|\1|'
file.mp3
where:
| has been used instead of / to separate the arguments of the s command.
Everything is matched and replaced with whatever is found after the last /.
Edit: You could also use bash parameter substitution capabilities:
$ url="http://randomurl/suburl/file.mp3"
$ echo ${url##*/}
file.mp3
Here's another solution using grep:
echo 'http://randomurl/suburl/file.mp3' | grep -oP '[^/\n]+$'
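The attempt in the question most likely fails for two reasons: without -E (or -r), + is a literal character in sed's basic regex syntax, and s only rewrites the part of the line that matched, so the text before the match is kept. A sketch that keeps the asker's mp3 restriction, assuming one URL per line:

```shell
url='http://randomurl/suburl/file.mp3'

# .*/ consumes everything up to the last slash, so only group 1 survives.
# -n together with the p flag prints a line only if the substitution matched.
name=$(printf '%s\n' "$url" | sed -nE 's|.*/([^/]+\.mp3)$|\1|p')
printf '%s\n' "$name"

# A URL that does not end in .mp3 yields no output at all.
other=$(printf '%s\n' 'http://randomurl/suburl/file.ogg' |
    sed -nE 's|.*/([^/]+\.mp3)$|\1|p')
```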