using sed to search and replace URLs except in certain cases - regex

I have a list of URLs. For most of them I want to do a simple search and replace with sed, but a few should be excluded.
Given the list below:
http://www.dol.gov
http://www.science.gov
http://www.whitehouse.gov
http://test.sandbox.local
http://www.travel.state.gov
http://www.lib.berkeley.edu
http://dev.sandbox.local
I want to convert all URLs that do not have "sandbox" in the URL to:
href="/fetch?domain=<url>"
What I have so far with sed is the following:
sed -r 's|http://(\S*)|href="/fetch\?domain=\1"|g'
which reformats all the URLs as expected.
How do I modify what I have to exclude the lines that have "sandbox" in them?
Thanks in advance for your help!

If by exclude, you mean "do not do the replacement", then:
sed -r '/sandbox/!s|http://(\S*)|href="/fetch\?domain=\1"|g'
If you mean 'omit completely from the output':
sed -r -e '/sandbox/d' -e 's|http://(\S*)|href="/fetch\?domain=\1"|g'
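To make the difference between the two variants concrete, here is a minimal sketch run against a shortened copy of the list from the question. It assumes a sed that accepts -E (the portable spelling of -r) and uses [^[:space:]] in place of \S, which is a GNU extension; the variable names are only illustrative.

```shell
# Shortened sample input taken from the question.
urls='http://www.dol.gov
http://test.sandbox.local
http://www.lib.berkeley.edu'

# Variant 1: leave sandbox lines untouched, rewrite every other line.
kept=$(printf '%s\n' "$urls" |
  sed -E '/sandbox/!s|http://([^[:space:]]*)|href="/fetch?domain=\1"|g')

# Variant 2: delete sandbox lines first, then rewrite what is left.
dropped=$(printf '%s\n' "$urls" |
  sed -E -e '/sandbox/d' -e 's|http://([^[:space:]]*)|href="/fetch?domain=\1"|g')

printf '%s\n' "$kept"
printf '%s\n' "$dropped"
```

The first variant prints three lines with the sandbox URL unchanged; the second prints only the two rewritten lines.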

sed -r 's|http://(\S*)|href="/fetch\?domain=\1"|g' | grep -v sandbox

Related

Rewrite URL using sed while maintaining filename

I would like to find all instances of a URL in a file and replace them with a different link structure.
An example would be converting http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png to /images/Security_Panda.png.
I am able to identify the link using a regular expression such as:
^(http:)|([/|.|\w|\s])*\.(?:jpg|gif|png)
but need to rewrite using sed so that the file name is maintained. I understand that I will need to use s/${PATTERN}/${REPLACEMENT}/g.
I tried sed -i 's#(http:)|([/|.|\w|\s])*\.(?:jpg|gif|png)#/dir/$1#g' test without success. Any thoughts on how to improve the approach?
In basic sed, you need to escape the () symbols like \(..\) to mean a capturing group.
sed 's~http://[.a-zA-Z0-9_/-]*\/\(\w\+\.\(jpg\|gif\|png\)\)~/images/\1~g' file
Example:
$ echo 'http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png' | sed 's~http://[.a-zA-Z0-9_/-]*\/\(\w\+\.\(jpg\|gif\|png\)\)~/images/\1~g'
/images/Security_Panda.png
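For comparison, the same idea can be written with extended regular expressions, where ( ) and | need no backslashes. This is only a sketch: it assumes sed supports -E, and it spells \w as [a-zA-Z0-9_] because \w is a GNU extension.

```shell
url='http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png'

# ERE form: plain ( ) capture groups and | alternation.
# Group 1 is the file name (word characters plus an image extension).
rewritten=$(printf '%s\n' "$url" |
  sed -E 's~http://[.a-zA-Z0-9_/-]*/([a-zA-Z0-9_]+\.(jpg|gif|png))~/images/\1~g')

printf '%s\n' "$rewritten"
```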
You can use:
sed 's~^.*/\([^/]\{1,\}\)$~/images/\1~' file
/images/Security_Panda.png
Testing:
s='http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png'
sed 's~^.*/\([^/]\{1,\}\)$~/images/\1~' <<< "$s"
/images/Security_Panda.png
An easier way, if you are open to a different approach, is Bash parameter expansion:
#!/usr/bin/env bash
URL="http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png"
echo "/images/${URL##*/}"
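The parameter-expansion approach can be wrapped in a small helper so it can be applied to many URLs. The function name to_image_path is made up for this sketch; ${1##*/} itself is standard shell and removes the longest prefix that ends in a slash.

```shell
# Hypothetical helper: print /images/<file name> for the URL given as $1.
to_image_path() {
  # ${1##*/} strips everything up to and including the last "/".
  printf '/images/%s\n' "${1##*/}"
}

result=$(to_image_path 'http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png')
printf '%s\n' "$result"
```

Unlike the sed answers, this does not check that the input is an image URL; it simply keeps whatever follows the last slash.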
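Another way, from the command line (note that the "." before "$" consumes one trailing character, such as the trailing space in the example input below):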
sed 's#^http:.*/\(.*\).$#/images/\1#g'
Example
echo "http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png "|sed 's#^http:.*/\(.*\).$#/images/\1#g'
results
/images/Security_Panda.png
An awk version:
awk -F\/ '/(jpg|gif|png) *$/ {print "/images/"$NF}' file
/images/Security_Panda.png

Sed replace domain in URL

I have these strings http://sub.domain.com/myuri/default.aspx, https://sub.domain.com/myuri/default.aspx and https://domain.com
Is it possible to use sed to replace only the domain part?
For example, this URL:
http://sub.domain.com/myuri/default.aspx
Would become:
http://anotherdomain.com/myuri/default.aspx
Please note that the protocol may differ between https and http.
I did search but could not find something similar.
You will need a non-greedy pattern, which sed does not offer; use perl instead:
perl -pe '/(http|https):\/\/(.*?)(\/|$)/ && s/$2/anotherdomain/g'
Edit:
awk also does the job well and it's even simpler actually:
awk -F/ 'gsub($3,"anotherdomain",$0)' <<< "$urls"
Example:
#!/bin/bash
urls=$(cat << 'EOF'
https://sub.domain.com/myuri/default.aspx
http://sub.domain.com/myuri/default.aspx
http://blabla
EOF
)
perl -pe '/(http|https):\/\/(.*?)(\/|$)/ && s/$2/anotherdomain/g' <<< "$urls"
Output:
bash test.sh
https://anotherdomain/myuri/default.aspx
http://anotherdomain/myuri/default.aspx
http://anotherdomain
If I follow your question, then yes sed 's/sub\.domain\.com/anotherdomain\.com/1' -
echo "http://sub.domain.com/myuri/default.aspx" | \
sed 's/sub\.domain\.com/anotherdomain\.com/1'
Output is
http://anotherdomain.com/myuri/default.aspx
And with,
echo "https://sub.domain.com/myuri/default.aspx" | \
sed 's/sub\.domain\.com/anotherdomain\.com/1'
Output is
https://anotherdomain.com/myuri/default.aspx
You can use sed like this:
sed -r 's|(https?://)[^/]+([^[:blank:]]*)|\1anotherdomain.com\2|g' file
http://anotherdomain.com/myuri/default.aspx
https://anotherdomain.com/myuri/default.aspx
https://anotherdomain.com
PS: Use sed -E on OSX.
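A negated bracket expression can stand in for the non-greedy match that sed lacks: [^/[:blank:]]+ stops at the first slash or blank, so only the host is consumed. The sketch below assumes sed -E and checks the three URLs from the question; the variable names are illustrative.

```shell
urls='http://sub.domain.com/myuri/default.aspx
https://sub.domain.com/myuri/default.aspx
https://domain.com'

# Group 1 keeps the scheme; the host (everything up to the next "/" or blank)
# is replaced, and the rest of the line is left alone.
replaced=$(printf '%s\n' "$urls" |
  sed -E 's|(https?://)[^/[:blank:]]+|\1anotherdomain.com|g')

printf '%s\n' "$replaced"
```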
Based on @hek2mgl's solution:
SERVER=www.example.com
sed "s=\(https\?://\)[^/]\+=\1$SERVER=" \
<<< 'https://anotherdomain.com/myuri/default.aspx'
It will output:
https://www.example.com/myuri/default.aspx
Modifications from hek2mgl's sed line:
a little shorter (no need to catch the part after domain name to paste it as is in replacement)
deals with both http:// and https:// syntax
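When the new host comes from a variable, characters such as "&", "\" or the chosen delimiter would be interpreted by sed inside the replacement. One possible way to guard against that is sketched below; replace_host is a hypothetical helper name, and the sketch assumes sed -E is available.

```shell
# Hypothetical helper: read URLs on stdin, replace the host with $1.
replace_host() {
  # Escape backslash, "&" and the "=" delimiter so sed treats them literally.
  new_host=$(printf '%s' "$1" | sed 's/[\\&=]/\\&/g')
  sed -E "s=(https?://)[^/]+=\1${new_host}="
}

out=$(printf '%s\n' 'https://anotherdomain.com/myuri/default.aspx' |
  replace_host 'www.example.com')
printf '%s\n' "$out"
```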
You can use sed:
SERVER=www.example.com
sed "s~https\?://\([^/]\+\)\(.*\)~http://$SERVER\2~" <<< "http://newsub.domain.com/myuri/default

Regular expression required for replacing string in shell script

Can anyone please help me write a shell script on Linux that replaces the hostname in a particular file?
E.g., I have multiple files that contain certain IP addresses or hostnames:
http://10.160.228.12:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://VQAIAAPPDEV:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
Basically, I want to replace the string between "http://" and ":8001" with any required string.
Can someone help me with this, please?
Some more info:
I want to do this recursively across many folders, so the script should search all the files in each folder and make the necessary changes.
You could use sed. Saying:
sed -r 's|(http://)([^:]*)(:8001)|\1something\3|g' filename
would replace the string between "http://" and ":8001" with something.
If you want to make the change to the file in-place, use the -i option:
sed -i -r 's|(http://)([^:]*)(:8001)|\1something\3|g' filename
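Since the question also asks about many folders, the same substitution can be combined with find. This is a sketch under a few assumptions: the function name replace_host_in_tree and its two parameters are invented here, the new host contains no "|", "&" or "\", and -i.bak is used because that spelling is accepted by both GNU and BSD sed (it leaves a .bak copy next to each file).

```shell
# Hypothetical helper: $1 = root directory, $2 = new host name.
replace_host_in_tree() {
  root=$1
  new_host=$2
  # Visit every regular file except earlier backups and edit it in place.
  find "$root" -type f ! -name '*.bak' \
    -exec sed -i.bak -E "s|(http://)[^:/]*(:8001)|\1${new_host}\2|g" {} +
}
```

It could then be called as replace_host_in_tree /path/to/root newhost, after trying it on a copy of the data first.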
Use sed command from Linux shell
sed -i 's%OldHost%NewHost%g' /yourfolder/yourfile
Tried with "for"
# cat replace.txt
http://10.160.228.12:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://VQAIAAPPDEV:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
# for i in `cat replace.txt | awk -F: '{print $2}' | sed 's/^\/\///g' | sed '/^$/d'` ; do sed -i "s/$i/Your_hostname/" replace.txt ; done
# cat replace.txt
http://Your_hostname:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://Your_hostname:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
It's working for me.

how to extract these fields via sed?

I'm trying to grep for individual quantities in lines like this:
foo=24.587 bar=88 fox=jobs
and extract, say, all the '88' values. The number of columns isn't consistent, so awk followed by cut won't cut it.
I tried using sed like this:
sed -e 's/.*\s\(bar=.+\)\s.*/\1/g'
and that just dumps the entire line. I'm not sure how to correct this regexp and, more importantly, why doesn't it do what I expect?
Use -r (extended regex). This tends to use regexen more like you may expect. You have to remove the backslashes from the parens, though:
$ echo "foo=24.587 bar=88 fox=jobs" | sed -r 's/.*\s(bar=.+)\s.*/\1/g'
bar=88
sed -r 's/.*\s(bar=.+)\s.*/\1/g'
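As for why the original attempt prints the whole line: in basic regular expressions "+" is an ordinary character, so bar=.+ only matches text that contains a literal plus sign, the substitution never fires, and sed prints the line unchanged. The sketch below shows that behaviour next to an extended-regex form that captures just the value. It assumes sed -E and uses [[:space:]] instead of the GNU-only \s; the variable names are illustrative.

```shell
line='foo=24.587 bar=88 fox=jobs'

# BRE: "+" is literal here, so nothing matches and the line passes through.
bre=$(printf '%s\n' "$line" |
  sed -e 's/.*[[:space:]]\(bar=.+\)[[:space:]].*/\1/g')

# ERE: "+" is a quantifier; [^[:space:]]+ stops at the end of the field,
# and -n with the p flag prints only lines where the substitution happened.
value=$(printf '%s\n' "$line" |
  sed -E -n 's/.*[[:space:]]bar=([^[:space:]]+).*/\1/p')

printf '%s\n' "$bre"
printf '%s\n' "$value"
```

Matching [^[:space:]]+ rather than .+ also keeps the capture from running into later fields when more columns follow.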

Filter apache log file using regular expression

I have a big apache log file and I need to filter that and leave only (in a new file) the log from a certain IP: 192.168.1.102
I tried using this command:
sed -e "/^192.168.1.102/d" < input.txt > output.txt
But "/d" removes those entries, and I need to keep them.
Thanks.
What about using grep?
cat input.txt | grep -e "^192.168.1.102" > output.txt
EDIT: As noted in the comments below, escaping the dots in the regex is necessary to make it correct. Escaping in the regex is done with backslashes:
cat input.txt | grep -e "^192\.168\.1\.102" > output.txt
sed -n 's/^192\.168\.1\.102/&/p'
sed is faster than grep on my machines
I think using grep is the best solution but if you want to use sed you can do it like this:
sed -e '/^192\.168\.1\.102/b' -e 'd'
The b command will skip all following commands if the regex matches and the d command will thus delete the lines for which the regex did not match.
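One detail the patterns above leave open is that ^192\.168\.1\.102 also matches longer addresses such as 192.168.1.1021. A possible refinement is to require whitespace after the address and to print matches with sed -n and the p command. The log lines below are made-up sample data, not taken from a real log.

```shell
# Made-up sample log lines for illustration only.
log='192.168.1.102 - - [x] "GET / HTTP/1.1" 200
192.168.1.1021 - - [x] "GET /a HTTP/1.1" 200
10.0.0.5 - - [x] "GET /b HTTP/1.1" 404'

# -n suppresses automatic printing; p prints only lines matching the address.
# The trailing [[:space:]] keeps 192.168.1.1021 from matching.
matched=$(printf '%s\n' "$log" | sed -n '/^192\.168\.1\.102[[:space:]]/p')

printf '%s\n' "$matched"
```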