using sed to search and replace URLs except in certain cases

I have a list of URLs. For most of them I want to do a simple search and replace with sed, but some of them should be excluded.
Given the list below:
http://www.dol.gov
http://www.science.gov
http://www.whitehouse.gov
http://test.sandbox.local
http://www.travel.state.gov
http://www.lib.berkeley.edu
http://dev.sandbox.local
I want to convert all URLs that do not have "sandbox" in the URL to:
href="/fetch?domain=<url>"
What I have so far with sed is the following:
sed -r 's|http://(\S*)|href="/fetch\?domain=\1"|g'
which reformats all the URLs as expected.
How do I modify what I have to exclude the lines that have "sandbox" in them?
Thanks in advance for your help!

If by exclude, you mean "do not do the replacement", then:
sed -r '/sandbox/!s|http://(\S*)|href="/fetch\?domain=\1"|g'
If you mean 'omit completely from the output':
sed -r -e '/sandbox/d' -e 's|http://(\S*)|href="/fetch\?domain=\1"|g'
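Both variants can be checked against a few lines of the sample list from the question. This is a minimal sketch, assuming GNU sed (for -r and \S); the variable names urls, kept and dropped are illustrative and not part of the original answer.

```shell
#!/usr/bin/env bash
# A few of the sample URLs from the question; `urls` is an illustrative name.
urls='http://www.dol.gov
http://test.sandbox.local
http://www.lib.berkeley.edu'

# Variant 1: /sandbox/! restricts the s command to lines that do NOT match,
# so "sandbox" lines pass through unchanged.
kept=$(printf '%s\n' "$urls" | sed -r '/sandbox/!s|http://(\S*)|href="/fetch?domain=\1"|g')

# Variant 2: delete "sandbox" lines first, then rewrite whatever is left.
dropped=$(printf '%s\n' "$urls" | sed -r -e '/sandbox/d' -e 's|http://(\S*)|href="/fetch?domain=\1"|g')

printf '%s\n\n%s\n' "$kept" "$dropped"
```

In the first result the sandbox URL is still present in its original form; in the second it is gone.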

sed -r 's|http://(\S*)|href="/fetch\?domain=\1"|g' | grep -v sandbox

Related

Rewrite URL using sed while maintaining filename

I would like to find all instances of a URL in a file and replace them with a different link structure.
An example would be convert http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png to /images/Security_Panda.png.
I am able to identify the link using a regular expression such as:
^(http:)|([/|.|\w|\s])*\.(?:jpg|gif|png)
but need to rewrite using sed so that the file name is maintained. I understand that I will need to use s/${PATTERN}/${REPLACEMENT}/g.
I tried sed -i 's#(http:)|([/|.|\w|\s])*\.(?:jpg|gif|png)#/dir/$1#g' test without success. Any thoughts on how to improve the approach?

In basic sed, you need to escape the () symbols like \(..\) to mean a capturing group.
sed 's~http://[.a-zA-Z0-9_/-]*\/\(\w\+\.\(jpg\|gif\|png\)\)~/images/\1~g' file
Example:
$ echo 'http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png' | sed 's~http://[.a-zA-Z0-9_/-]*\/\(\w\+\.\(jpg\|gif\|png\)\)~/images/\1~g'
/images/Security_Panda.png
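The same substitution can also be written with extended regular expressions, which avoids most of the backslashes. The sketch below assumes a sed that accepts -E (GNU and BSD sed both do) and uses a made-up HTML line with two image URLs to show that the g flag rewrites every match on a line; the variable names line and rewritten are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical input line containing two image URLs.
line='<img src="http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png"> and <img src="http://www.domain.com/a/b/logo.gif">'

# Group 1 captures "name.ext"; everything from http:// up to the last "/" is dropped.
rewritten=$(printf '%s\n' "$line" |
  sed -E 's~http://[.a-zA-Z0-9_/-]*/([a-zA-Z0-9_]+\.(jpg|gif|png))~/images/\1~g')

printf '%s\n' "$rewritten"
```

Note that sed refers to capture groups as \1, not $1, and has no (?:...) non-capturing groups, which is why the attempt in the question did not work.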

You can use:
sed 's~^.*/\([^/]\{1,\}\)$~/images/\1~' file
/images/Security_Panda.png
Testing:
s='http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png'
sed 's~^.*/\([^/]\{1,\}\)$~/images/\1~' <<< "$s"
/images/Security_Panda.png

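An easier way, if you do not need sed, is bash parameter expansion:
#!/usr/bin/env bash
URL="http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png"
# ${URL##*/} strips the longest prefix ending in "/", leaving only the filename
echo "/images/${URL##*/}"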

Another way, from the command line:
sed 's#^http:.*/\([^/ ]*\) *$#/images/\1#'
Example (note the trailing space in the input, which the " *$" part absorbs):
echo "http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png "|sed 's#^http:.*/\([^/ ]*\) *$#/images/\1#'
Result:
/images/Security_Panda.png

An awk version:
awk -F\/ '/(jpg|gif|png) *$/ {print "/images/"$NF}' file
/images/Security_Panda.png

Sed replace domain in URL

I have these strings http://sub.domain.com/myuri/default.aspx, https://sub.domain.com/myuri/default.aspx and https://domain.com
Is it possible to use sed to replace only the domain part?
For example, this URL:
http://sub.domain.com/myuri/default.aspx
Would become:
http://anotherdomain.com/myuri/default.aspx
Please note that the protocol may differ between https and http.
I did search but could not find something similar.

sed has no non-greedy quantifiers, so one option is to use perl instead:
perl -pe '/(http|https):\/\/(.*?)(\/|$)/ && s/$2/anotherdomain/g'
Edit:
awk also does the job well and it's even simpler actually:
awk -F/ 'gsub($3,"anotherdomain",$0)' <<< "$urls"
Example:
#!/bin/bash
urls=$(cat << 'EOF'
https://sub.domain.com/myuri/default.aspx
http://sub.domain.com/myuri/default.aspx
http://blabla
EOF
)
perl -pe '/(http|https):\/\/(.*?)(\/|$)/ && s/$2/anotherdomain/g' <<< "$urls"
Output:
bash test.sh
https://anotherdomain/myuri/default.aspx
http://anotherdomain/myuri/default.aspx
http://anotherdomain

If I follow your question, then yes: sed 's/sub\.domain\.com/anotherdomain\.com/1' replaces the first occurrence of the literal domain on each line.
echo "http://sub.domain.com/myuri/default.aspx" | \
sed 's/sub\.domain\.com/anotherdomain\.com/1'
Output is
http://anotherdomain.com/myuri/default.aspx
And with,
echo "https://sub.domain.com/myuri/default.aspx" | \
sed 's/sub\.domain\.com/anotherdomain\.com/1'
Output is
https://anotherdomain.com/myuri/default.aspx

You can use sed like this:
sed -r 's|(https?://)[^/]+([^[:blank:]]*)|\1anotherdomain.com\2|g' file
http://anotherdomain.com/myuri/default.aspx
https://anotherdomain.com/myuri/default.aspx
https://anotherdomain.com
PS: Use sed -E on OSX.

Based on @hek2mgl's solution:
SERVER=www.example.com
sed "s=\(https\?://\)[^/]\+=\1$SERVER=" \
<<< 'https://anotherdomain.com/myuri/default.aspx'
It will output:
https://www.example.com/myuri/default.aspx
Modifications from hek2mgl's sed line:
a little shorter (no need to catch the part after domain name to paste it as is in replacement)
deals with both http:// and https:// syntax
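Applied to the three URLs from the question, the shorter form can be sketched as follows. It assumes a sed that accepts -E and that the replacement domain does not contain the | delimiter, & or a backslash; new_domain, urls and replaced are illustrative names.

```shell
#!/usr/bin/env bash
new_domain='anotherdomain.com'   # hypothetical replacement domain

urls='http://sub.domain.com/myuri/default.aspx
https://sub.domain.com/myuri/default.aspx
https://domain.com'

# Group 1 keeps the scheme (http:// or https://); [^/]+ consumes the host
# up to the first "/" or the end of the line, so the path is left alone.
replaced=$(printf '%s\n' "$urls" | sed -E "s|(https?://)[^/]+|\1${new_domain}|")

printf '%s\n' "$replaced"
```

Because [^/]+ cannot cross a slash, no non-greedy quantifier is needed to stop at the end of the host name.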

You can use sed:
SERVER=www.example.com
sed "s~https\?://\([^/]\+\)\(.*\)~http://$SERVER\2~" <<< "http://newsub.domain.com/myuri/default"

Regular expression required for replacing string in shell script

Can anyone please help me write a shell script on Linux that replaces the hostname in a particular file?
E.g., I have multiple files that contain IP addresses or hostnames in URLs like these:
http://10.160.228.12:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://VQAIAAPPDEV:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
Basically what I would want to replace is the string between "http://" and ":8001" with any required string.
Can someone help me with this please.
Some More info:-
I want to do this iteratively across many folders. So basically it will search all the files in each folder and perform the necessary changes.

You could use sed. Saying:
sed -r 's|(http://)([^:]*)(:8001)|\1something\3|g' filename
would replace the string between "http://" and ":8001" with something.
If you want to make the change to the file in-place, use the -i option:
sed -i -r 's|(http://)([^:]*)(:8001)|\1something\3|g' filename
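The question also asks for the change to be applied across many folders. One way to sketch that is to let find hand every regular file to sed. The example below builds a throwaway directory tree so it is safe to run; the directory layout, file names and the new_host value are assumptions made for illustration, and -i without a suffix is GNU sed syntax (BSD/macOS sed expects -i '').

```shell
#!/usr/bin/env bash
# Throwaway tree with the two sample URLs from the question; paths are illustrative.
workdir=$(mktemp -d)
mkdir -p "$workdir/a/b"
echo 'http://10.160.228.12:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL' > "$workdir/a/one.txt"
echo 'http://VQAIAAPPDEV:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL' > "$workdir/a/b/two.txt"

new_host='newhost.example'   # hypothetical replacement hostname

# Visit every regular file below $workdir and rewrite the host in place.
find "$workdir" -type f -exec sed -i -E "s|(http://)[^:/]*(:8001)|\1${new_host}\2|g" {} +

result=$(cat "$workdir/a/one.txt" "$workdir/a/b/two.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

On real data it is worth running the same find command with sed -n '...p' (no -i) first, to preview which lines would change.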

Use sed command from Linux shell
sed -i 's%OldHost%NewHost%g' /yourfolder/yourfile

I tried this with a "for" loop:
# cat replace.txt
http://10.160.228.12:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://VQAIAAPPDEV:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
# for i in `cat replace.txt | awk -F: '{print $2}' | sed 's/^\/\///g' | sed '/^$/d'` ; do sed -i "s/$i/Your_hostname/" replace.txt ; done
# cat replace.txt
http://Your_hostname:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://Your_hostname:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
It works for me.

how to extract these fields via sed?

I'm trying to grep for individual quantities in lines like this:
foo=24.587 bar=88 fox=jobs
and extract, say, all the '88' values. The number of columns isn't consistent, so awk followed by cut won't cut it.
I tried using sed like this:
sed -e 's/.*\s\(bar=.+\)\s.*/\1/g'
and that just dumps the entire line. I'm not sure how to correct this regexp and, more importantly, why doesn't it do what I expect?

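Use -r (extended regex), which behaves more like the regex syntax you probably expect. In sed's default basic regex, an unescaped + is a literal plus sign, so your pattern never matched and the line was printed unchanged. With -r you also have to remove the backslashes from the parentheses: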
$ echo "foo=24.587 bar=88 fox=jobs" | sed -r 's/.*\s(bar=.+)\s.*/\1/g'
bar=88
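One caveat: .+ is greedy, so if more than one field follows bar, the capture can swallow those as well. A minimal sketch of a stricter variant, assuming fields are separated by whitespace and using a made-up line with an extra trailing field (the names line and value are illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical line with an extra field after "fox" to expose the greedy match.
line='foo=24.587 bar=88 fox=jobs baz=2'

# [^[:space:]]+ stops at the first whitespace, so only the value of bar is captured.
# -n plus the p flag prints a line only when the substitution actually matched.
value=$(printf '%s\n' "$line" |
  sed -nE 's/(^|.*[[:space:]])bar=([^[:space:]]+).*/\2/p')

printf '%s\n' "$value"
```

Lines without a bar= field produce no output at all instead of being echoed unchanged.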

sed -r 's/.*\s(bar=.+)\s.*/\1/g'

Filter apache log file using regular expression

I have a big Apache log file and I need to filter it, keeping only (in a new file) the entries from a certain IP: 192.168.1.102
I tried using this command:
sed -e "/^192.168.1.102/d" < input.txt > output.txt
But "/d" removes those entries, and I need to keep them.
Thanks.

What about using grep?
cat input.txt | grep -e "^192.168.1.102" > output.txt
EDIT: As noted in the comments below, escaping the dots in the regex is necessary to make it correct. Escaping in the regex is done with backslashes:
cat input.txt | grep -e "^192\.168\.1\.102" > output.txt

sed -n 's/^192\.168\.1\.102/&/p'
sed is faster than grep on my machines

I think using grep is the best solution but if you want to use sed you can do it like this:
sed -e '/^192\.168\.1\.102/b' -e 'd'
The b command will skip all following commands if the regex matches and the d command will thus delete the lines for which the regex did not match.
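A shorter sed idiom for the same filter is -n with the p command, which prints only matching lines. The sketch below uses made-up log lines and assumes the client IP is the first field followed by a space, so that longer addresses such as 192.168.1.1020 are not picked up; log and kept are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical log lines; only the first one comes from 192.168.1.102.
log='192.168.1.102 - - "GET /a HTTP/1.1" 200
192.168.1.1 - - "GET /b HTTP/1.1" 200
192.168.1.1020 - - "GET /c HTTP/1.1" 404
192x168y1z102 - - "GET /d HTTP/1.1" 200'

# -n suppresses automatic printing; /regex/p prints only the matching lines.
# The escaped dots match literal dots, and the trailing space ends the address.
kept=$(printf '%s\n' "$log" | sed -n '/^192\.168\.1\.102 /p')

printf '%s\n' "$kept"
```

With real files this becomes sed -n '/^192\.168\.1\.102 /p' input.txt > output.txt.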
