sed match dates in wordpress url - regex

Using sed I'm having a problem trying to match and delete blog entries from a txt file before creating a sitemap.xml
# Contents of filename:
# http://www.example.com/2008/10/article3.html
# http://www.example.com/2009/11/article7.html
#!/bin/bash
hostName="www.example.com"
hostTLD="$(echo ${hostName}|cut -d . -f 3)" # results in "com"
sed -i '/\.'"${hostTLD}"'\/\([0-9]{4}\)\/\([0-9]{2}\)/d' filename
I can't figure out how to match the year/month bits. I want to remove all lines that contain ".TLD/year/month/"
I know the $hostTLD part works because I'm using it with a different match:
sed -i '/\.'"${hostTLD}"'\/category\//d' filename # works! ".TLD/category/"

You were close. In sed's default basic regular expressions the interval braces must be escaped as \{4\}, and double quotes let the shell expand $hostTLD without closing and reopening the quotes. Try this instead:
sed -i "/\.$hostTLD\/[0-9]\{4\}\/[0-9]\{2\}/d" filename
For your second command, use this:
sed -i "/\.$hostTLD\/category\//d" filename
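To check the pattern before editing the file in place, something like the following sketch may help. It feeds a few sample lines through sed on stdin (no -i), so nothing is modified. The function name filter_dated_urls and the extra about.html line are assumptions made up for the demonstration; the TLD is derived with a parameter expansion instead of cut, which is just one possible choice.

```shell
#!/bin/bash
hostName="www.example.com"
hostTLD="${hostName##*.}"   # strips everything up to the last dot, leaving "com"

# Hypothetical helper: delete lines containing ".TLD/YYYY/MM/".
# \{4\} and \{2\} are the escaped interval braces of basic regular expressions.
filter_dated_urls() {
  sed "/\.${hostTLD}\/[0-9]\{4\}\/[0-9]\{2\}\//d"
}

result="$(printf '%s\n' \
  'http://www.example.com/2008/10/article3.html' \
  'http://www.example.com/about.html' \
  'http://www.example.com/2009/11/article7.html' | filter_dated_urls)"

printf '%s\n' "$result"
```

Only the about.html line should survive; once that looks right, the same expression can be used with -i on the real file.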

Related

Remove a string from a URL between AND after two different characters using sed

I have a text file which contains a list of URLs enclosed in double quotes:
"http://test.com/secure/test/12345/doc.pdf"
So I'm trying to append the URL to a file protocol, and also to remove the file name at the end of the URL.
Expected output would be:
"file://12345"
On mac, I've tried
sed -i '.bak' 's~http://test.com/secure/test/~file://~g' url.txt
The command above only appended the front part,
"file://12345/doc.pdf"
I am not too sure how do I match the first "http://test.com/secure/test/ and then how to match the next forward slash in the URL /doc.pdf", to remove the file names (which vary).
You can just adapt the following sed command to modify your file after confirming that it does work for you:
echo '"http://test.com/secure/test/12345/doc.pdf"' | sed -E 's#"http://test.com/secure/test/([^/"]*)/.*"#"file://\1"#'
"file://12345"
Explanations:
([^/"]*) will capture the 12345 part of your URL (you might have to restrict it to a more specific class such as [0-9a-zA-Z] instead of [^/"])
/.*" will match the / and the rest of the URL
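As a rough sketch of how this could be tried on several lines at once, the snippet below wraps the substitution in a function and pipes two sample URLs through it. The function name convert_urls and the second URL are invented for illustration. It also deviates slightly from the command above: the dot in test.com is escaped and the tail is matched with [^"]* rather than .*, so the match cannot run past the closing quote.

```shell
#!/bin/bash
# Hypothetical helper: keep only the path segment after /secure/test/.
convert_urls() {
  sed -E 's#"http://test\.com/secure/test/([^/"]*)/[^"]*"#"file://\1"#'
}

out="$(printf '%s\n' \
  '"http://test.com/secure/test/12345/doc.pdf"' \
  '"http://test.com/secure/test/abc-9/report.final.pdf"' | convert_urls)"

printf '%s\n' "$out"
```

If the output looks right, the same expression can be used with sed -i '.bak' on url.txt as in the question.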
try this:
awk -F/ '{print "\"file://" $(NF-1)"\"" }' urlfile.txt
explanation
-F/ # field separator is /
'{print "\"file://" # print the fixed part
$(NF-1)"\"" }' # print penultimate field

Replace Windows filepath in text file by using a Linux sed regular expression

I have a huge number of text files with tag-like syntax. Those files contain patterns like this:
<TAG1=foo><TAG-2=\\10.0.0.1\directory\filename.pdf><TAG3> ...
<TAG4=bar><TAG-6=\\10.0.0.1\directory\filename.tif,other content><TAG5>
I need to replace the first part of those UNC paths with new ones, meaning:
<TAG1=foo><TAG-2=D:\localdirectory\filename.pdf><TAG3> ...
<TAG4=bar><TAG-6=D:\localdirectory\filename.tif,other content><TAG7>
There is a huge number of files to process and so I need to automate this path replacement. So far I tried multiple regex with sed (on Linux) but did not get close to a solution.
#!/bin/bash
# New directory (escaped)
newpath='D:\\localdirectory\\'
# Actual replacement (doesn't work)
sed -i "s#\(<TAG-2=\)\([^\\]+\.pdf\)#\1${newpath}\2#g" filetoprocess.txt
sed -i "s#\(<TAG-6=\)\([^\\]+\.tif\)#\1${newpath}\2#g" filetoprocess.txt
Any suggestions are welcome
This shell script using sed might work:
#!/bin/bash
oldpath='\\\\10\.0\.0\.1\\directory\\'
newpath='D:\\localdirectory\\'
#sed -i "s#${oldpath}#${newpath}#g" filetoprocess.txt
sed -r -i "s#(<TAG-2=)${oldpath}([^>]+pdf)#\1${newpath}\2#g;
s#(<TAG-6=)${oldpath}([^>]+tif)#\1${newpath}\2#g;
" filetoprocess.txt
In the first line the shell shebang is #! (notice the exclamation mark). And I believe that the second line in your input example should have the TAG-6.
In the paths some care is necessary for the characters that have a special meaning in regular expressions:
you have to escape the . and the \ with a backslash
this leads to the funny-looking \\\\ (two escaped backslashes)
In the last line the -r option saves a bit of escaping in the argument. Note that I used [^>]+ instead of [^\\]+ to get the path part until the extension.
The [^\\]+ in your sed command can only match characters other than \ directly after the =. In your input the path starts with \\ at that position, so the pattern never matches.
Also, without -r, GNU sed treats + as a literal character rather than a quantifier.
But I would suggest trying the other (commented) sed command, which just replaces the paths no matter what the TAG and file extension are.
(Back up your files first, since you use -i in-place replacement.)
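To see what the commented-out general replacement does without touching any file, a small dry run on one sample line could look like this. The variable names line and converted are assumptions for the demonstration; oldpath and newpath are taken from the script above.

```shell
#!/bin/bash
oldpath='\\\\10\.0\.0\.1\\directory\\'   # regex for \\10.0.0.1\directory\
newpath='D:\\localdirectory\\'             # replacement text D:\localdirectory\

# Sample line from the question; single quotes keep the backslashes literal.
line='<TAG1=foo><TAG-2=\\10.0.0.1\directory\filename.pdf><TAG3>'

# No -i here: sed reads stdin and writes the result to stdout.
converted="$(printf '%s\n' "$line" | sed "s#${oldpath}#${newpath}#g")"

printf '%s\n' "$converted"
```

The printed line should contain D:\localdirectory\filename.pdf in place of the UNC path.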
In the end I came up with the following regular expression. This solution can also handle Unix paths with "/", dollar signs ($) and hyphens (-):
sed -i -r 's#(<TAG-2=|TAG-6=)([\/]{2})([0-9.a-zA-Z_$ -]+[\/])+([0-9.a-zA-Z_$ -]+\.[pPtT][dDiI][fF])#\1'"${newpath}"'\\\4#g'

sed find and replace a string with spaces

Having the following in a file:
public $password = 'XYZ';
I'm trying to replace the password's value with a different one, through an automated deployment process from backup files.
I have a regex that matches the string above in a file, but it isn't compatible with sed:
(public\s\$password\s=\s'(.*)'?)
I also tried
sed -i -e "s/public\s\$password\s=\s'(.*)'/private\s\$password\s=\s'jingle'" configuration.php
Any ideas?
Try this:
sed -i -e "s/public\s\$password\s=\s'\(.*\)'/private \$password = 'jingle'/" configuration.php
The problem was that you need to escape the round brackets, and that \s doesn't work in the replacement. You had also left out the final /.
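If the goal is only to swap the value and leave the rest of the line alone, a variant along these lines might be worth trying. This is a sketch under a few assumptions: the variable name new_password is made up, [[:space:]] is used instead of \s because it is also understood by non-GNU sed, and [$] is one way to match a literal dollar sign inside a double-quoted shell string. It also assumes the new password contains no /, & or backslash, which would need escaping in the replacement.

```shell
#!/bin/bash
new_password='jingle'                      # hypothetical replacement value
line="public \$password = 'XYZ';"          # sample line from the question

# \( ... \) captures everything up to the opening quote, \1 puts it back,
# so only the text between the single quotes changes.
updated="$(printf '%s\n' "$line" |
  sed "s/\(public[[:space:]]*[$]password[[:space:]]*=[[:space:]]*\)'[^']*'/\1'${new_password}'/")"

printf '%s\n' "$updated"
```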

Regular expression required for replacing string in shell script

Can anyone please help me write a shell script on Linux that replaces the hostname in particular files?
E.g., I have multiple files which contain certain IP addresses or hostnames:
http://10.160.228.12:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://VQAIAAPPDEV:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
Basically what I would want to replace is the string between "http://" and ":8001" with any required string.
Can someone help me with this please.
Some More info:-
I want to do this iteratively across many folders. So basically it will search all the files in each folder and perform the necessary changes.
You could use sed. Saying:
sed -r 's|(http://)([^:]*)(:8001)|\1something\3|g' filename
would replace the string between "http://" and ":8001" with something.
If you want to make the change to the file in-place, use the -i option:
sed -i -r 's|(http://)([^:]*)(:8001)|\1something\3|g' filename
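Since the question asks for this across many folders, one way to do it might be to combine the substitution with find. The following is only a sketch: it assumes GNU sed (for -i without a suffix), the function name replace_host and the host name newhost.example are invented, and the demo works on a throwaway directory created with mktemp. The new host must not contain | or &, which are special in this sed command.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the host in every regular file under a directory.
replace_host() {
  local dir="$1" newhost="$2"
  find "$dir" -type f -exec \
    sed -i -E "s|(http://)[^:/]*(:8001)|\1${newhost}\2|g" {} +
}

# Demo on a temporary tree so no real files are touched.
workdir="$(mktemp -d)"
mkdir -p "$workdir/sub"
printf '%s\n' 'http://10.160.228.12:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL' \
  > "$workdir/sub/a.txt"

replace_host "$workdir" 'newhost.example'
result="$(cat "$workdir/sub/a.txt")"
printf '%s\n' "$result"
rm -rf "$workdir"
```

On real data it is safer to run the sed expression without -i first, or to use -i.bak so a backup of each file is kept.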
Use the sed command from a Linux shell:
sed -i 's%OldHost%NewHost%g' /yourfolder/yourfile
I tried it with a "for" loop:
# cat replace.txt
http://10.160.228.12:8001/soa- infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://VQAIAAPPDEV:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
# for i in `cat replace.txt | awk -F: '{print $2}' | sed 's/^\/\///g' | sed '/^$/d'` ; do sed -i "s/$i/Your_hostname/" replace.txt ; done
# cat replace.txt
http://Your_hostname:8001/soa- infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://Your_hostname:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
It's working for me.

Replace a line with sed but keeping original line

I have a line like:
param1='123'
I would like the following:
param1='123'
param2=123
Where 123 can be any value.
I can get param2 using:
sed -i "s/param1=\([0-9]\+\)/param2='\1'/g" '{}' \;
But then I will lose param1.
I can also append line param2 using:
sed -i "param1='\([0-9]\+\)';/a \param2=\1;"
But the pattern isn't recognised and I end up with param2=1
Is there a way to combine these two commands or another way of working this?
Giving an extension to the -i flag creates a backup, so foo.ini will be updated and the original, unmodified version will be saved as foo.ini.bak:
$ find . -name '*ini' -exec sed -ri.bak 's/param1=.([0-9]+)./&\nparam2=\1/' {} \;
The g flag from your command is probably redundant, as in Unix configuration files a single option is set on a single line. The command replaces param1='123' with param1='123'\nparam2=123, as & represents the whole match and the value 123 is caught in the first capture group.
& # Whole match
\n # Newline character
param2= # Literal string
\1 # First capture group
So basically the line is duplicated, but the option name is changed and the value stays the same.
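The pieces listed above can be tried on stdin first, for example with the sketch below. It assumes GNU sed, since \n in the replacement is a GNU extension; the function name add_param2 and the extra other=1 line are made up for the demonstration. Unlike the command above, it matches the single quotes explicitly instead of with a dot.

```shell
#!/bin/bash
# Hypothetical helper: after each param1='value' insert a line param2=value.
# & is the whole match, \n a newline (GNU sed), \1 the captured value.
add_param2() {
  sed -E "s/param1='([^']*)'/&\nparam2=\1/"
}

result="$(printf '%s\n' "param1='123'" 'other=1' | add_param2)"

printf '%s\n' "$result"
```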
This might work for you (GNU sed):
sed "p;s/1='\([^']*\)'.*/2=\1/" file
You could say:
sed -r "s/(param1='([^']*)')/\1\nparam2=\2/" filename
(Add the -i option for in-place edit.)
sed -i "s/\(param1='\([0-9]\+\)'\)/\1\nparam2=\2/g"
seems to work.
Similar to what devnull answered, but it works without -r (you need a backslash before each parenthesis when not using extended regular expressions).