Monster Perl regex

I'm trying to change strings like this:
<a href='../Example/case23.html'><img src='Blablabla.jpg'
To this:
<a href='../Example/case23.html'><img src='<?php imgname('case23'); ?>'
And I've got this monster of a regular expression:
find . -type f | xargs perl -pi -e \
's/<a href=\'(.\.\.\/Example\/)(case\d\d)(.\.html\'><img src=\')*\'/\1\2\3<\?php imgname\(\'\2\'); \?>\'/'
But it isn't working. In fact, I think it's a problem with Bash, which could probably be pointed out rather quickly.
r: line 4: syntax error near unexpected token `('
r: line 4: ` 's/<a href=\'(.\.\.\/Example\/)(case\d\d)(.\.html\'><img src=\')*\'/\1\2\3<\?php imgname\(\'\2\'); \?>\'/''
But if you want to help me with the regular expression that'd be cool, too!

Teaching you how to fish:
s/…/…/
Use a separator other than / for the s operator because / already occurs in the expression.
s{…}{…}
Cut down on backslash quoting: prefer [.] over \., because we'll shell-quote later. Keep backslashes only for the necessary or important parts, here the digit character class.
s{<a href='[.][.]/Example/case(\d\d)[.]html'>…
Capture only the variable part. There's no need to reassemble the string later when most of it is static.
s{<a href='[.][.]/Example/case(\d\d)[.]html'><img src='[^']*'}{<a href='../Example/case$1.html'><img src='<?php imgname('case$1'); ?>'}
Use $1 instead of \1 to refer to a capture in the replacement part. [^']* means everything up to the next '.
To serve as the argument for Perl's -e option, this program needs to be shell-quoted. Use the following helper program (an alias or shell function works just as well):
> cat `which shellquote`
#!/usr/bin/env perl
use String::ShellQuote qw(shell_quote); undef $/; print shell_quote <>
Run it, paste the program body and terminate the input with Ctrl+D. You get:
's{<a href='\''[.][.]/Example/case(\d\d)[.]html'\''><img src='\''[^'\'']*'\''}{<a href='\''../Example/case$1.html'\''><img src='\''<?php imgname('\''case$1'\''); ?>'\''}'
Put this together with the shell pipeline:
find . -type f | xargs perl -pi -e 's{<a href='\''[.][.]/Example/case(\d\d)[.]html'\''><img src='\''[^'\'']*'\''}{<a href='\''../Example/case$1.html'\''><img src='\''<?php imgname('\''case$1'\''); ?>'\''}'

Bash single-quotes do not permit any escapes.
Try this at a bash prompt and you'll see what I mean:
FOO='\'foo'
will make it prompt you for a fourth single quote. If you supply one, you'll find FOO's value is
\foo
You'll need to use double quotes around your expression. In truth, though, your HTML should be using double quotes in the first place.

Single quotes within single quotes in Bash:
set -xv
echo ''"'"''
echo $'\''
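Because a single-quoted string cannot contain an escaped ', the usual trick is to end the quoted string, supply the apostrophe some other way, and start a new quoted string. A short POSIX sh sketch with arbitrary variable names; the $'\'' form above is bash ANSI-C quoting and is left out so this also runs under plain sh:

```bash
a=''"'"''      # empty '' + double-quoted ' + empty '' (the echo example above)
b='it'\''s'    # close the quote, add a backslash-escaped ', reopen the quote
c='it'"'"'s'   # close the quote, add a double-quoted ', reopen the quote
printf '%s\n' "$a" "$b" "$c"
```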

I wouldn't use a one-liner. Put your Perl code in a script, which makes it much easier to get the regex right without wondering about escaping quotes and such.
I'd use a script like this:
#!/usr/bin/perl -pi
use strict;
use warnings;
s{
( <a \b [^>]* \b href=['"] [^'"]*/case(\d+)\.html ['"] [^>]* > \s*
<img \b [^>]* \b src=['"] ) [^'"<] [^'"]*
}{$1<?php imgname('case$2'); ?>}gix;
and then do something like:
find . -type f | xargs fiximgs
– Michael

If you install the mysql package, it comes with a command called replace.
With the replace command you can:
while IFS= read -r line
do
X=$(echo "$line" | replace "<a href='../Example/" "" | replace ".html'><" " " | awk '{print $1}')
echo "<a href='../Example/$X.html'><img src='<?php imgname('$X'); ?>'"
done < myfile > NewFile
The same can be done with sed: sed s/'my string'/'replace string'/g. replace is just easier to work with when special characters are involved.
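For comparison, here is a sketch of the same rewrite with sed alone, without the read loop or the replace utility. It assumes the input format shown in the question; the sample line is made up. Using | as the s delimiter avoids escaping the / characters:

```bash
input="<a href='../Example/case23.html'><img src='Blablabla.jpg'>"
# \(case[0-9]*\) captures the case name; [^']* skips the old image name.
output=$(printf '%s\n' "$input" |
  sed "s|<a href='\.\./Example/\(case[0-9]*\)\.html'><img src='[^']*'|<a href='../Example/\1.html'><img src='<?php imgname('\1'); ?>'|g")
printf '%s\n' "$output"
```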

Related

Bash SED string replacement - removes characters before and after Regex

I have this simple bash (3) script to scan through all the files in the directory and replace some old CSS classes with new ones.
export LC_ALL=C
ARRAY=(
"a-oldclass:new-class"
"m-oldclass:new-class"
)
for className in "${ARRAY[@]}" ; do
REGEX=[^a-zA-Z0-9]${className%%:*}[^a-zA-Z0-9]
CHANGE="s/${REGEX}/${className##*:}/g"
find src -type f -exec sed -i '' "${CHANGE}" '{}' +
done
It is a combination of key:value pairs and a regular expression.
The problem is that it also removes special characters before and after the matching pattern, like:
class="a-oldclass" => class=new-class (Quotes are gone)
class=" a-oldclass " => class="new-class" (spaces are gone)
I need this outcome:
class="a-oldclass m-oldclass" => class="new-class new-class".
[^a-zA-Z0-9] is necessary to avoid this scenario:
I want to replace a-oldclass with new-class, but I don't want to touch the class data-oldclass. Since this string contains a-oldclass, it would be modified. With [^a-zA-Z0-9] I exclude these scenarios.
This should be the regular expression:
REGEX='\([^a-zA-Z0-9]\)'"${className%%:*}"'\([^a-zA-Z0-9]\)'
CHANGE="s/${REGEX}/\1${className##*:}\2/g"
This uses \( \) and \1 \2 to reproduce the matches before and after the classname.
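A quick way to check the capture-and-restore version is to run the substitutions on one sample line through a pipe instead of on files with find and sed -i. The sample markup is made up, and the variable names are lowercase on purpose:

```bash
export LC_ALL=C
line='<div class="a-oldclass m-oldclass data-oldclass">'
for className in a-oldclass:new-class m-oldclass:new-class; do
  # \1 and \2 put back the non-alphanumeric characters around the class name.
  regex='\([^a-zA-Z0-9]\)'"${className%%:*}"'\([^a-zA-Z0-9]\)'
  change="s/${regex}/\1${className##*:}\2/g"
  line=$(printf '%s\n' "$line" | sed "$change")
done
printf '%s\n' "$line"
```

The quotes and the space survive, and data-oldclass is untouched. One caveat of consuming the neighbouring characters: if the same class appears twice separated by a single space, the first match uses up that space, so the second occurrence is only replaced on a second pass.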
Additionally, I recommend against all-caps variable names, as they may conflict with Bash's own variables.
If you also need to match a class name at the very end of a line, you can add
REGEX='\([^a-zA-Z0-9]\)'"${className%%:*}"'\([^a-zA-Z0-9]\)'
CHANGE="s/${REGEX}/\1${className##*:}\2/g"
REGEXNL='\([^a-zA-Z0-9]\)'"${className%%:*}"'$'
CHANGENL="s/${REGEXNL}/\1${className##*:}/g"
and change the sed command to
sed -i -e "${CHANGE}" -e "${CHANGENL}"
I bet there is a more elegant solution, but this sed survived the -posix test.

Conditional in perl regex replacement

I'm trying to return different replacement results with a perl regex one-liner if it matches a group. So far I've got this:
echo abcd | perl -pe "s/(ab)(cd)?/defined($2)?\1\2:''/e"
But I get
Backslash found where operator expected at -e line 1, near "1\"
(Missing operator before \?)
syntax error at -e line 1, near "1\"
Execution of -e aborted due to compilation errors.
If the input is abcd I want to get abcd out, if it's ab I want to get an empty string. Where am I going wrong here?
You used regex atoms \1 and \2 (match what the first or second capture captured) outside of a regex pattern. You meant to use $1 and $2 (as you did in another spot).
Furthermore, dollar signs inside double-quoted strings have meaning to your shell. It's best to use single quotes around your program[1].
echo abcd | perl -pe's/(ab)(cd)?/defined($2)?$1.$2:""/e'
Simpler:
echo abcd | perl -pe's/(ab(cd)?)/defined($2)?$1:""/e'
Simpler:
echo abcd | perl -pe's/ab(?!cd)//'
[1] Either avoid single quotes in your program[2], or use '\'' to "escape" them.
[2] You can usually use q{} instead of single quotes. You can also switch to Perl double-quoted strings, where you can use \x27 for an apostrophe.
Why torture yourself? Just use a branch reset.
Find (?|(abcd)|ab())
Replace $1
And a couple of even better ways:
Find abcd(*SKIP)(*FAIL)|ab
Replace ""
Find (?:abcd)*+\Kab
Replace ""
These use regex wisely.
There is really no need nowadays to use the eval form of the substitution operator, s///e, in conjunction with defined().
This is especially true when using the perl command line.
Good luck...

Find all text within square brackets using regex

Because of my PHP version, I need to change my code from $array[stringindex] to $array['stringindex'].
So I want to find all such text using a regex and replace it. How do I find all strings that look like $array[stringindex]?
Here's a solution in PHP:
$re = "/(\\$[[:alpha:]][[:alnum:]]+\\[)([[:alpha:]][[:alnum:]]+)(\\])/";
$str = "here is \$array[stringindex] but not \$array['stringindex'] nor \$3array[stringindex] nor \$array[4stringindex]";
$subst = "$1'$2'$3";
$result = preg_replace($re, $subst, $str);
I search for variables beginning with a letter; otherwise things like $foo[42] would be converted to $foo['42'], which might not be desirable.
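The same letter-only pattern can be ported to sed -E. This is a sketch, not the answer's own code: it drops the third capture group in favour of a literal ], and uses the sample string from the PHP snippet:

```bash
# ERE port of the PHP pattern; single quotes keep the shell away from \$ and \[.
re='(\$[[:alpha:]][[:alnum:]]+\[)([[:alpha:]][[:alnum:]]+)]'
str="here is \$array[stringindex] but not \$array['stringindex'] nor \$3array[stringindex] nor \$array[4stringindex]"
result=$(printf '%s\n' "$str" | sed -E "s/$re/\1'\2']/g")
printf '%s\n' "$result"
```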
Note that none of the solutions here will handle every case correctly.
Looking at the Sublime Text regex help, it would seem you could just paste (\$[[:alpha:]][[:alnum:]]+\[)([[:alpha:]][[:alnum:]]+)(\]) into the Search box and $1'$2'$3 into the Replace field. The doubled backslashes are only needed inside the PHP string literal.
It depends on the tool you want to use for the replacement.
With sed, for example, it would be something like this:
sed "s/\(\$array\)\[\([^]]*\)\]/\1['\2']/g"
If sed is allowed you could simply do:
sed -i "s/\(\$[^[]*\[\)\([^]]*\)]/\1'\2']/g" file
Explanation:
sed "s/pattern/replace/g" is a sed command which searches for pattern and replaces it with replace. The g flag means every match on a line is replaced, not just the first one.
\(\$[^[]*\[\)\([^]]*\)] this pattern consists of two groups, each delimited by \( and \). The first is a dollar sign followed by a series of non-[ characters and then an opening square bracket. The second is a series of non-] characters, which is followed by a closing square bracket.
\1'\2'] is the replacement string: \1 inserts the first captured group (and \2 the second). Basically, we wrap \2 in quotes (which is what you wanted) and put the closing bracket back.
The -i option means that the changes are applied to the original file, which is supplied at the end.
For more information, see man sed.
This can be combined with the find command, as follows:
find . -name '*.php' -exec sed -i "s/\(\$[^[]*\[\)\([^]]*\)]/\1'\2']/g" '{}' \;
This will apply the sed command to all PHP files found.
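A scratch run of the find/sed combination, assuming GNU sed (on BSD/macOS, -i needs an explicit suffix argument such as -i ''). The file names and contents are made up, and the capture groups are written \( \) because sed uses basic regular expressions by default:

```bash
tmp=$(mktemp -d)
printf '%s\n' '<?php echo $array[stringindex]; ?>' > "$tmp/a.php"
printf '%s\n' '$array[stringindex]' > "$tmp/notes.txt"
# Only *.php files are edited; notes.txt must stay as it is.
find "$tmp" -name '*.php' -exec sed -i "s/\(\$[^[]*\[\)\([^]]*\)]/\1'\2']/g" '{}' \;
php_result=$(cat "$tmp/a.php")
txt_result=$(cat "$tmp/notes.txt")
printf '%s\n%s\n' "$php_result" "$txt_result"
rm -r "$tmp"
```

Because [^]]* accepts anything, an index that is already quoted would be quoted a second time, so run it only once per file.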

search and replace substring in string in bash

I have the following task:
I have to replace several links, but only the links which end with .do
Important: the files also contain other links, which should stay untouched.
<li>Einstellungen verwalten</li>
to
<li>Einstellungen verwalten</li>
So I have to search for links with .do, take the part before it and remember it, for example as $a, then replace the whole link with
<s:url action=' '/>
and paste $a between the quotes.
I thought about sed, but as far as I know, sed can only search for a whole string and replace it completely.
I also tried bash parameter expansions in combination with sed, but ran into several problems with the quotes and the variables.
cat ./src/main/webapp/include/stoBox2.jsp | grep -e '<a href=".*\.do">' | while read a;
do
b=${a#*href=\"};
c=${b%.do*};
sed -i 's/href=\"$a.do\"/href=\"<s:url action=\'$a\'/>\"/g' ./src/main/webapp/include/stoBox2.jsp;
done;
Any ideas?
Thanks a lot.
sed -i 's#href="\([^"]*\)\.do"#href="<s:url action='"'\1'"'/>"#g' ./src/main/webapp/include/stoBox2.jsp
Use a group in parentheses to capture the link without .do; [^"]* keeps the match from running past the closing quote when a line holds more than one link. The single and double quotes split the sed script into three parts, which the shell joins into one argument, so that the single quotes in your text can be written:
's#href="\([^"]*\)\.do"#href="<s:url action='
"'\1'"
'/>"#g'
The -i option modifies your file directly. If you don't want that, just remove it and redirect the output to a temporary file with > tmp.
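A scratch test of that command, assuming GNU sed for the bare -i. The file contents are made up: two .do links on one line plus a link that must stay untouched:

```bash
tmp=$(mktemp -d)
printf '%s\n' '<li><a href="settings.do">Einstellungen verwalten</a> <a href="help.do">Help</a> <a href="about.html">About</a></li>' > "$tmp/stoBox2.jsp"
# [^"]* cannot run past a closing quote, so each href is rewritten on its own.
sed -i 's#href="\([^"]*\)\.do"#href="<s:url action='"'\1'"'/>"#g' "$tmp/stoBox2.jsp"
result=$(cat "$tmp/stoBox2.jsp")
printf '%s\n' "$result"
rm -r "$tmp"
```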
Try this one:
sed -i "s%\(href=\"\)\([^\"]\+\)\.do%\1<s:url action='\2'/>%g" \
./src/main/webapp/include/stoBox2.jsp;
You can capture patterns with parentheses (\( and \)) and use them in the replacement pattern.
Here I capture a run of characters without any " that precedes .do (\([^\"]\+\)\.do), and insert it without the .do suffix (\2).
There is a / in the replacement, so I used % instead of the traditional / to delimit the expressions.

How to retain the first instance of a match with sed

I have a set of tokens in data and wish to strip off the trailing ".[0-9]"; however, I cannot figure out how to quote the regexp properly. The first group should match everything up to the ., and the second the . and a number. I intend to keep the first group.
data="thing thing__aaa.0 thing__bbb.3 thing__ccc.5 other_aaa other_bbb other_ccc.5"
data=`echo $data | sed s/\([a-zA-Z0-9_]+\)\(\.[0-9]\)/\1/g`
echo $data
Actual output:
thing thing__aaa.0 thing__bbb.3 thing__ccc.5 other_aaa other_bbb other_ccc.5
Desired output:
thing thing__aaa thing__bbb thing__ccc other_aaa other_bbb other_ccc
The idea is that the unquoted ([a-zA-Z0-9_]+) is the first matching group, and the (\.[0-9]) matches the .number. The \1 should replace both groups with the first group.
How about just
echo $data | sed 's/\.[0-9]//g'
or, if the number may contain more digits:
echo $data | sed 's/\.[0-9]\+//g'
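The question's command fails because its sed script is unquoted. The shell strips the backslashes, so sed sees literal ( and ) and nothing matches. With the script in single quotes, both suggestions behave as described. A sketch using the data from the question plus one made-up two-digit token; \+ is a GNU sed extension:

```bash
data="thing thing__aaa.0 thing__bbb.3 thing__ccc.5 other_aaa other_bbb other_ccc.5"
# Single quotes keep the shell from touching the backslashes.
one_digit=$(echo "$data" | sed 's/\.[0-9]//g')
# \+ (one or more) in a basic regex is a GNU extension.
many_digits=$(echo "thing__aaa.12 other_ccc.5" | sed 's/\.[0-9]\+//g')
printf '%s\n%s\n' "$one_digit" "$many_digits"
```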
It looks like you just want to delete all strings of the form \.[0-9]. So why not just do:
sed 's/\.[0-9]\+\b//g'
(This relies on GNU sed's \b and \+ extensions. For other seds you can do:
sed -e 's/\.[0-9][0-9]* / /g' -e 's/\.[0-9][0-9]*$//'
I normally don't encourage the use of shell-specific extensions, but if you are using bash you might be happy using an array:
bash$ data=(thing thing__aaa.0 thing__bbb.3)
bash$ echo "${data[@]%.[0-9]*}"
Note that this will also delete extensions that are not all digits (e.g. foo.34bb), but perhaps that is adequate for your needs.)
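Two portable alternatives, sketched under the assumption that the tokens are separated by single spaces. The first is a sed script limited to POSIX basic regular expressions (no \b, \+ or \|). The second applies the same ${var%pattern} expansion as the array example, word by word in a plain loop, so it also runs in sh:

```bash
data="thing thing__aaa.0 thing__bbb.3 thing__ccc.5 other_aaa other_bbb other_ccc.5"

# POSIX sed: strip .digits before a space, then at the end of the line.
portable=$(printf '%s\n' "$data" | sed -e 's/\.[0-9][0-9]* / /g' -e 's/\.[0-9][0-9]*$//')

# Parameter expansion per word; like the array version, .[0-9]* is a glob,
# so it would also strip a suffix such as .34bb.
stripped=
for word in $data; do
  stripped="$stripped ${word%.[0-9]*}"
done
stripped=${stripped# }

printf '%s\n%s\n' "$portable" "$stripped"
```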