Conditional in Perl regex replacement

I'm trying to return different replacement results with a perl regex one-liner if it matches a group. So far I've got this:
echo abcd | perl -pe "s/(ab)(cd)?/defined($2)?\1\2:''/e"
But I get
Backslash found where operator expected at -e line 1, near "1\"
(Missing operator before \?)
syntax error at -e line 1, near "1\"
Execution of -e aborted due to compilation errors.
If the input is abcd I want to get abcd out, if it's ab I want to get an empty string. Where am I going wrong here?

You used regex atoms \1 and \2 (match what the first or second capture captured) outside of a regex pattern. You meant to use $1 and $2 (as you did in another spot).
Furthermore, dollar signs inside double-quoted strings have meaning to your shell. It's best to use single quotes around your program[1].
echo abcd | perl -pe's/(ab)(cd)?/defined($2)?$1.$2:""/e'
Simpler:
echo abcd | perl -pe's/(ab(cd)?)/defined($2)?$1:""/e'
Simpler still, with no /e at all:
echo abcd | perl -pe's/ab(?!cd)//'
[1] Either avoid single quotes in your program[2], or use '\'' to "escape" them.
[2] You can usually use q{} instead of single quotes. You can also switch to double quotes; inside double quotes, \x27 stands for an apostrophe.

Why torture yourself? Just use a branch reset.
Find (?|(abcd)|ab())
Replace $1

And a couple of even better ways
Find abcd(*SKIP)(*FAIL)|ab
Replace ""
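Find (?:abcd)*+\Kab (the possessive *+ matters: with a plain *, the engine backtracks to zero repetitions and still matches the ab at the start of abcd)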
Replace ""
These use regex wisely.
There is really no need nowadays to use the eval form of the substitution operator, s///e, in conjunction with defined().
This is especially true on the perl command line.
Good luck...

Related

Perl Regex Command Line Issue

I'm trying to use a negative lookahead in perl in command line:
echo 1.41.1 | perl -pe "s/(?![0-9]+\.[0-9]+\.)[0-9]$/2/g"
to get an incremented version that looks like this:
1.41.2
but it's just returning:
![0-9]+\.[0-9]+\.: event not found
I've tried it in regex101 (PCRE) and it works fine, so I'm not sure why it doesn't work here.
In Bash, ! is the "history expansion character", except when escaped with a backslash or single-quotes. (Double-quotes do not disable this; that is, history expansion is supported inside double-quotes. See Difference between single and double quotes in Bash)
So, just change your double-quotes to single-quotes:
echo 1.41.1 | perl -pe 's/(?![0-9]+\.[0-9]+\.)[0-9]$/2/g'
and voilà:
1.41.2
I'm guessing that this expression also might work:
([0-9.]+)\.([0-9]+)
Test
perl -e'
my $name = "1.41.1";
$name =~ s/([0-9.]+)\.([0-9]+)/$1\.2/;
print "$name\n";
'
Output
1.41.2
If you want to "increment" a number, you can't hard-code the new value; you need to capture what is there and increment that:
echo "1.41.1" | perl -pe's/[0-9]+\.[0-9]+\.\K([0-9]+)/$1+1/e'
Here the /e modifier makes the replacement side be evaluated as code, so we can add 1 to the captured number, and the result is then substituted. The \K drops everything matched before it from the match, so we don't need to put it back; see "Lookaround Assertions" in Extended Patterns in perlre.
The lookarounds are sometimes just the thing you want, but they increase the regex complexity (just by being there), can be tricky to get right, and hurt efficiency. They aren't needed here.
The strange output you get is because the double quotes used around the Perl program "invite" the shell to look at what's inside whereby it interprets the ! as history expansion and runs that, as explained in ruakh's post.
As an alternative to a lookahead, we can use capture groups; for example, the following captures the version number in three groups:
(\d+)\.(\d+)\.(\d+)
If you wanted to output the captured version number as is, it would be:
\1.\2.\3
And to just replace the 3rd part with the number "2" would be:
\1.\2.2
To adapt this to the OP's question (in Perl, $1 and $2 are preferred over \1 and \2 in the replacement), it would be:
$ echo 1.14.1 | perl -pe 's/(\d+)\.(\d+)\.(\d+)/$1.$2.2/'
1.14.2

How do I reference a shell variable and arbitrary digits inside a grep regex?

I am looking to translate this regular expression into grep flavour:
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
Example of line that should match, assuming that VAR=285900
b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9
Current code:
VAR=285900
grep 'refs/changes/\d+/$VAR/' sample.txt
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
That would be
grep "refs/changes/[[:digit:]]\{1,\}/$VAR/"
or
grep -E "refs/changes/[[:digit:]]+/$VAR/"
Note that the \d+ notation is a Perl thing. Some greps support it with an option (GNU grep has -P for Perl-compatible patterns), but I don't recommend it for portability reasons.
inside simple quotes I cannot use variable expansion
You can mix and match quotes:
foo=not; echo 'single quotes '"$foo"' here'
with double quotes it doesn't match anything.
It's not clear what you're doing, so we can't say why it doesn't work. It should work. There is no need to escape forward slashes for grep, they don't have any special meaning.
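A small self-contained check of both spellings. The first sample line is the one from the question; the second is invented here purely as a non-matching control, and sample_refs is a made-up helper name.

```shell
# Two sample lines: the first is from the question, the second is a
# made-up control that must not match.
sample_refs() {
    printf '%s\n' \
        'b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9' \
        '0000000000000000000000000000000000000000 refs/changes/01/285901/2'
}

VAR=285900

# Double quotes let the shell expand $VAR before grep sees the pattern.
sample_refs | grep -E "refs/changes/[[:digit:]]+/$VAR/"      # ERE
sample_refs | grep "refs/changes/[[:digit:]]\{1,\}/$VAR/"    # BRE
```

Both commands print only the first line.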

Why do I get "-bash: syntax error near unexpected token `('" when I run my Perl one-liner?

This is driving me insane. Here's my dilemma, I have a file in which I need to make a match. Usually I use Perl and it works like a charm but in this case I am writing a shell script and for some reason it is throwing errors.
Here is what I am trying to match:
loop_loopStorage_rev='latest.integration'
I need to match loop and latest.integration.
This is my regex:
^(?!\#)(loop_.+rev).*[\'|\"](.*)[\'|\"]$
When I use this in a Perl script, $1 and $2 give me the appropriate output. If I do this:
perl -nle "print qq{$1 => $2} while /^(?!#)(loop_.+rev).+?[\'|\"](.+?)[\'|\"]$/g" non-hadoop.env
I get the error:
-bash: syntax error near unexpected token `('
I believe it has something to do with the beginning part of my regex. So my real question is would there be an easier solution using sed, egrep or awk? If so, does any one know where to begin?
Use single quotes around your arguments to prevent special processing of $, \, etc. If you need to include a single quote within, the generic solution is to use '\''. In this particular case, however, we can avoid trying to include a ' by using the equivalent \x27 in the regex pattern instead.
perl -nle'
print "$1 => $2"
while /^(?!#)(loop_.+rev).+?[\x27\"|](.+?)[\x27\"|]$/g;
' non-hadoop.env
[I added some line breaks for readability. You can actually leave them in if you want to, but you don't need to.]
Note that there are some problems with your regex pattern.
(?!\#)(loop_.+rev) is the same as (loop_.+rev) since l isn't #, so (?!\#) isn't doing whatever you think it's doing.
[\'|\"] matches ', " and |, but I think you only meant it to match ' and ". If so, you want to use [\'\"], which can be simplified to ['"].
Don't use the non-greedy modifier (a ? after +, *, etc.) to exclude characters. It's meant for optimization, not for excluding characters. In fact, the second ? in your pattern has absolutely no effect, so it's not doing what you think it's doing.
Fixed?
perl -nle'
print "$1 => $2"
while /^(loop_.+rev).+[\x27"]([^\x27"]*)[\x27"]$/g;
' non-hadoop.env
Double quotes cause Bash to replace variable references like $1 and $2 with the values of these shell variables. Use single quotes around your Perl script to avoid this (or quote every dollar sign, backtick, etc in the script).
However, you cannot escape single quotes inside single quotes easily; a common workaround in Perl strings is to use the character code \x27 instead. If you need single-quoted Perl strings, use the generalized single-quoting operator q{...}.
If you need to interpolate a shell variable name, a common trick is to use "see-saw" quoting. The string 'str'"in"'g' in the shell is equal to 'string' after quote removal; you can similarly use adjacent single-quoted and double-quoted strings to build your script ... although it does tend to get rather unreadable.
perl -nle 'print "Instance -> $1\nRevision -> $2"
while /^(?!#)('"$NAME"'_.+rev).+[\x27"]([^\x27"]*)[\x27"]$/g' non-hadoop.en
(Notice that the options -nle are not part of the script; the script is the quoted argument to the -e option. In fact perl '-nle script ...' coincidentally works, but it is decidedly unidiomatic, to the point of confusing.)
I ended up figuring it out thanks to everyone's help. Thanks again. Here is my final command:
perl -nle 'print "$1 $2" while /^($ENV{NAME}_.+rev).+\x27(.+)\x27/g;' $ENVFILE

Regexp greediness: shrinking a long path

Please have a look at my mind-breaker.
I'm stuck trying to shorten a long path with a regex, like this one:
/12345/123456/1234/123/12/1/1234567/13245678/123456789/1234567890
I'd like to transform this path to the following form:
/123/123/123/123/12/1/123/123/123/123
Each "directory" in the path is abbreviated to its first 3 characters.
LONG_PATH="/12345/123456/1234/123/12/1/1234567/13245678/123456789/1234567890"
perl -pe "s#/(.{1,3})[^/]*?(/|$)#/\1\2#g" <<<$LONG_PATH
/123/123456/123/123/12//1234567/132/123456789/123
sed -E "s#/(.{1,3})[^/]*?(/|$)#/\1\2#g" <<<$LONG_PATH
/123/123456/123/123/12//1234567/132/123456789/123
I have also tried:
perl -pe "s,/(.)(.)?(.)?[^/]*+,/\1\2\3,g" <<<$LONG_PATH
/123/123/123/123/12//123/132/123/123
and many others, with no luck. I still have no idea how to do it.
Please point me in the right direction.
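Match a slash followed by exactly three non-slash characters and capture them. Then match the rest until the next slash. Replace by the capture. Segments shorter than three characters don't match at all, so they are left untouched: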
"s#(/[^/]{3})[^/]*#\1#g"
There is no need for ungreediness or anything here, because the negated character class is mutually exclusive with the / or $.
EDIT: Although you seem to know this I should probably clarify for future visitors that this will work with either perl -pe... or sed -E... as you have used it in your question. The regex could also be used as is with sed -r.... If you leave out the -E or -r option, then (as usual) you will need to escape both the parentheses and curly brackets:
sed "s#\(/[^/]\{3\}\)[^/]*#\1#g" filename
Note also, as ikegami points out, that in Perl you should use $1 rather than \1 in the replacement.
You could do it like this:
perl -pe's#[^/]{3}\K[^/]*##g'
/12345/123456/1234/123/12/1/1234567/13245678/123456789/1234567890
/123/123/123/123/12/1/123/132/123/123
Find 3 non-slashes, and keep (\K) them, remove the following characters up until the next slash.
As ikegami pointed out, segments shorter than three characters don't need to be matched at all, so a lookbehind assertion can be used instead of \K. The benefit is that \K requires perl v5.10, and I believe look-around assertions predate that.
perl -pe 's#(?<=[^/]{3})[^/]*##g'
The best way seems to be to use the File::Spec module to split and recombine the path. An intermediate call to map reduces each path segment to its first three characters. This program demonstrates:
use strict;
use warnings;
use File::Spec;
my $path = '/12345/123456/1234/123/12/1/1234567/13245678/123456789/1234567890';
my $newpath = File::Spec->catdir(map substr($_, 0, 3), File::Spec->splitdir($path));
print $newpath;
output
/123/123/123/123/12/1/123/132/123/123

Monster perl regex

I'm trying to change strings like this:
<a href='../Example/case23.html'><img src='Blablabla.jpg'
To this:
<a href='../Example/case23.html'><img src='<?php imgname('case23'); ?>'
And I've got this monster of a regular expression:
find . -type f | xargs perl -pi -e \
's/<a href=\'(.\.\.\/Example\/)(case\d\d)(.\.html\'><img src=\')*\'/\1\2\3<\?php imgname\(\'\2\'); \?>\'/'
But it isn't working. In fact, I think it's a problem with Bash, which could probably be pointed out rather quickly.
r: line 4: syntax error near unexpected token `('
r: line 4: ` 's/<a href=\'(.\.\.\/Example\/)(case\d\d)(.\.html\'><img src=\')*\'/\1\2\3<\?php imgname\(\'\2\'); \?>\'/''
But if you want to help me with the regular expression that'd be cool, too!
Teaching you how to fish:
s/…/…/
Use a separator other than / for the s operator because / already occurs in the expression.
s{…}{…}
Cut down on backslash quoting, prefer [.] over \. because we'll shellquote later. Let's keep backslashes only for the necessary or important parts, namely here the digits character class.
s{<a href='[.][.]/Example/case(\d\d)[.]html'>…
Capture only the variable part. No need to reassemble the string later if the most part is static.
s{<a href='[.][.]/Example/case(\d\d)[.]html'><img src='[^']*'}{<a href='../Example/case$1.html'><img src='<?php imgname('case$1'); ?>'}
Use $1 instead of \1 to denote backreferences. [^']* means everything until the next '.
To serve as the argument to Perl's -e option, this program needs to be shell-quoted. The following helper program does that; an alias or shell function would work too:
> cat `which shellquote`
#!/usr/bin/env perl
use String::ShellQuote qw(shell_quote); undef $/; print shell_quote <>
Run it, paste the program body, and terminate input with Ctrl+D; you get:
's{<a href='\''[.][.]/Example/case(\d\d)[.]html'\''><img src='\''[^'\'']*'\''}{<a href='\''../Example/case$1.html'\''><img src='\''<?php imgname('\''case$1'\''); ?>'\''}'
Put this together with the shell pipeline:
find . -type f | xargs perl -pi -e 's{<a href='\''[.][.]/Example/case(\d\d)[.]html'\''><img src='\''[^'\'']*'\''}{<a href='\''../Example/case$1.html'\''><img src='\''<?php imgname('\''case$1'\''); ?>'\''}'
Bash single-quotes do not permit any escapes.
Try this at a bash prompt and you'll see what I mean:
FOO='\'foo'
will cause it to prompt you looking for the fourth single-quote. If you satisfy it, you'll find FOO's value is
\foo
You'll need to use double-quotes around your expression. Although in truth, your HTML should be using double-quotes in the first place.
Single quotes within single quotes in Bash:
set -xv
echo ''"'"''
echo $'\''
I wouldn't use a one-liner. Put your Perl code in a script, which makes it much easier to get the regex right without wondering about escaping quotes and such.
I'd use a script like this:
#!/usr/bin/perl -pi
use strict;
use warnings;
s{
( <a \b [^>]* \b href=['"] [^'"]*/case(\d+)\.html ['"] [^>]* > \s*
<img \b [^>]* \b src=['"] ) [^'"<] [^'"]*
}{$1<?php imgname('case$2'); ?>}gix;
and then do something like:
find . -type f | xargs fiximgs
– Michael
If you install the mysql package, it comes with a command called replace.
With the replace command you can do:
while read -r line
do
X=`echo "$line" | replace "<a href='../Example/" "" | replace ".html'><" " " | awk '{print $1}'`
echo "<a href='../Example/$X.html'><img src='<?php imgname('$X'); ?>'"
done < myfile > NewFile
The same can be done with sed, e.g. sed 's/my string/replace string/g'; replace is just easier to use when the strings contain special characters.
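For comparison, a sed version might look like the sketch below. It is only an illustration: the function name is invented, [0-9]+ is an assumption about what the case number looks like, and sed -E is assumed to be available.

```shell
# Hypothetical sed equivalent: group 1 is everything up to and including
# the opening quote of src, group 2 is the "caseNN" part of the link.
fix_img_sed() {
    sed -E "s#(<a href='\.\./Example/(case[0-9]+)\.html'><img src=')[^']*'#\1<?php imgname('\2'); ?>'#"
}

printf '%s\n' "<a href='../Example/case23.html'><img src='Blablabla.jpg'" | fix_img_sed
```

Double quotes are safe around this sed program because it contains no $, backtick, or ! for the shell to interpret.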