Regex: Lookaround && some syntax - regex

I am studying regular expressions and am stuck on the lookaround concept and on some of the syntax.
After some googling, I thought this would be the right forum to ask for help, since I have trouble following the explanations I have found.
It would be great to get plenty of different examples.
The /e modifier and || are new to me in regex; please help me understand what they are really used for. Below is my Perl script.
$INPUT1="WHAT TO SAY";
$INPUT2="SAY HI";
$INPUT3="NOW SAY![BYE]";
$INPUT4="SAYO NARA![BYE]";
$INPUT1=~s/SAY/"XYZ"/e; # What is the /e modifier for?
$INPUT2=~s/HI/"XYZ"/;
$INPUT3=~s/(?<=\[)(\w+)(?=])/ "123"|| $1 /e; # What is '||' used for, and what is it called?
$INPUT4=~s/BYE/"123"/e;
print "\n\nINPUT1 = $INPUT1 \n \n ";
print "\n\nINPUT2 = $INPUT2 \n \n ";
print "\n\nINPUT3 = $INPUT3 \n \n ";
print "\n\nINPUT4 = $INPUT4 \n \n ";

Have a read of perlrequick and perlretut.
The /e modifier of the s/// substitution operator treats the replacement as Perl code rather than as a string. For example:
$x = "5 10";
$x =~ s/(\d+) (\d+)/$1 + $2/e;
# $x is now 15
Instead of replacing the match with the literal string "$1 + $2", Perl evaluates the code $1 + $2 (where $1 is 5 and $2 is 10) and uses the result as the replacement, so $x ends up as 15.
The || is not a regex operator, it's a normal Perl operator. It is the logical-or operator: if the left-hand side is a true value (anything other than 0, '0', '' or undef), it returns the left side; otherwise it returns the right side. You can look up Perl operators in perlop.
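Since the question also asks about lookaround, here is a minimal sketch using the $INPUT3 string from the question. The variable names are made up for illustration. It contrasts a lookbehind/lookahead pair, which only checks for the brackets, with a plain pattern that matches (and therefore replaces) the brackets too.

```perl
use strict;
use warnings;

my $input = "NOW SAY![BYE]";

# (?<=\[) asserts that "[" sits just before the match,
# (?=\])  asserts that "]" sits just after it.
# Neither bracket is part of the match, so both survive the substitution.
(my $with_lookaround = $input) =~ s/(?<=\[)(\w+)(?=\])/123/;

# Without lookaround the brackets are matched and so get replaced as well.
(my $without_lookaround = $input) =~ s/\[(\w+)\]/123/;

print "$with_lookaround\n";      # NOW SAY![123]
print "$without_lookaround\n";   # NOW SAY!123
```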

A standard substitution operator looks like this:
s/PATTERN/REPLACEMENT/
Where the PATTERN is matched, it is replaced with REPLACEMENT. REPLACEMENT is treated as a double-quoted string so that you can put variables in there and it will just work.
s/PATTERN/$var1/
You can use this to include pieces of the matched text in your replacement.
s/PA(TT)ERN/$1/
Sometimes, however, this isn't enough. Perhaps you want to process the text and run a subroutine to work out what the replacement is. Here's a really contrived example. Suppose you have text that contains floating point numbers and you want to replace them with integers. A first approach might look like this:
#!/usr/bin/perl
use strict;
use warnings;
$_ = '12.34 5.678';
s/(\d+\.\d+)/int($1)/g;
print "$_\n";
That doesn't work, of course. You end up with "int(12.34) int(5.678)". But that string is a piece of code which you want to run in order to get the correct answer. That's what the /e option does. It treats the replacement string as code, runs it and uses the output as the replacement.
Changing the line in the example above to
s/(\d+\.\d+)/int($1)/ge;
gives us the required result.
Now that you understand /e I hope that you don't need an explanation of ||. It's just the standard or operator that you use all the time. In your example, it means "the replacement string is either '123' or the contents of $1". Of course, that doesn't make much sense as '123' is always going to be true, so $1 will never be used. Perhaps you wanted it the other way round: $1 or '123'.
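To make the difference concrete, here is a small sketch that runs both orderings on the string from the question, plus one made-up input ("SAY![0]") where $1 is a false value. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $input = "NOW SAY![BYE]";

# As posted: "123" is always true, so $1 is never looked at.
(my $as_posted = $input) =~ s/(?<=\[)(\w+)(?=\])/ "123" || $1 /e;

# Swapped: $1 ("BYE") is true, so the captured text is kept.
(my $swapped = $input) =~ s/(?<=\[)(\w+)(?=\])/ $1 || "123" /e;

# Swapped, but the capture is the false value "0", so "123" is used.
(my $fallback = "SAY![0]") =~ s/(?<=\[)(\w+)(?=\])/ $1 || "123" /e;

print "$as_posted\n";   # NOW SAY![123]
print "$swapped\n";     # NOW SAY![BYE]
print "$fallback\n";    # SAY![123]
```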

Related

$& not resolved as part of perl substitution

I have a Perl script which searches and replaces data in multiple files. Since more than one word can be replaced in a file, I wrote a function that accepts the search and replace patterns as arrays. I then loop over the arrays in this function and perform the substitution. It works well, but for one particular file I need to prepend something (the character #) to the matched string. Hence, I pass "#\$&" as my replace pattern. It is received properly, but somehow the $& is never resolved. Instead, the operation replaces the matched string with the literal value '#$&'. The same thing works if I use #$& directly in the substitution command in the readFile function. I know we may be able to achieve the result in other ways, but I really want to know why the same replacement pattern works when written directly but doesn't work when read as an array element.
I have commented the substitution command that works well for reference. Can anyone please help me spot the problem here ?
my @search= ("host\\s*(replication|all)");
my @replace= ("#\$&");
my $sLine = scalar @search;
my $rLine = scalar @replace;
my $data = ???;
for ( my $i=0; $i < $sLine; $i++)
{
print("\n search = $search[$i] replace = $replace[$i] \n");
#$data =~ s/$search[$i]/#$&/g; ==> this works
$data =~ s/$search[$i]/$replace[$i]/g; #==> this doesn't
}
print($data);
The difference between the working solution and the non-working solution is the same as the difference between
print "#$&"; # Prints `#` and the value of `$&`.
and
print "$replace[$i]"; # Prints the value of `$replace[$i]`.
You can use the following:
use String::Substitution qw( gsub_modify );
for my $i (0..$#search) {
gsub_modify($data, $search[$i], $replace[$i]);
}
This is a more in-depth explanation.
s/$search[$i]/#$&/g
is short for
s/$search[$i]/ "#$&" /eg
which is equivalent to
s/$search[$i]/ "#" . $& /eg # Replaces with `#` and the value of `$&`.
/e causes the replacement expression to be evaluated as Perl code, using its result as the replacement string.
On the other hand,
s/$search[$i]/$replace[$i]/g
is short for
s/$search[$i]/ "$replace[$i]" /eg
which is equivalent to
s/$search[$i]/ $replace[$i] /eg # Replaces with the value of `$replace[$i]`.
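A minimal sketch of the point above, with made-up sample data. It shows that the replacement is interpolated only once (so the text held in the variable is not rescanned for $&), and one possible workaround that is not from the original answer: storing a code reference instead of a string and calling it under /e. The names $replace_cb, $literal and $commented are hypothetical.

```perl
use strict;
use warnings;

my $data    = "host all\nhost replication\n";
my $search  = 'host\s*(replication|all)';
my $replace = '#$&';    # single quotes: just the three characters #, $ and &

# $replace is expanded to its contents, and those contents are used as-is.
(my $literal = $data) =~ s/$search/$replace/g;

# Assumed workaround: keep code, not text, and run it for every match.
my $replace_cb = sub { "#$_[0]" };
(my $commented = $data) =~ s/$search/$replace_cb->($&)/ge;

print $literal;      # two lines, each reading #$&
print $commented;    # "#host all" and "#host replication"
```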

Remove certain characters from a regex group

I have a string that looks like this ("key":["value","value","value"]):
"emailDomains":["google.co.uk","google.com","google.com","google.com","google.co.uk"]
and I use the following regex to select from the string. (The regex is set up so that it won't select a string that looks like "key":[{"key":"value","key":"value"}].)
(?<=:\[").*?(?="])
Resulting Selection:
google.co.uk","google.com","google.com","google.com","google.co.uk
I want to remove the " in that selected string, and I was wondering if there is an easy way to do this using the replace command. Desired result:
"emailDomains":["google.co.uk, google.com, google.com, google.com, google.co.uk"]
How do I solve this problem?
If your string indeed has the form "key":["v1", "v2", ... "vN"], you can split off the part that needs to be changed, replace "," by a space in it, and re-assemble:
my @parts = split / (\["\s* | \s*\"]) /x, $string; #"
$parts[2] =~ s/",\s*"/ /g;
my $processed = join '', @parts;
The regex pattern for the separator in split is captured, since in that case the separators are also included in the returned list, which is helpful here for putting the string back together. Then we need to change the third element of the array.
In this approach, we have to change a specific element in the array so if your format varies, even a little, this may not (or still may) be suitable.
This should of course be processed as JSON, using a module. If the format isn't sure, as indicated in a comment, it would be best to try to ensure that you have JSON. Picking bits and pieces like above (or below) is a road to madness once requirements slowly start evolving.
The same approach can be used in a regex, which may in fact have the advantage of being able to scoop up and ignore everything preceding the : (with split, that part may end up as multiple elements if the format isn't exactly as shown, which then affects everything)
$string =~ s{ :\["\s*\K (.*?) ( "\] ) }{
my $e = $2;
my $n = $1 =~ s/",\s*"/ /gr;
$n.$e
}ex;
Here the /e modifier makes it so that the replacement side is evaluated as code, where we do the same as with the split above.
Notes on the regex:
We have to save away $2 first, since it gets reset by the next regex.
The /r modifier†, which doesn't change its target but rather returns the changed string, is what allows us to use the substitution operator on the read-only $1.
If nothing gets captured for $2, and perhaps for $1, that means that there was no match, and the outcome is simply that $string doesn't change, quietly. So if this substitution should always work, you may want to add handling of such unexpected data.
We don't need $n above; the code can return ($1 =~ s/",\s*"/ /gr) . $e directly.
Or, using lookarounds as attempted
$string =~ s{ (?<=:\[") (.+?) (?="\]) }{ $1 =~ s/",\s*"/ /gr }egx;
which does reduce the amount of code, but may be trickier to work with later.
While this is a direct answer to the question I think it's least maintainable.
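For reference, a small self-contained run of the lookaround version on the string from the question. This is only a sketch: it needs Perl 5.14+ for /r, the variable names are made up, and it uses ", " (comma and space) as the inner replacement, which is an adjustment so that the output matches the desired result shown in the question.

```perl
use strict;
use warnings;

my $string = q("emailDomains":["google.co.uk","google.com","google.com","google.com","google.co.uk"]);

# Outer s///: the lookarounds pin the match between :[" and "] without
# consuming them. Inner s///r: returns a copy of $1 in which every ","
# separator has become ", ".
(my $joined = $string) =~ s{ (?<=:\[") (.+?) (?="\]) }{ $1 =~ s/",\s*"/, /gr }egx;

print "$joined\n";
# "emailDomains":["google.co.uk, google.com, google.com, google.com, google.co.uk"]
```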
†  This useful modifier, for "non-destructive substitution," appeared in v5.14. In earlier Perl versions we would copy the string and run regex on that, with an idiom
(my $n = $1) =~ s/",\s*"/ /g;
In the lookarounds-example we then need a little more
$string =~ s{...}{ (my $n = $1) =~ s/",\s*"/ /g; $n }eg
since the s/// operator returns the number of substitutions made, while we need $n to be returned from that whole piece of code in {} (the replacement side), to be used as the replacement.
You can use this \G-based regex: it starts the match at :[", then captures each following value and replaces the matched text so that only the comma is retained and the double quotes are removed.
(:\[")|(?!^)\G([^"]+)"(,)"
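One possible way to apply this pattern from Perl, as a sketch. The answer does not show the replacement side, so the one used here is an assumption: keep $1 when the opening :[" matched, otherwise emit the captured value followed by ", ". The sample string is shortened and the variable names are made up.

```perl
use strict;
use warnings;

my $str = q("emailDomains":["google.co.uk","google.com","google.co.uk"]);

# First alternative: finds the opening :[" and puts it back unchanged.
# Second alternative: \G continues exactly where the previous match ended,
# so only values directly after the opener (or after another value) are
# touched; (?!^) stops \G from matching at the very start of the string.
(my $out = $str) =~ s/(:\[")|(?!^)\G([^"]+)"(,)"/ defined $1 ? $1 : "$2$3 " /ge;

print "$out\n";
# "emailDomains":["google.co.uk, google.com, google.co.uk"]
```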
Your text is almost proper JSON, so it's really easy to go the final inch and make it so, and then process that:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say postderef/;
no warnings qw/experimental::postderef/;
use JSON::XS; # Install through your OS package manager or a CPAN client
my $str = q/"emailDomains":["google.co.uk","google.com","google.com","google.com","google.co.uk"]/;
my $json = JSON::XS->new();
my $obj = $json->decode("{$str}");
my $fixed = $json->ascii->encode({emailDomains =>
join(', ', $obj->{'emailDomains'}->@*)});
$fixed =~ s/^\{|\}$//g;
say $fixed;
Try Regex: " *, *"
Replace with: ,

How to replace consecutive and identical characters in Perl?

I have a string like
XXXXYYYYZZZYYZZZYYYY which needs to be converted to
XXXXAAAYZZZAYZZZAAAY
$s =~ s/Y{2}+/AY/g;
this has 2 problems, {2}+ will get YYYY to AYAY; and AY is not the same length as YYYY (expecting AAAY)
How to get this done in perl?
Use a "look-ahead":
$s =~ s/Y(?=Y+)/A/g;
(?=Y+) means "followed by one or more Y characters", so any Y character that is followed by another Y character will be replaced with an A.
More info from perlretut
There's always more than one way to do it. My suggestion is to grab all the Ys except the last one, and then use that to create a string of As of the same length. The e modifier tells perl to execute the code in the replacement side instead of using it directly, and the r modifier tells =~ to return the result of the substitution instead of modifying the input text directly (useful for these one-liner tests, among other places).
$ perl -E 'say shift =~ s/(Y+)(?=Y)/"A"x length$1/gre' XXXXYYYYZZZYYZZZYYYY
XXXXAAAYZZZAYZZZAAAY
$s =~ s/Y{2}+/AY/g
The pattern Y{2}+ is an obscure one: a + placed directly after another quantifier makes that quantifier possessive, a feature found only in some of the more advanced regex engines (Perl among them) and closely related to atomic grouping.
You might have meant (Y{2})+ which is (YY)+ or Y{2,} which is YY+
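A quick sketch of what these variants actually do to the sample string (assuming Perl 5.10 or later, where the possessive {2}+ is accepted; the variable names are made up). None of them gives the wanted AAAY, because each replaces a whole run, or a pair, with the fixed text AY.

```perl
use strict;
use warnings;

my $s = 'XXXXYYYYZZZYYZZZYYYY';

# Possessive {2}+ still means "exactly two Y", so YYYY is two matches.
(my $possessive = $s) =~ s/Y{2}+/AY/g;

# (Y{2})+ takes any even-length run of Y as a single match.
(my $grouped = $s) =~ s/(Y{2})+/AY/g;

# Y{2,} takes any run of two or more Y as a single match.
(my $two_or_more = $s) =~ s/Y{2,}/AY/g;

print "$possessive\n";     # XXXXAYAYZZZAYZZZAYAY
print "$grouped\n";        # XXXXAYZZZAYZZZAY
print "$two_or_more\n";    # XXXXAYZZZAYZZZAY
```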
In Perl this is simple, since it supports lookaround:
perl -e '$s="XXXXYYYYZZZYYZZZYYYY"; $s =~ s/Y(?=Y)/A/g; print $s'
A simpler regex engine such as sed's can still do it, albeit in a more cumbersome way:
echo XXXXYYYYZZZYYZZZYYYY |sed -E 's/YY+/&\n/g;s/Y/A/g;s/A\n/Y/g'

Swapping letters with regexp

How can I swap the letter o with the letter e and e with o?
I just tried this but I don't think this is a good way of doing this. Is there a better way?
my $str = 'Absolute force';
$str =~ s/e/___eee___/g;
$str =~ s/o/e/g;
$str =~ s/___eee___/o/g;
Output: Abseluto ferco
Use the transliteration operator:
$str =~ y/oe/eo/;
E.g.
$ echo "Absolute force" | perl -pe 'y/oe/eo/'
Abseluto ferco
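The same operator inside a script, as a small sketch (variable names are made up). It also shows two details that are easy to miss: the in-place form returns the number of characters it changed, and the /r flag (Perl 5.14+) returns a modified copy and leaves the original alone.

```perl
use strict;
use warnings;

my $str = 'Absolute force';

# In-place: $str is modified, the return value is the number of
# characters that were transliterated (two o's and two e's here).
my $count = ($str =~ tr/oe/eo/);

# Non-destructive: /r returns the new string, $original is untouched.
my $original = 'Absolute force';
my $swapped  = $original =~ tr/oe/eo/r;

print "$str ($count changes)\n";    # Abseluto ferco (4 changes)
print "$original -> $swapped\n";    # Absolute force -> Abseluto ferco
```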
As has already been said, the way to do this is the transliteration operator
tr/SEARCHLIST/REPLACEMENTLIST/cdsr
y/SEARCHLIST/REPLACEMENTLIST/cdsr
Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is transliterated.
However, I want to commend you on your creative use of regular expressions. Your solution works, although the placeholder string _ee_ would've been sufficient.
tr is only going to help you for character replacements though, so I'd like to quickly teach you how to utilize regular expressions for a more complicated mass replacement. Basically, you just use the /e tag to execute code in the RHS. The following will also do the replacement you were aiming for:
my $str = 'Absolute force';
$str =~ s/([eo])/$1 eq 'e' ? 'o' : 'e'/eg;
print $str;
Outputs:
Abseluto ferco
Note how the LHS (left hand side) matches both o and e, and then the RHS (right hand side) does a test to see which matched and returns the opposite for replacement.
Now, it's common to have a list of words that you want to replace, so it's convenient to just build a hash of your from/to values and then dynamically build the regular expression. The following does that:
my $str = 'Hello, foo. How about baz? Never forget bar.';
my %words = (
foo => 'bar',
bar => 'baz',
baz => 'foo',
);
my $wordlist_re = '(?:' . join('|', map quotemeta, keys %words) . ')';
$str =~ s/\b($wordlist_re)\b/$words{$1}/eg;
Outputs:
Hello, bar. How about foo? Never forget baz.
The above could've worked for your e and o case as well, but would've been overkill. Note how I use quotemeta to escape the keys in case they contain a regular expression special character. I also intentionally used a non-capturing group around them in $wordlist_re so that the variable can be dropped into any regex and behave as desired. I then put the capturing group inside the s/// because it's important to be able to see what's being captured in a regex without having to backtrack to the value of an interpolated variable.
The tr/// operator is best. However, if you wanted to use the s/// operator (to handle more than just single letter substitutions), you could write
$ echo 'Absolute force' | perl -pe 's/(e)|o/$1 ? "o" : "e"/eg'
Abseluto ferco
The capturing parentheses avoid the redundant $1 eq 'e' test in @Miller's answer.
from man sed:
y/source/dest/
Transliterate the characters in the pattern space which appear in source to the corresponding character in dest.
and tr command can do this too:
$ echo "Absolute force" | tr 'oe' 'eo'
Abseluto ferco

Powershell - Replacing a string with a variable ending with a dollar sign

I'm a bit lost with this one. For whatever reason the replace function in powershell doesn't play well with variables ending with a $ sign.
Command:
$var='A#$A#$'
$line=('$var='+"'"+"'")
$line -replace '^.+$',('$line='+"'"+$var+"'")
Expected output:
$line='A#$A#$'
Actual output:
$line='A#$A#
It looks like you're getting hit with a regex substitution that you don't want. The regex special variable $' represents everything after your match. Since your regex matches the entire string, $' is effectively empty. During the replace operation, the .Net regex engine sees $' in your expected output and substitutes in that empty string.
One way to avoid this is to replace all instances of $ in your $var string with $$:
$line -replace '^.+$',('$line='+"'"+($var.Replace('$','$$'))+"'")
You can see more information about regex substitution in .Net here:
Substitutions
I was able to find a band-aid of sorts by replacing $ with a special character and then reverting it back after the change. Preferably you would choose a character that doesn't have a key on your keyboard. For me I chose "¤".
$var='A#$A#$'
$var=$var -replace '\$','¤'
$line=("`$var=''")
$line -replace '^.+$',("`$line='$var'") -replace '¤','$'
I don't really understand the purpose of your posted lines, it seems to me that it would just make more sense to do $line='$line='''+$var+"'", BUT if you insist on your way, just do two replace calls, like this:
$line -replace '^.+$',('$line=''LOL''') -replace 'LOL',$var