What does s-/-- and s-/\Z-- in perl mean? - regex

I am a beginner in perl and I have a query regarding pattern matching.
I came across a line in perl where it was written
$variable =~ s-/\Z--;
And as the code goes ahead some another variable was assigned
$variable1 =~ s-/--;
Can you please tell me what these 2 lines do?
I want to know what s-/\Z-- and s-/-- mean.

$variable =~ s-/\Z--;
- is used as a delimiter here. However, best practice suggests that you either use / or {} as delimiters.
It could be re-written as:
$variable =~ s{/\Z}{}; # remove a / at the end of a string
Consider:
$variable1 =~ s-/--;
Again, it could be re-written as:
$variable1 =~ s{/}{}; # remove the first /
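As a quick check, here is a minimal sketch of both substitutions applied to a sample path. The path and the variable names are made up for illustration and are not from the original code.

```perl
use strict;
use warnings;

# Hypothetical sample value, chosen only to show the effect.
my $path = '/usr/local/lib/';

# s-/\Z-- : dash-delimited; removes a slash at the end of the string.
(my $no_trailing = $path) =~ s-/\Z--;

# s-/-- : dash-delimited; removes only the first slash.
(my $no_leading = $path) =~ s-/--;

print "$no_trailing\n";   # /usr/local/lib
print "$no_leading\n";    # usr/local/lib/
```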

The s/// operator in Perl is a substitution operation, which performs a search-and-replace on a string using a special kind of pattern called a regular expression. You can read more about regular expressions and Perl's pattern matching in the man pages that come with Perl:
man perlretut
man perlre
If you don't have these on your system, try searching Google for the same.
Applying a substitution to a variable is done with the =~ operator. So the following replaces the first instance of 'foo' in the variable $var with 'bar' (add the /g flag to replace all instances).
$var =~ s/foo/bar/;
All the Perl operators are documented on the 'perlop' man page.
Even though the most common separator character is a slash (hence s///), you can also use any other punctuation character as a separator. So in this case, the author has decided to use the dash (-) as the separator.
Here's the same line of code above using dash as a separator:
$var =~ s-foo-bar-;
In your case, the dash doesn't seem to add any clarity to the code, so it might be best to update it to use the conventional slashes instead.
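The following sketch, using a made-up sample string, suggests that the choice of delimiter does not change the result, and that a plain substitution touches only the first match unless /g is added.

```perl
use strict;
use warnings;

# Hypothetical input with three occurrences of 'foo'.
my $var = 'foo foo foo';

(my $slashes = $var) =~ s/foo/bar/;    # conventional delimiters
(my $dashes  = $var) =~ s-foo-bar-;    # same operation, dash delimiters
(my $global  = $var) =~ s/foo/bar/g;   # /g replaces every occurrence

print "$slashes\n";   # bar foo foo
print "$dashes\n";    # bar foo foo
print "$global\n";    # bar bar bar
```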

The s/// search and replace function in perl can be used with different delimiters, which is what is done in this case. They have replaced / with the minus sign -, or dash.
The s-/-- removes the first / from the string.
The s-/\Z-- matches and removes a slash at the end of the string. I think this is better written: s{/$}{}.

$variable1 =~ s-/--; could be written as
$variable1 =~ s{/}{}xms;
or this
$variable1 =~ s/ \/ //xms;
It means delete the first / in the string.
Regarding s-/\Z--, it is usually written like this
$variable =~ s{/ \Z}{}xms;
or this
$variable =~ s/ \/ \Z //xms;
It means delete a / if it is at the end of the string (\Z).
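One detail worth knowing is that \Z also matches just before a newline that ends the string, while \z matches only at the very end. The sketch below uses a hypothetical line read with its newline still attached to show the difference, and that the /x spacing does not change what is matched.

```perl
use strict;
use warnings;

# Hypothetical input: a path followed by a trailing newline.
my $line = "dir/\n";

(my $with_Z  = $line) =~ s{/ \Z}{}xms;  # \Z allows a final newline after the slash
(my $with_z  = $line) =~ s{/ \z}{}xms;  # \z requires the slash to be the last character
(my $compact = $line) =~ s-/\Z--;       # the original spelling, same effect as $with_Z

print $with_Z eq "dir\n"  ? "Z removed the slash\n" : "Z left it\n";
print $with_z eq "dir/\n" ? "z left it\n"           : "z removed the slash\n";
```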

Related

How to replace consecutive and identical characters in Perl?

I have a string like
XXXXYYYYZZZYYZZZYYYY which needs to be converted to
XXXXAAAYZZZAYZZZAAAY
$s =~ s/Y{2}+/AY/g;
This has 2 problems: {2}+ turns YYYY into AYAY, and AY is not the same length as YYYY (AAAY is expected).
How can I get this done in Perl?
Use a "look-ahead":
$s =~ s/Y(?=Y+)/A/g;
(?=Y+) means "followed by one or more Y characters", so any Y character that is followed by another Y character will be replaced with an A.
More info from perlretut
There's always more than one way to do it. My suggestion is to grab all the Ys except the last one, and then use that to create a string of As of the same length. The e modifier tells perl to execute the code in the replacement side instead of using it directly, and the r modifier tells =~ to return the result of the substitution instead of modifying the input text directly (useful for these one-liner tests, among other places).
$ perl -E 'say shift =~ s/(Y+)(?=Y)/"A"x length$1/gre' XXXXYYYYZZZYYZZZYYYY
XXXXAAAYZZZAYZZZAAAY
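For comparison, here is a small sketch that runs the original attempt and both suggestions on the sample string from the question. It assumes Perl 5.14 or later for the /r modifier; the variable names are illustrative.

```perl
use 5.014;
use strict;
use warnings;

my $s = 'XXXXYYYYZZZYYZZZYYYY';

# Original attempt: each pair of Ys becomes "AY".
my $attempt   = $s =~ s/Y{2}+/AY/gr;

# Look-ahead: replace every Y that is followed by another Y.
my $lookahead = $s =~ s/Y(?=Y)/A/gr;

# Computed replacement: all Ys but the last become the same number of As.
my $computed  = $s =~ s/(Y+)(?=Y)/"A" x length($1)/gre;

say $attempt;     # XXXXAYAYZZZAYZZZAYAY
say $lookahead;   # XXXXAAAYZZZAYZZZAAAY
say $computed;    # XXXXAAAYZZZAYZZZAAAY
```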
$s =~ s/Y{2}+/AY/g
The pattern Y{2}+ is obscure and rarely used. The trailing + does not repeat the group; in the few advanced regex engines that support it (Perl 5.10+ among them) it makes the {2} quantifier possessive, which behaves like the atomic group (?>Y{2}).
You might have meant (Y{2})+ which is (YY)+ or Y{2,} which is YY+
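To make the difference between these three spellings concrete, this sketch captures what each one matches in a hypothetical run of five Ys. It assumes Perl 5.10 or later, where the possessive form is available.

```perl
use strict;
use warnings;

# Hypothetical input: five consecutive Ys.
my $run = 'YYYYY';

my ($possessive) = $run =~ /(Y{2}+)/;       # exactly two Ys, possessively
my ($pairs)      = $run =~ /((?:Y{2})+)/;   # one or more pairs of Ys
my ($two_plus)   = $run =~ /(Y{2,})/;       # two or more Ys

print "$possessive\n";   # YY
print "$pairs\n";        # YYYY
print "$two_plus\n";     # YYYYY
```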
In Perl this is simple, because it supports lookaround:
perl -e '$s="XXXXYYYYZZZYYZZZYYYY"; $s =~ s/Y(?=Y)/A/g; print $s'
A less capable regex engine such as sed can still do it, albeit in a more cumbersome way:
echo XXXXYYYYZZZYYZZZYYYY |sed -E 's/YY+/&\n/g;s/Y/A/g;s/A\n/Y/g'

How do I search and replace with a regex and preserve blank spaces in Perl?

I'm using this code:
$text =~ s/\s(\w)/\u$1/g;
But "This is an example"
becomes "ThisIsAnExample"
instead of "This Is An Example".
How can I preserve blank spaces?
Use lookbehind.
$text =~ s/(?<!\S)(\w)/\u$1/g;
Or use the more efficient \K (Perl 5.10+).
$text =~ s/(?:^|\s)\K(\w)/\u$1/g;
Both of the solutions will make sure the first word is capitalized too. If that's not an issue, the second solution can be simplified to the following:
$text =~ s/\s\K(\w)/\u$1/g;
The matching contains the whitespace, the replacement doesn't.
$text =~ s/(\s)(\w)/$1\u$2/g;
Since \s contains different types of whitespace characters, if you want to keep it in your replacement, you need to capture it and put it back.
An alternative is to use word boundaries and full "words".
$text =~ s/\b(\w+)\b/\u$1/g;
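The variants above can be compared side by side. This is a sketch on made-up input (note the tab, to show that the original whitespace survives), assuming Perl 5.14 or later for the /r modifier.

```perl
use 5.014;
use strict;
use warnings;

# Hypothetical input containing a space, a tab, and another space.
my $text = "this is\tan example";

my $squashed   = $text =~ s/\s(\w)/\u$1/gr;           # whitespace is consumed
my $lookbehind = $text =~ s/(?<!\S)(\w)/\u$1/gr;      # whitespace never matched
my $keep       = $text =~ s/(?:^|\s)\K(\w)/\u$1/gr;   # \K drops it from the match
my $captured   = $text =~ s/(\s)(\w)/$1\u$2/gr;       # whitespace put back; first word untouched

# The word-boundary version also capitalizes after an apostrophe.
my $boundary   = "don't stop" =~ s/\b(\w+)\b/\u$1/gr;

say $squashed;   # thisIsAnExample
say $boundary;   # Don'T Stop
```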

How to match a question mark?

I am trying to search and replace a list of URLs in a file and I am having problems if the search URL has a question mark in it. The $file below is just a single tag here, but it is usually an entire file.
my $search = 'http://shorturl.com/detail.cfm?color=blue';
my $replace = 'http://shorturl.com/detaila.aspx?color=red';
my $file = 'HI';
$file =~ s/$search/$replace/gis;
print $file;
If the $search variable has ? in it the substitution does not work. It would work if I were to take off the ?color=blue from the $search variable.
Does anyone know how to make the above substitution work? Backslashing, i.e. \? did not help. Thanks.
Use quotemeta for the regex pattern.
use warnings;
use strict;
my $search = quotemeta 'http://shorturl.com/detail.cfm?color=blue';
my $replace = 'http://shorturl.com/detaila.aspx?color=red';
my $file = 'HI';
$file =~ s/$search/$replace/gis;
print $file;
__END__
HI
When a string is interpolated as a regex, it isn't matched literally, but interpreted as a regex. This is useful to build complex regexes, e.g.
my @animals = qw/ cat dog goldfish /;
my $animal_re = join "|", @animals;
say "The $thing is an animal" if $thing =~ /$animal_re/i;
In the string $animal_re, the | is treated as a regex metacharacter.
Other metacharacters are e.g. ., which matches any non-newline character, or ?, which makes the previous atom optional.
If you want to match the contents of a variable literally, you can enclose it in \Q...\E quotes:
s/\Q$search/$replace/gi
(The /s option just changes the meaning of . from “match any non-newline character” to “match any character”, and is therefore irrelevant here.)
The \Q...\E is syntactic sugar for the quotemeta function, therefore this answer and toolic's answer are exactly equivalent.
Please note that you want to escape more than just the ?. The ? is the only one in your example that messes up what you're expecting, but the . matching can be insidious to find.
The regex /foo.com/ will indeed match the string foo.com, but it will also match foo com and fooXcom and foo!com, because . matches any character except a newline. Therefore, the /foo.com/ should be written as /foo\.com/.
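Here is a minimal sketch of the difference that quoting makes. The URLs are the ones from the question; the surrounding sentence in $text is a hypothetical stand-in for the file contents.

```perl
use strict;
use warnings;

my $search  = 'http://shorturl.com/detail.cfm?color=blue';
my $replace = 'http://shorturl.com/detaila.aspx?color=red';

# Hypothetical text to search in.
my $text = 'Visit http://shorturl.com/detail.cfm?color=blue today';

# Unquoted: "m?" means "an optional m", so the literal "?" never matches.
(my $raw = $text) =~ s/$search/$replace/gi;

# Quoted with \Q...\E: every metacharacter in $search is matched literally.
(my $quoted = $text) =~ s/\Q$search\E/$replace/gi;

# quotemeta shows what \Q does to the string.
my $escaped = quotemeta 'detail.cfm?color=blue';

print "$raw\n";       # unchanged
print "$quoted\n";    # URL replaced
print "$escaped\n";   # detail\.cfm\?color\=blue
```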

How can I extract a substring up to the first digit?

How can I find the first substring until I find the first digit?
Example:
my $string = 'AAAA_BBBB_12_13_14' ;
Result expected: 'AAAA_BBBB_'
Judging from the tags you want to use a regular expression. So let's build this up.
We want to match from the beginning of the string, so we anchor with the ^ metacharacter at the beginning.
We want to match anything but digits, so we look at the character classes and find that this is \D.
We want 1 or more of these, so we use the + quantifier, which means 1 or more of the previous part of the pattern.
This gives us the following regular expression:
^\D+
Which we can use in code like so:
my $string = 'AAAA_BBBB_12_13_14';
$string =~ /^\D+/;
my $result = $&;
Most people got half of the answer right, but they missed several key points.
You can only trust the match variables after a successful match. Don't use them unless you know you had a successful match.
The $&, $`, and $' variables have well-known performance penalties across all regexes in your program.
You need to anchor the match to the beginning of the string. Since Perl now has user-settable default match flags, you want to stay away from the ^ beginning of line anchor. The \A beginning of string anchor won't change what it does even with default flags.
This would work:
my $substring = $string =~ m/\A(\D+)/ ? $1 : undef;
If you really wanted to use something like $&, use Perl 5.10's per-match version instead. The /p switch provides non-global-performance-sucking versions:
my $substring = $string =~ m/\A\D+/p ? ${^MATCH} : undef;
If you're worried about what might be in \D, you can specify the character class yourself instead of using the shortcut:
my $substring = $string =~ m/\A[^0-9]+/p ? ${^MATCH} : undef;
I don't particularly like the conditional operator here, so I would probably use the match in list context:
my( $substring ) = $string =~ m/\A([^0-9]+)/;
If there must be a number in the string (so you don't match an entire string that has no digits), you can throw in a lookahead, which won't be part of the capture:
my( $substring ) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
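The points above can be wrapped into a small helper. This is only a sketch: the name prefix_before_digit is made up for illustration, and it returns undef when there is no leading non-digit text followed by a digit. The last lines show why a match variable should not be read after a failed match.

```perl
use strict;
use warnings;

# Hypothetical helper: the leading run of non-digits, provided a digit follows.
sub prefix_before_digit {
    my ($string) = @_;
    # List context: $prefix is undef if the match fails.
    my ($prefix) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
    return $prefix;
}

my $found   = prefix_before_digit('AAAA_BBBB_12_13_14');   # 'AAAA_BBBB_'
my $nothing = prefix_before_digit('no digits here');       # undef

# A failed match leaves $1 from the previous successful match in place.
'abc1' =~ /\A(\D+)/;   # succeeds and sets $1
'123'  =~ /\A(\D+)/;   # fails and does not reset $1
my $stale = $1;        # still holds the value from the first match

print "$found\n";
```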
$str =~ /(\d)/; print $`;
This code prints the part of the string that precedes the match.
perl -le '$string=q(AAAA_BBBB_12_13_14);$string=~m{(\D+)} and print $1'
AAAA_BBBB_

Why does my regular expression fail with certain substitutions?

I am new to perl and not sure how to achieve the following.
I am reading a file and putting the lines in a variable called $tline. Next, I am trying to replace some character from the $tline.
This substitution fails if $tline has some special characters like (, ?,= etc in it. How to escape the special characters from this variable $tline?
if ($tline ne "") {
$tline =~ s/\//\%;
}
EDIT
Sorry for the confusions. Here is what I am trying to do.
$tline =~ s/"\//"\<\%\=request\.getContextPath\(\)\%\>\//;
This is working for most of the cases. But when the input file has ? in it, it is failing.
How about:
$tline =~ s/\Q$var\E/$replacement/; # $replacement stands in for your replacement text
That will cause quotemeta to be applied to contents of $var which is being used as the pattern.
This isn't a valid regex:
$tline =~ s/\//\%;
Perl reads it like this:
$tline =~ s/a/%;
Where a = /
What you wanted to do is replace a forward slash with a percent sign, so you probably want:
$tline =~ s/\//%/;
Which is better written like this:
$tline =~ s,/,%,;
You probably also want to replace more than just the first forward-slash, so you want the /g flag:
$tline =~ s,/,%,g;
And this is exactly what tr (transliteration) does:
$tline =~ tr,/,%,;
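A short sketch comparing these forms on a made-up value. It also shows that tr returns the number of characters it matched, which makes it handy for counting.

```perl
use strict;
use warnings;

# Hypothetical input with two slashes.
my $tline = 'a/b/c';

(my $first    = $tline) =~ s,/,%,;    # first slash only
(my $all      = $tline) =~ s,/,%,g;   # every slash
(my $translit = $tline) =~ tr,/,%,;   # every slash, no regex engine involved

# With an empty replacement list, tr counts matches and changes nothing.
my $count = ($tline =~ tr,/,,);

print "$first\n";      # a%b/c
print "$all\n";        # a%b%c
print "$translit\n";   # a%b%c
print "$count\n";      # 2
```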
UPDATE I think what you want is a simple quotemeta() which takes your input, and regex-escapes the meta characters
$ perl -e'print quotemeta("</foo?>")'
\<\/foo\?\>
You could place all your special characters between square brackets (called a "character class"). The following will replace all left parentheses, question marks and equal signs in your string with percent signs:
my $tline = 'fo(?=o';
$tline =~ s/[(?=]/%/g;
print "$tline\n";
Prints:
fo%%%o
quotemeta is a good function for getting an exact literal with special characters into a regex. And \Q and \E are good operators for doing the same thing inside the regex.
However, your search expression is not that complex. In your edit, you're simply looking for a double quote and a slash. In fact, I've quite simplified your expression so it contains not a single backslash. So it's not a problem for quotemeta nor for that matter \Q and \E.
Once pared down, I don't see anything in your revised substitution that would cause a problem with '?' in $tline.
Key to the simplification is that '.', '(', and ')' mean nothing special to the replacement section of your expression, so this is equivalent:
$tline =~ s/"\//"<%=request.getContextPath()%>\//;
Not to mention easier to read. Of course this is even easier:
$tline =~ s|"/|"<%=request.getContextPath()%>/|;
Because in Perl, you can choose the delimiter you wish with the s operator.
But with any of these, this works:
use Test::More tests => 1;
my $tline = '"/?"';
$tline =~ s|"/|"<%=request.getContextPath()%>/|;
ok( $tline =~ /getContextPath/ );
It passes the test. Perhaps you're having a problem with more than one substitution on a line. That can be fixed with:
$tline =~ s|"/|"<%=request.getContextPath()%>/|g;
That g is the global switch on the end, saying make this substitution for as many times as it occurs in the input.
However, since I can see what you are doing, I suggest an even tighter specification of what you want to search:
$tline =~ s~\b(href|link|src)="/~$1="<%=request.getContextPath()%>/~g;
And when I run this:
use Test::More tests => 2;
my $tline = '"/?"';
$tline =~ s/"\//"<%=request.getContextPath()%>\//;
ok( $tline =~ /getContextPath/ );
$tline = 'src="/?/?/beer"';
ok( $tline =~ s~\b(href|link|src)="/~$1="<%=request.getContextPath()%>/~g
);
I get two successes.
Your true problem is yet unspecified.
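For a self-contained illustration of the tighter substitution, here is a sketch on a hypothetical line of markup that contains a '?'. Keeping the inserted text in a variable (here called $prefix, a name chosen for this example) avoids having to think about escaping it in the replacement.

```perl
use strict;
use warnings;

# The text to insert, kept in a single-quoted string so nothing is interpolated.
my $prefix = '<%=request.getContextPath()%>';

# Hypothetical input line with two root-relative attributes and a query string.
my $tline = '<a href="/search?q=perl"><img src="/img/logo.png"></a>';

# Prefix every href/link/src value that starts with a slash.
(my $out = $tline) =~ s~\b(href|link|src)="/~$1="$prefix/~g;

print "$out\n";
```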
Well, one way to do it is to put all the characters you want to replace in square brackets. Like so:
$string =~ s/[,?=\/]//; # This will remove the first ',', '?', '=', or '/' from your string.
If you want to remove all the '?' in a string, for example, use a g on the end of it like so:
$string =~ s/[?]//g;
I'm a little rusty, but I believe that you only need a '\' in front of \ or /, (and of course the other special characters like \n,\t, etc...). Like so:
$string =~ s/[\\]/\//g; # Switch from DOS to Unix delimiters.
$string =~ s/[\n\t]//g; # Remove all newlines and tabs
As others have said, the code you've posted isn't going to work since you forgot the last /. That's another nice reason to keep the "weird" characters in a box.
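Note that only the pattern side of s/// is a regular expression. The replacement side is an ordinary interpolated string, so square brackets written there are inserted literally. A sketch with a made-up path, assuming Perl 5.14 or later for the /r modifier:

```perl
use 5.014;
use strict;
use warnings;

# Hypothetical DOS-style path (single quotes keep the backslashes literal).
my $dos = 'C:\dir\file.txt';

my $unix     = $dos =~ s/[\\]/\//gr;     # backslash -> slash
my $bracketY = $dos =~ s/[\\]/[\/]/gr;   # brackets in the replacement are literal
my $via_tr   = $dos =~ tr{\\}{/}r;       # same conversion with transliteration

say $unix;       # C:/dir/file.txt
say $bracketY;   # C:[/]dir[/]file.txt
say $via_tr;     # C:/dir/file.txt
```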