Unusual substitute without delimiter - regex

I was unable to decipher what this regex does:
$c =~ s^.*/^^g;
I don't have access to the input or the output.
Does anyone know what it does?

The default delimiter for s/// is the slash, but you can use any printable character as an alternative.
So
$c =~ s^.*/^^g
is equivalent to
$c =~ s/.*\///g
Note that using the conventional delimiter requires the slash within the pattern itself to be escaped
Some options are better than others, and in the case where you're just trying to avoid escaping slashes within the pattern I think a pipe character | is better
I wouldn't hope to learn too much from this programmer. As you have experienced, ^ is a poor and confusing choice. Also, the /g modifier is superfluous, and $c is a terrible choice for an identifier
I would write
$c =~ s|.*/||

Here ^ is used as the delimiter.
We may use any printable character as a regex delimiter.
s^.*/^^g;
s/.*\///g;
Both regex are same
A non-standard delimiter is mostly used to avoid the need to escape the delimiter character within a regex pattern. For
$c = "this is a string with / slash";
Now your regex should be
$c =~ s/.*\///
^^
Here you are escaping the slash.
Both regex are same.
We will use whatever regex we want. #simbabque mentioned in comment.
s{foo}{bar}gs # here curly braces are delimiter
s[some][same] # here square bracket are delimeter.
And we will use character also a regex delimiter for our convenient
To avoid escaping we can use other delimiters.

Related

Perl 6 regex not terminated

I have a Perl 6 code where I am doing the following:
if ($line ~~ /^\s*#/) { print "matches\n"; }
I'm getting this error:
===SORRY!===
Regex not terminated.
at line 2
------> <BOL>�<EOL>
Unable to parse regex; couldn't find final '/'
at line 2
------> <BOL>�<EOL>
expecting any of:
infix stopper
This is part of a Perl 5 code:
if ($line =~ /^\s*#/)
It used to work fine to identify lines that have an optional space and a #.
What's causing this error in Perl 6?
In Perl 6, everything from a lone1 # to the end of the line is considered a comment, even inside regexes.
To avoid this, make it a string literal by placing it inside quotes:
if $line ~~ / ^ \s* '#' / { say "matches"; }
(Escaping with \ should also work, but Rakudo seems to have a parsing bug which makes that not work when preceded by a space.
And quoting the character as shown here is the recommended way anyway – Perl 6 specifically introduced quoted strings inside regexes and made spaces insignificant by default, in order to avoid the backslash clutter that many Perl 5 regexes suffer from.)
More generally, all non-alphanumeric characters need to be quoted or escaped inside Perl 6 regexes in order to match them literally.
This is another deliberate non-backwards-compatible change from Perl 5, where this is a bit more complicated.
In Perl 6 there is a simple rule:
alphanumeric --> matches literally only when not escaped.
(When escaped, they either have special meanings, e.g. \s etc., or are forbidden.)
non-alphanumeric --> matches literally only when escaped.
(When not escaped, they either have special meanings, e.g. ., +, #, etc., or are forbidden.)
1 'Lone' meaning not part of a larger token such as a quoted string or the opener of an embedded comment.
A hash # is used as a comment marker in Perl 6 regexes.
Add a backslash \ to escape it like this
if ( $line =~ /^\s*\#/ )

Perl search and replace using variables, string contains dot

I’m using a variable to search and replace a string using Perl.
I want to replace the string 23.0 with 23.0.1, so I tried this:
my $old="23.0";
my $new="23.0.1";
$_ =~ s/$old/$new/g;
The problem is that it also replaced the string 2310, so I tried:
my $old="23\.0"
and also /ee.
But can’t get the correct syntax for it to work. Can someone show me the correct syntax?
There are two things that will help you here:
The quotemeta function - that will escape meta characters. And also the \Q and \E regex flags, that stop regex interpolation.
print quotemeta "21.0";
Or:
my $old="23.0";
my $new="23.0.1";
my $str = "2310";
$str =~ s/\Q$old\E/$new/g;
print $str;
Just use single quotes and escape the dot.
my $old='23\.0';
To complement Sobrique's excellent answer, let me note that the reason your attempt with "23\.0" didn't work is that "23\.0" and "23.0" evaluate to the same string: in a double-quoted string literal, the backslash escape sequence \. simply evaluates to ..
There are several things you could do to avoid this:
If you indeed want to match a fixed string, and don't need or want to include any special regexp metacharacters in it, you can do as Sobrique suggest and use quotemeta or \Q to escape them.
In particular, this is almost always the correct solution if the string to be matched comes from user input. If you do want to allow some limited set of non-literal metacharacters, you can unescape those after running the pattern through quotemeta. For a simple example, here's a quick-and-dirty way to turn a basic glob-like pattern (using the metacharacters ? and * for "any character" and "any string of characters" repectively) into an equivalent regexp:
my $regexp = "^\Q$glob\E\$"; # quote and anchor the pattern
$regexp =~ s/\\\?/./g; # replace "?" (escaped to "\?" by \Q) with "."
$regexp =~ s/\\\*/.*/g; # replace "*" (escaped to "\*" by \Q) with ".*"
Conversely, if you want to have a literal regexp pattern in your code, without immediately matching it against something, you can use the qr// regexp-like quote operator, like this:
my $old = qr/\b23\.0(\.0)?\b/; # match 23.0 or 23.0.0 (but not 123.012!)
my $new = "23.0.1"; # just a literal string
s/$old/$new/g; # replace any string matching $old in $_ with $new
Note that qr// has other effects beyond just allowing you to use regexp syntax in a string literal: it actually pre-compiles the pattern into a special Regexp object, so that it doesn't need to be recompiled every time it's used later. In particular, as a side effect, the string representation of a qr// regexp literal will usually not exactly match the original content, although it will be equivalent as a regexp. For example, say qr/\b23\.0(\.0)?\b/ will, on my Perl version, output (?^u:\b23\.0(\.0)?\b).
You could also just use a normal double-quoted string literal, and double any backslashes in it, but that's (usually) less efficient than using qr//, and also less readable due to leaning toothpick syndrome.
Using a single-quoted string literal would be slightly better, since backslashes in a single-quoted string are only special when followed by another backslash or a single quote. Even so, readability can still suffer if you happen to need to match any literal backslashes in your regexp, not to mention that it's easy to create subtle bugs if you forget to double a backslash in those rare places where it's still needed.

what does s{SOMESTR}{$myvar} mean in perl?

Someone has written something like the following code :
#! /usr/bin/perl
my $myVar = 'somecomplicatedString';
my $someString = 'mySystemvariable=SOMESTR';
if ( $someString =~ /SOMESTR/ ) {
$someSting =~ s{SOMESTR}{$myVar}
}
# $someString now equals 'mySystemvariable=somecomplicatedString'
What is the difference between the s/// operator and the s{}{} operator?
You can use any set of delimiter in Perl match operator - m//, or substitution operator - s///.
Other examples:
s#oldTest#newTest#
s/oldTest/newTest/
s!oldTest!newTest!
s~oldTest~newTest~
s{oldTest}{newTest} # Here we use appropriate opening and closing braces.
m/someText/
m!someText!
/someText/ # You can omit the `m` when `/` is delimiter
!someText! # This is Wrong. You can't omit `m` in other delimiter.
The major advantage you see with varying delimiter is that you can avoid escaping a delimiter in the text, by using a different character as delimiter.
So, using # as delimiter, you don't need to escape / in the string.
From perlop doc:
Under m// operator section:
If "/" is the delimiter then the initial m is optional. With the m you can use any pair of non-whitespace (ASCII) characters as delimiters. This is particularly useful for matching path names that contain "/", to avoid LTS (leaning toothpick syndrome). If "?"
is the delimiter, then a match-only-once rule applies, described in m?PATTERN? below.
Under s/// operator section:
Any non-whitespace delimiter may replace the slashes. Add space after the s when using a character allowed in identifiers.
It is the same operator, but using different delimiters which can be used to achieve better readability.
{} are convenient when using /e modifier,
$string =~ s{(\d)}{
# ...
$1 + 1;
}e;

Difference between Perl regular expression delimiters /.../ and #...#

Today I came across two different syntaxes for a Perl regular expression match.
#I have a date string
my $time = '2012-10-29';
#Already familiar "m//":
$t =~ m/^(\d{4}-\d\d-\d\d)$/
#Completely new to me m##.
$t =~ m#^(\d{4}-\d\d-\d\d)#/
Now what is the difference between /expression/ and #expression#?
As everone else said, you can use any delimiter after the m.
/ has one special feature: you can use it by itself, e.g.
$string =~ /regexp/;
is equivalent to:
$string =~ m/regexp/;
Perl allows you to use pretty much any characters to delimit strings, including regexes. This is especially useful if you need to match a pattern that contains a lot of slash characters:
$slashy =~ m/\/\//; #Bad
$slashy =~ m|//|; #Good
According to the documentation, the first of those is an example of "leaning toothpick syndrome".
Most but not all characters behave in the same way when escaping. There is an important exception: m?...? is a special case that only matches a single time between calls to reset().
Another exception: if single quotes are used for the delimiter, no variable interpolation is done. You still have to escape $, though, as it is a special character matching the end of the line.
Nothing except what you have to escape in the regex. You can use any pair of matched characters you like.
$string = "http://example.com/";
$string =~ m!http://!;
$string =~ m#http://!#;
$string =~ m{http://};
$string =~ m/http:\/\//;
After the match or search/replace operator (the m and s, respectively) you can use any character as the delimiter, e.g. the # in your case. This also works with pairs of parenthesis: s{ abc (.*) def }{ DEF $1 ABC }x.
Advantages are that you don't have to escape the / (but the actual delimiter characters, of course). It's often used for clarity, especially when dealing with things like paths or protocols.
There is no difference; the "/" and "#" characters are used as delimiters for the expression. They simply mark the "boundary" of the expression, but are not part of the expression. In theory you can use most non-alphanumeric characters as a delimiter. Here is a link to the PHP manual (It doesn't matter that it is the PHP manual, the Regex syntax is the same, I just like it because it explains well) on Perl compatible regular expression syntax; read the part about delimiters

Slashes and hashes in Perl and metacharacters

Thanks for the previous assistance everyone!. I have a query regarding RegExp in Perl
My issue is..
I know, when matching you can write m// or // or ## (must include m or s if you use this). What is causing me the confusion is a book example on escaping characters I have. I believe most people escape lots of characters, as a sure fire way of the program working without missing a metacharacter something ie: \# when looking to match # say in an email address.
Here's my issue and I know what this script does:
$date= "15/12/99"
$date=~ s#(\d+)/(\d+)/(\d+)#$1/$2/$3#; << why are no forward slashes escaped??
print($date);
Yet the later example I have, shows it rewritten, as (which i also understand and they're escaped)
$date =~ s/()(\d+)\/(\d+)\/(d+)/$2\/$1\/$3; <<<<which is escaping the forward slashes.
I know the slashes or hashes are programmer preference and their use. What I don't understand is why the second example, escapes the slashes, yet the first doesn't - I have tried and they work both ways. No escaping slashes with hashes? What's even MORE confusing is, looking at yet another book example I also have earlier to this one, using hashes again, they too escape the # symbol.
if ($address =~ m#\##) { print("That's an email address"); } or something similar
So what do you escape from what you don't using hashes or slashes? I know you have to escape metacharacters to match them but I'm confused.
When you build a regexp, you define a character as a delimiter for your regexp i.e. doing // or ##.
If you need to use that character inside your regexp, you will need to escape it so that the regexp engine does not see it as the end of the regexp.
If you build your regexp between forward slashes /, you will need to escape the forward slashes contained in your regexp, hence the escaping in your second example.
Of course, the same rule apply with any character you use as a regexp delimiter, not just forward slashes.
The forward slashes are not meta characters in themselves - only the use of them in the second example as expression separators makes them "special".
The format of a substitute expression is:
s<expression separator char><expression to look for><expression separator char><expression to replace with><expression separator char>
In the first example, using a hash as the first character after the =~ s, makes that character the expression separator, so forward slash is not special and does not require any escaping.
in the second example, the expression separator is indeed the forward slash, so it must be escaped within the expressions themselves.
The regex match-operator allows to define a custom non-whitespace-character as seperator.
In your first example the '#' is used as seperator. So in this regex you don't need to escape the '/' because it hase no special meaning. In the second regex, the seperator char isn't changed. So the default '/' is used. Now you have to escape all '/' in your pattern. Otherwise the parser is confused. :)
If you are not use slashes, the recommend practice is to use the curly braces and the /x modifier.
$date=~ s{ (\d+) \/ (\d+) \/ (\d+) }{$1/$2/$3}x;
Escaping the non-alphanumerics is also a standard even if they are not meta-characters. See perldoc -f quotemeta.
There is another depth to this question about escaping forward slashes with the s operator.
With my example the capturing becomes the problem.
$image_name =~ s/((http:\/\/.+\/)\/)/$2/g;
For this to work the typo with the addition of a second forward slash, had to be captured.
Also, trying to work with just the two slashes did not work. The first slash has to be led by more than one character.
Changing "http://world.com/Photos//space_shots/out_of_this_world.jpg"
To: "http://world.com/Photos/space_shots/out_of_this_world.jpg"