Perl Replace Capture Group With Length of Capture Group - regex

Given :
my $str = "foo95285734776bar";
$str =~ s/([0-9]{2,4})/_????_/g;
What single regex where '????' is the length of $1 can produce output "foo_4__4__3_bar" ?
That is, where "9528" is replaced with "_4_", "5734" with "_4_", and the remaining "776" with "_3_".

You can use the /e modifier to add Perl code into the substitution part that is then evaled.
my $str = "foo95285734776bar";
$str =~ s/([0-9]{2,4})/'_' . length($1) . '_'/ge;
print $str;
Will output
foo_4__4__3_bar
Note that you now need a full Perl expression there. That's why you have to actually quote and concatenate the underscores.
From perlop:
A /e will cause the replacement portion to be treated as a full-fledged Perl expression and evaluated right then and there. It is, however, syntax checked at compile-time. A second e modifier will cause the replacement portion to be evaled before being run as a Perl expression.

Related

extract string between two dots

I have a string of the following format:
word1.word2.word3
What are the ways to extract word2 from that string in perl?
I tried the following expression but it assigns 1 to sub:
#perleval $vars{sub} = $vars{string} =~ /.(.*)./; 0#
EDIT:
I have tried several suggestions, but still get the value of 1. I suspect that the entire expression above has a problem in addition to parsing. However, when I do simple assignment, I get the correct result:
#perleval $vars{sub} = $vars{string} ; 0#
assigns word1.word2.word3 to variable sub
. has a special meaning in regular expressions, so it needs to be escaped.
.* could match more than intended. [^.]* is safer.
The match operator (//) simply returns true/false in scalar context.
You can use any of the following:
$vars{sub} = $vars{string} =~ /\.([^.]*)\./ ? $1 : undef;
$vars{sub} = ( $vars{string} =~ /\.([^.]*)\./ )[0];
( $vars{sub} ) = $vars{string} =~ /\.([^.]*)\./;
The first one allows you to provide a default if there's no match.
Try:
/\.([^\.]+)\./
. has a special meaning and would need to be escaped. Then you would want to capture the values between the dots, so use a negative character class like ([^\.]+) meaning at least one non-dot. if you use (.*) you will get:
word1.stuff1.stuff2.stuff3.word2 to result in:
stuff1.stuff2.stuff3
But maybe you want that?
Here is my little example, I do find the perl one liners a little harder to read at times so I break it out:
use strict;
use warnings;
if ("stuff1.stuff2.stuff3" =~ m/\.([^.]+)\./) {
my $value = $1;
print $value;
}
else {
print "no match";
}
result
stuff2
. has a special meaning: any character (see the expression between your parentheses)
Therefore you have to escape it (\.) if you search a literal dot:
/\.(.*)\./
You've got to make sure you're asking for a list when you do the search.
my $x= $string =~ /look for (pattern)/ ;
sets $x to 1
my ($x)= $string =~ /look for (pattern)/ ;
sets $x to pattern.

Perl match regex variable \Q

I'm trying to match a regex in perl. The regex needs to be stored in a variable.
From this question I got \Q to match regex in a variable.
$regex = "\\$[0-9] (\\+|\\*) [0-9]";
$str = "$2 * 2";
if ($str =~ /\Q$regex/) { # regex is: \$[0-9] (\+|\*) [0-9]
print "Expression found :)\n";
} else {
print "Expression not found :(\n";
}
This matches fine in regexpal. It also works fine when I use the regex immediately without first putting it in $regex (i.e. without the \Q). What is the \Q doing to mess up my regex?
The \Q and \E pair can be used to escape all non-word characters within a double-quoted string context. For instance
perl -E 'say "abc[\Q[..]\E]def"'
output
abc[\[\.\.\]]def
I wonder why you think you need it, as it prevents all regex metacharacters from having their special effect. For instance \Q[0-9] will match exactly [0-9] instead of any single decimal digit
I would write your code like this. Note that I have changed double quotes to qr// when defining the pattern to create a compiled regex, and to single quotes when defining the target string to avoid Perl trying to interpolate built-in variable $2 into the string. You must always use strict and use warnings 'all' at the top of every Perl program you write
use strict;
use warnings 'all';
my $regex = qr/\$[0-9] [+*] [0-9]/;
my $str = '$2 * 2';
if ( $str =~ $regex ) {
print "Expression found :)\n";
}
else {
print "Expression not found :(\n";
}
output
Expression found :)

Swapping letters with regexp

How can I swap the letter o with the letter e and e with o?
I just tried this but I don't think this is a good way of doing this. Is there a better way?
my $str = 'Absolute force';
$str =~ s/e/___eee___/g;
$str =~ s/o/e/g;
$str =~ s/___eee___/o/g;
Output: Abseluto ferco
Use the transliteration operator:
$str =~ y/oe/eo/;
E.g.
$ echo "Absolute force" | perl -pe 'y/oe/eo/'
Abseluto ferco
As has already been said, the way to do this is the transliteration operator
tr/SEARCHLIST/REPLACEMENTLIST/cdsr
y/SEARCHLIST/REPLACEMENTLIST/cdsr
Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is transliterated.
However, I want to commend you on your creative use of regular expressions. Your solution works, although the placeholder string _ee_ would've been sufficient.
tr is only going to help you for character replacements though, so I'd like to quickly teach you how to utilize regular expressions for a more complicated mass replacement. Basically, you just use the /e tag to execute code in the RHS. The following will also do the replacement you were aiming for:
my $str = 'Absolute force';
$str =~ s/([eo])/$1 eq 'e' ? 'o' : 'e'/eg;
print $str;
Outputs:
Abseluto ferco
Note how the LHS (left hand side) matches both o and e, and them the RHS (right hand side) does a test to see which matched and returns the opposite for replacement.
Now, it's common to have a list of words that you want to replace, so it's convenient to just build a hash of your from/to values and then dynamically build the regular expression. The following does that:
my $str = 'Hello, foo. How about baz? Never forget bar.';
my %words = (
foo => 'bar',
bar => 'baz',
baz => 'foo',
);
my $wordlist_re = '(?:' . join('|', map quotemeta, keys %words) . ')';
$str =~ s/\b($wordlist_re)\b/$words{$1}/eg;
Outputs:
Hello, bar. How about foo? Never forget baz.
This above could've worked for your e and o case, as well, but would've been overkill. Note how I use quotemeta to escape the keys in case they contained a regular expression special character. I also intentionally used a non-capturing group around them in $wordlist_re so that variable could be dropped into any regex and behave as desired. I then put the capturing group inside the s/// because it's important to be able to see what's being captured in a regex without having to backtrack to the value of an interpolated variable.
The tr/// operator is best. However, if you wanted to use the s/// operator (to handle more than just single letter substitutions), you could write
$ echo 'Absolute force' | perl -pe 's/(e)|o/$1 ? "o" : "e"/eg'
Abseluto ferco
The capturing parentheses avoid the redundant $1 eq 'e' test in #Miller's answer.
from man sed:
y/source/dest/
Transliterate the characters in the pattern space which appear in source to the corresponding character in dest.
and tr command can do this too:
$ echo "Absolute force" | tr 'oe' 'eo'
Abseluto ferco

How can I extract a substring up to the first digit?

How can I find the first substring until I find the first digit?
Example:
my $string = 'AAAA_BBBB_12_13_14' ;
Result expected: 'AAAA_BBBB_'
Judging from the tags you want to use a regular expression. So let's build this up.
We want to match from the beginning of the string so we anchor with a ^ metacharacter at the beginning
We want to match anything but digits so we look at the character classes and find out this is \D
We want 1 or more of these so we use the + quantifier which means 1 or more of the previous part of the pattern.
This gives us the following regular expression:
^\D+
Which we can use in code like so:
my $string = 'AAAA_BBBB_12_13_14';
$string =~ /^\D+/;
my $result = $&;
Most people got half of the answer right, but they missed several key points.
You can only trust the match variables after a successful match. Don't use them unless you know you had a successful match.
The $&, $``, and$'` have well known performance penalties across all regexes in your program.
You need to anchor the match to the beginning of the string. Since Perl now has user-settable default match flags, you want to stay away from the ^ beginning of line anchor. The \A beginning of string anchor won't change what it does even with default flags.
This would work:
my $substring = $string =~ m/\A(\D+)/ ? $1 : undef;
If you really wanted to use something like $&, use Perl 5.10's per-match version instead. The /p switch provides non-global-perfomance-sucking versions:
my $substring = $string =~ m/\A\D+/p ? ${^MATCH} : undef;
If you're worried about what might be in \D, you can specify the character class yourself instead of using the shortcut:
my $substring = $string =~ m/\A[^0-9]+/p ? ${^MATCH} : undef;
I don't particularly like the conditional operator here, so I would probably use the match in list context:
my( $substring ) = $string =~ m/\A([^0-9]+)/;
If there must be a number in the string (so, you don't match an entire string that has no digits, you can throw in a lookahead, which won't be part of the capture:
my( $substring ) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
$str =~ /(\d)/; print $`;
This code print string, which stand before matching
perl -le '$string=q(AAAA_BBBB_12_13_14);$string=~m{(\D+)} and print $1'
AAAA_BBBB_

What does s-/-- and s-/\Z-- in perl mean?

I am a beginner in perl and I have a query regarding pattern matching.
I came across a line in perl where it was written
$variable =~ s-/\Z--;
And as the code goes ahead some another variable was assigned
$variable1 =~ s-/--;
Can you please tell me what does these 2 lines do?
I want to know what does s-/\Z-- and s-/-- mean.
$variable =~ s-/\Z--;
- is used as a delimiter here. However, best practice suggests that you either use / or {} as delimiters.
It could be re-written as:
$variable =~ s{/\Z}{}; # remove a / at the end of a string
Consider:
$variable1 =~ s-/--;
Again, it could be re-written as:
$variable1 =~ s{/}{}; # remove the first /
The s/// operator in Perl is a substitution operation, which performs a search-and-replace on a string using a special kind of pattern called a regular expression. You can read more about regular expressions and Perl's pattern matching in the man pages that come with Perl:
man perlretut
man perlre
If you don't have these on your system, try searching Google for the same.
Applying a substitution to a variable is done with the =~ operator. So the following replaces all instances of 'foo' in the variable $var with 'bar'.
$var =~ s/foo/bar/;
All the Perl operators are documented on the 'perlop' man page.
Even though the most common separator character is a slash (hence s///), you can also use any other punctuation character as a separator. So in this case, the author has decided to use the dash (-) as the separator.
Here's the same line of code above using dash as a separator:
$var =~ s-foo-bar-;
In your case, the dash doesn't seem to add any clarity to the code, so it might be best to update it to use the conventional slashes instead.
The s/// search and replace function in perl can be used with different delimeters, which is what is done in this case. They have replaced / with the minus sign -, or dash.
The s-/-- removes the first / from the string.
The s-/\Z-- matches and removes a slash at the end of the line. I think this is better written: s{/$}{}.
$variable1 =~ s-/--;could be written as
$variable =~ s{/}{}xms;
or this
$variable =~ s/ \/ //xms;
It means delete the first / in the string.
Regarding s-/\Z--, it is usually written like this
$variable =~ s{/ \Z}{}xms;
or this
$variable =~ s/ \/ \Z //xms;
It means delete a / if it is at the end of the string (\Z).