Perl regex: Substitution of everything but the pattern - regex

In Perl, I would like to replace everything except a pattern with nothing, using a negated character class, so that only the expected string remains. I expected the following approach to work, but in my case it doesn't:
$var =~ s/[^PATTERN]//g;
the original string:
$string = '<iframe src="https://foo.bar/embed/b74ed855-63c9-4795-b5d5-c79dd413d613?autoplay=1&context=cGF0aD0yMSwx</iframe>';
The substring I want to get: b74ed855-63c9-4795-b5d5-c79dd413d613
(5 groups of hex digits separated by 4 dashes)
my code:
$pattern2keep = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";
(This should match only xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: 5 groups of hex digits separated by 4 dashes, with lengths 8-4-4-4-12.)
The following should replace everything but the pattern with nothing, but in fact it does not:
$string =~ s/[^$pattern2keep]//g;
What am I doing wrong please? Thanks.

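A character class matches a single character equal to any one of the characters in the class. If the class begins with a caret, the class is negated, so it matches any one character that isn't in the class. It cannot express "anything except this multi-character pattern".
If $pattern2keep is [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}, then [^$pattern2keep] interpolates to [^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}]. The first ] closes the class, so the class is only [^[0-9a-f], which matches any one character other than [, a digit, or a letter from a to f. Everything after it, from {8} through the final ], is ordinary pattern text. Nothing in your string matches the result, so the substitution changes nothing.
You need to capture the substring, like this: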
use strict;
use warnings 'all';
use feature 'say';
my $string = '<iframe src="https://foo.bar/embed/b74ed855-63c9-4795-b5d5-c79dd413d613?autoplay=1&context=cGF0aD0yMSwx</iframe>';
my $pattern_to_keep = qr/ \p{hex}{8} (?: - \p{hex}{4} ){3} - \p{hex}{12} /x;
my $kept;
$kept = $1 if $string =~ /($pattern_to_keep)/;
say $kept // 'undef';
output
b74ed855-63c9-4795-b5d5-c79dd413d613
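If the substitution form from the question is still preferred, the same idea can be sketched with a capture inside s///. This is only a sketch under assumptions: the names $uuid_re and $kept are mine, and if the string holds several such identifiers, only the first one is kept.

```perl
use strict;
use warnings;

my $string = '<iframe src="https://foo.bar/embed/b74ed855-63c9-4795-b5d5-c79dd413d613?autoplay=1&context=cGF0aD0yMSwx</iframe>';

# 8-4-4-4-12 lowercase hex digits, as described in the question.
my $uuid_re = qr/[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}/;

# Copy the string, then replace the whole copy with just the captured part.
# If the pattern is not found, the copy is left unchanged.
(my $kept = $string) =~ s/\A.*?($uuid_re).*\z/$1/s;

print "$kept\n";
```

The capture does the real work here; the surrounding .*? and .* only consume the text that should be thrown away.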

Related

How to match same length digits in a string using regex

I want to find a text (for example: stack) in a String that contains digits and chars (for example: s123t123a123c123k). The only rule is that between every character of the search key there should be the same amount of digits, so all of this should match:
search key: stack
Strings:
stack //0 digit between chars of stack
s7t3a9c0k //1 digit between chars of stack
s27t33a49c50k //2 digit between chars of stack
s127t312a229c330k //3 digit between chars of stack and so on for 4,5,6 digits...
If I could match same length digits then I can write something like: s[]*t[]*a[]*c[]*k if the regex for same length digit is [].
How to match same length digits in a string using regex?
In case you want to do that with Perl, you may use
m/s(\d+)t(??{ "\\d{".length($^N)."}" })a(??{ "\\d{".length($^N)."}" })c(??{ "\\d{".length($^N)."}" })k/
Or, if you want to match digits with [0-9]:
m/s([0-9]+)t(??{ "[0-9]{".length($^N)."}" })a(??{ "[0-9]{".length($^N)."}" })c(??{ "[0-9]{".length($^N)."}" })k/
In the Perl code, you will most likely want to build the pattern dynamically. See the full Perl code demo:
#!/usr/bin/perl
use warnings;
use strict;
use re 'eval'; # stackoverflow.com/a/16320570/3832970
my @input = split /\n/, <<"END";
s7t3a9c0k
s27t33a49c50k
s127t312a229c330k
s1t312a22c3300k
END
my $keyword = "stack";
my $pattern = substr($keyword, 0, 1) . '([0-9]+)' . join( '(??{ "[0-9]{".length($^N)."}" })', split("", substr($keyword, 1)) );
#my $pattern = substr($keyword, 0, 1) . '(\d+)' . join( '(??{ "\\\\d{".length($^N)."}" })', split("", substr($keyword, 1)) );
for my $input ( @input ) {
if ($input =~ m/$pattern/) {
print $input . ": PASS!\n";
} else {
print $input . ": FAIL!\n"
}
}
Output:
s7t3a9c0k: PASS!
s27t33a49c50k: PASS!
s127t312a229c330k: PASS!
s1t312a22c3300k: FAIL!
The $pattern is built dynamically: substr($keyword, 0, 1) gets the first char, then ([0-9]+) is added, then join( '(??{ "[0-9]{".length($^N)."}" })', split("", substr($keyword, 1)) ) adds the rest: it inserts (??{ "[0-9]{".length($^N)."}" }) between each pair of chars of the $keyword substring that starts at the second char. The (??{ "[0-9]{".length($^N)."}" }) part acts as a [0-9]{X} pattern where X is the length of the most recently captured substring (the one captured by ([0-9]+)).
The use re 'eval'; is necessary because the pattern, which contains (??{ ... }) code blocks, is interpolated from a variable at run time. As per the answer linked in the code comment, it will only affect the regular expressions in the file or in the curlies where it is used.
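For comparison, the same check can be sketched without (??{ ... }) by capturing every digit run and comparing the lengths in plain Perl. This is not the approach from the answer above: the helper name has_evenly_spaced_keyword is hypothetical, and the sketch assumes the keyword itself contains no digits.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $input contains the characters of $keyword
# separated by digit runs that all have the same non-zero length.
# Assumes $keyword itself contains no digits.
sub has_evenly_spaced_keyword {
    my ($keyword, $input) = @_;

    # "stack" becomes s([0-9]+)t([0-9]+)a([0-9]+)c([0-9]+)k
    my $pattern  = join '([0-9]+)', map { quotemeta($_) } split //, $keyword;
    my $captures = length($keyword) - 1;

    # The lookahead makes each match zero-length, so /g retries at every start position.
    while ($input =~ /(?=$pattern)/g) {
        # Rebuild the captured digit runs from the offset arrays @- and @+.
        my @gaps = map { substr($input, $-[$_], $+[$_] - $-[$_]) } 1 .. $captures;
        my $len  = length $gaps[0];
        return 1 unless grep { length($_) != $len } @gaps;
    }
    return 0;
}

for my $input (qw( s7t3a9c0k s27t33a49c50k s127t312a229c330k s1t312a22c3300k )) {
    printf "%s: %s\n", $input,
        has_evenly_spaced_keyword('stack', $input) ? 'PASS!' : 'FAIL!';
}
```

The trade-off is a second pass over the captures in exchange for not needing use re 'eval'.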

exactly once from a set of characters perl using regex

How can I check, using a regex in Perl, that exactly one character from a group of characters occurs? Suppose that, out of the 5 characters (abcde), I want to check that only one of them occurs, although that one may occur multiple times. I have tried quantifiers, but they do not work for a set of characters.
You could use the following regex match:
/
^
[^a-e]*+
(?: a [^bcde]*+
| b [^acde]*+
| c [^abde]*+
| d [^abce]*+
| e [^abcd]*+
)
\z
/x
The following is a simpler pattern that might be less efficient:
/ ^ [^a-e]*+ ([a-e]) (?: \1|[^a-e] )*+ \z /x
A non-regex solution might be simpler.
# Count the number of instances of each letter.
my %chars;
++$chars{$_} for split //;
# Count how many of [a-e] are found.
my $count = 0;
++$count for grep $chars{$_}, qw( a b c d e );
$count == 1
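The non-regex idea can be wrapped in a small runnable sub. This is a sketch only: the sub name uses_exactly_one_of_a_to_e and the sample strings are made up for illustration, and it counts distinct letters from a to e rather than reusing the fragment above verbatim.

```perl
use strict;
use warnings;

# Hypothetical helper: true if exactly one distinct letter from a..e occurs
# in $str (that letter may repeat any number of times).
sub uses_exactly_one_of_a_to_e {
    my ($str) = @_;
    my %seen;
    # A /g match in list context returns every a..e character found.
    $seen{$_}++ for $str =~ /[a-e]/g;
    return keys(%seen) == 1;
}

for my $str (qw( xaxa abz zzz eee )) {
    printf "%-4s %s\n", $str, uses_exactly_one_of_a_to_e($str) ? 'yes' : 'no';
}
```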
You can use a regex match in list context to return a list of matches, then store the result in an array.
my @arr = "abcdeaa" =~ /a/g; print scalar(@arr), "\n";
prints 3
my @arr = "bcde" =~ /a/g; print scalar(@arr), "\n";
prints 0
scalar(@arr) returns the number of elements in the array.

How to match string that contain exact 3 time occurrence of special character in perl

I have tried a few methods to match a word that contains exactly 3 slashes, but none of them work. Below is an example:
my @array = qw( abc/ab1/abc/abc a2/b1/c3/d4/ee w/5/a s/t );
foreach my $string (@array){
if ( $string =~ /^\/{3}/ ){
print " yes, word with 3 / found !\n";
print "$string\n";
}
else {
print " no word contain 3 / found\n";
}
}
These are the patterns I tried; none of them work:
$string =~ /^\/{3}/;
$string =~ /^(\w+\/\w+\/\w+\/\w+)/;
$string =~ /^(.*\/.*\/.*\/.*)/;
Any other way i can match this type of string and print the string?
Match a / globally and compare the number of matches with 3
if ( ( () = m{/}g ) == 3 ) { say "Matched 3 times" }
where the =()= operator is a play on context, forcing list context on its right side but returning the number of elements of that list when scalar context is provided on its left side.
If you are uncomfortable with such a syntax stretch then assign to an array
if ( ( my @m = m{/}g ) == 3 ) { say "Matched 3 times" }
where the subsequent comparison evaluates it in the scalar context.
You are trying to match three consecutive / and your string doesn't have that.
The pattern you need (with whitespace added) is
^ [^/]* / [^/]* / [^/]* / [^/]* \z
or
^ [^/]* (?: / [^/]* ){3} \z
Your second attempt was close, but using ^ without \z means you only checked that the string starts with your pattern.
Solutions:
say for grep { m{^ [^/]* (?: / [^/]* ){3} \z}x } @array;
or
say for grep { ( () = m{/}g ) == 3 } @array;
or
say for grep { tr{/}{} == 3 } @array;
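To see how the three solutions agree, here is a small sketch that runs them side by side on the array from the question. The variable names $by_tr, $by_match and $exactly_three are mine.

```perl
use strict;
use warnings;

my @array = qw( abc/ab1/abc/abc a2/b1/c3/d4/ee w/5/a s/t );

for my $string (@array) {
    # tr with an empty replacement list counts the slashes without changing the string.
    my $by_tr = ($string =~ tr{/}{});

    # Assigning to an empty list forces list context; the scalar result is the match count.
    my $by_match = () = $string =~ m{/}g;

    # The anchored pattern accepts only strings with exactly three slashes.
    my $exactly_three = $string =~ m{^ [^/]* (?: / [^/]* ){3} \z}x ? 'yes' : 'no';

    printf "%-16s tr=%d match=%d exactly three: %s\n",
        $string, $by_tr, $by_match, $exactly_three;
}
```

Of the four sample strings, only abc/ab1/abc/abc contains exactly three slashes.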
You need to match
a slash
surrounded by some non-slashes ([^\/]*)
repeating the match exactly three times
and enclosing the whole triple in start-of-line and end-of-line anchors:
$string =~ /^(?:[^\/]*\/[^\/]*){3}$/;
if ( $string =~ /\/.*\/.*\// and $string !~ /\/.*\/.*\/.*\// )

Regex equivalent

What is the regex equivalent of $string =~ /[^x]/ if x is replaced by a multi-character string, say xyz? That is, the string doesn't contain xyz.
I eventually want to match
$string = "beginning string xyz remaining string which doesn't contain xyz";
using
$string =~ /(<pattern>)xyz(<pattern>)xyz/
so that
$1 = 'beginning string '
$2 = " remaining string which doesn't contain "
(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR.
(Actually, far more than just strings can be used in this fashion. For example, you can use STRING1|STRING2 just as well as STRING.)
$string =~ /
( (?:(?!xyz).)* )
xyz
( (?:(?!xyz).)* )
xyz
/sx
If that matches, that will always match at position zero, so let's anchor it to prevent needless backtracking on failure.
$string =~ /
^
( (?:(?!xyz).)* )
xyz
( (?:(?!xyz).)* )
xyz
/sx
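As a quick check, the anchored pattern can be run against the string from the question. This is a minimal sketch; the variable names $first and $second are mine.

```perl
use strict;
use warnings;

my $string = "beginning string xyz remaining string which doesn't contain xyz";

# In list context the match returns the two captured groups,
# or an empty list if the pattern does not match.
my ($first, $second) = $string =~ /
    ^
    ( (?:(?!xyz).)* )
    xyz
    ( (?:(?!xyz).)* )
    xyz
/sx;

if (defined $first) {
    print "first:  '$first'\n";
    print "second: '$second'\n";
}
else {
    print "no match\n";
}
```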
In your particular case, a non-greedy .* will work. That is:
(.*?)xyz(.*?)xyz
will give you what you're looking for, as shown in http://rubular.com/r/RtaMG6ZvWK
However, as pointed out in the comment from @ikegami below, this is a fragile approach. And it turns out there is a "string" counterpart to the character-based [^...] construct, as shown in @ikegami's answer https://stackoverflow.com/a/20367916/1008891
You can see this in rubular at http://rubular.com/r/zsO1F0nkXu
while (<>) {
if (/(([^x]|x(?!yz))+)xyz(([^x]|x(?!yz))+)xyz/) {
printf("'%s' '%s'\n", $1, $3);
}
}

Perl regex digit

Consider my regex in this code section:
use strict;
my @list = ("1", "2", "123");
&chk(@list);
sub chk {
my @num = split (" ", "@_");
foreach my $chk (@num) {
chomp $chk;
if ($chk =~ m/\d{1,2}?/) {
print "$chk\n";
}
}
}
The \d{4} will print nothing. The \d{3} will print only 123. But if I change to \d{1,2}? it will print all. I thought, according to all the sources I read so far, that {1,2} mean: one digit but no more than two. So it should have printed only 1 and 2, correct?
What do I need to extract items that contains only one to two digits?
\d{1,2} succeeds if it finds 1 or 2 digits anywhere in the string provided. Additional string content does not cause the match to fail. If you want to match only when the string consists of exactly 1 or 2 digits, do this: ^\d{1,2}$
You should anchor your regular expression for the desired effect. The built-in function grep suits better here since it is a selection from an array that is to be done:
#!/usr/bin/env perl
use strict;
use warnings;
my @list = ( 1, 2, 123 );
print join "\n", grep /^\d{1,2}$/, @list;
It appears to be working perfectly!
Here's a hint: Use the Perl variables $`, $&, and $'. These variables are special regular expression variables that show the part of the string before the match, what was matched, and the post matched string.
Here's a sample program:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use Scalar::Util;
my @list = ("1", "2", "123");
foreach my $string (@list) {
if ($string =~ /\d{1,2}?/) {
say qq(We have a match for "$string"!);
say qq("$`" "$&" "$'");
}
else {
say "No match makes David Sad";
}
}
The output will be:
We have a match for "1"!
"" "1" ""
We have a match for "2"!
"" "2" ""
We have a match for "123"!
"" "1" "23"
What this does is divide up the string into three sections: The section of the string before the regular expression match, the section of the string that matches the regular expression, and the section of the string after the regular expression match.
In each case, there was no pre-match because the regular expression matches from the start of the string. We also see that \d{1,2}? matches a single digit in each case even though 123 could have matched two digits. Why? Because the question mark at the end of the quantifier tells the regular expression not to be greedy. In this case, we tell the regular expression to match either one or two digits. Fine, it matches one. Remove the question mark, and the last line would have looked like this:
We have a match for "123"!
"" "12" "3"
If you want to match on one or two digits, but not three or more digits, you'll have to specify the part of your string before and after the one or two digits. Something like this:
/\D\d{1,2}\D/
This would match your string foo12bar, but not foo123bar. But what if the string is 12? In that case, we want to say that either we have the beginning of the string, or a non-digit before our one or two character match, and we either have a non-digit or the end of the string at the end of our one or two character match:
/(\D|^)\d{1,2}(\D|$)/
A quick explanation:
(\D|^): A non-digit or the beginning of the string (The ^ anchor)
\d{1,2}: One or two digits
(\D|$): A non-digit or the end of the string (The $ anchor)
Now, this will match 12, but not 123, and it will match foo12 and foo12bar, but not foo123 or foo123bar.
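A small sketch to verify those claims, using the pattern as described in the explanation above. The variable names $one_or_two and $candidate are mine.

```perl
use strict;
use warnings;

# One or two digits, bounded on each side by a non-digit or a string boundary.
my $one_or_two = qr/(\D|^)\d{1,2}(\D|$)/;

for my $candidate (qw( 12 123 foo12 foo12bar foo123 foo123bar )) {
    printf "%-10s %s\n", $candidate,
        $candidate =~ $one_or_two ? 'match' : 'no match';
}
```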
Just looking for a one or two digit number, we can simply specify the anchors:
/^\d{1,2}$/;
Now, that will match 1, 12, but not foo12 or 123.
The main thing is to use the $`, $&, and $' variables in order to help see exactly what your regular expression is matching on and what's before and after your match.
No, because while the regex only matches two digits, $chk still contains 123. If you want to only print the part that is matched, use
if ($chk =~ m/(\d{1,2})/) {
print "$1\n";
}
Note the parentheses and the $1. This causes it to print only that which is in the parentheses.
Also, this code doesn't make much sense:
sub chk {
my @num = split (" ", "@_");
Because @_ already is an array it makes no sense to make it into a string and then split it. Simply do:
sub chk {
foreach my $chk (@_) {
You also do not need to use chomp for data that is not coming from user input, as it is intended to remove the trailing newline. There is no newline in any of this data.
#!/usr/bin/perl
use strict;
my @list = ("1", "2", "123");
&chk(\@list);
sub chk {
foreach my $chk (@{$_[0]}) {
print "$chk\n" if $chk =~ m/^\d{1,2}$/ ;
}
}
#!/usr/bin/perl
use strict;
use warnings;
my @list = ("1", "2", "123");
&chk(@list);
sub chk {
my @num = split (" ", "@_");
foreach my $chk (@num) {
chomp $chk;
if ($chk =~ m/\d{1,2}/ && length($chk) <= 2) {
print "$chk\n";
}
}
}