Perl regular expression {} quantifier multiple matches - regex

I'm trying to parse a file in which each line has 3 floats (1, +1.0 and -1.0 being valid values). The regular expression in the snippet below matches a single float value, but I'm not sure how I should use the Perl quantifier {n} to match multiple floats within a single line.
#!/usr/bin/perl
use strict;
use warnings;
open(my $fh, "<", "floatNumbers.txt") or die "Cannot open < floatNumbers.txt";
while(<$fh>)
{
if ($_=~m/([-+]?\d*[\.[0-9]*]?\s*)/)
{
print $1."\n";
}
}
This is how I tried to match 3 floats within a line. Could readers help me with the correct usage of the {} quantifier?
if ($_=~m/([-+]?\d*[\.[0-9]*]?\s*){3}/)

You're trying to do extraction and validation at the same time. I'd go with:
sub is_float {
return $_[0] =~ /
^
[-+]?
(?: \d+(?:\.[0-9]*)? # 9, 9., 9.9
| \.[0-9]+ # .9
)
\z
/x;
}
while (<$fh>) {
my @fields = split;
if (@fields != 3 || grep { !is_float($_) } @fields) {
warn("Syntax error at line $.\n");
next;
}
print("@fields\n");
}
Note that your validation considered ., [ and ...0...0... to be numbers. I fixed that.

Quantifiers just allow you to specify how many times you want to match something in a regex.
For example, /(ba){3}/ matches ba exactly 3 times in a row:
bababanfnfd = matches bababa, but
baba = no match.
You can also use (taken from: http://perldoc.perl.org/perlrequick.html):
a? = match 'a' 1 or 0 times
a* = match 'a' 0 or more times, i.e., any number of times
a+ = match 'a' 1 or more times, i.e., at least once
a{n,m} = match at least n times, but not more than m times.
a{n,} = match at least n times
a{n} = match exactly n times
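A minimal sketch of how these quantifiers behave, reusing the (ba){3} example from above. The helper name matched_text and the extra sample strings are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text the regex matched,
# or undef when there is no match.
sub matched_text {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $& : undef;
}

print matched_text('bababanfnfd', qr/(?:ba){3}/)   // 'no match', "\n";  # bababa
print matched_text('baba',        qr/(?:ba){3}/)   // 'no match', "\n";  # no match
print matched_text('babababa',    qr/(?:ba){2,3}/) // 'no match', "\n";  # bababa
print matched_text('xbay',        qr/(?:ba){1,}/)  // 'no match', "\n";  # ba
```

Quantifiers are greedy by default, which is why {2,3} takes three repetitions when three are available.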

This is a generalized pattern that I think does what you are talking about:
# ^\s*(?:[-+]?(?=[^\s\d]*\d)\d*\.?\d*(?:\s+|$)){3}$
^ # BOL
\s* # optional whitespaces
(?: # Grouping start
[-+]? # optional -+
(?= [^\s\d]* \d ) # lookahead for \d
\d* \.? \d* # match this form (everything optional but guaranteed a \d)
(?: \s+ | $ ) # whitespaces or EOL
){3} # Grouping end, do 3 times
$ # EOL
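The pattern above can be tried out with a short script. This is only a sketch: the helper name has_three_floats and the sample lines are assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# The generalized pattern from above, compiled once with /x
# so that the layout whitespace is ignored.
my $three_floats = qr/
    ^ \s*
    (?:
        [-+]?
        (?= [^\s\d]* \d )     # guarantee at least one digit
        \d* \.? \d*
        (?: \s+ | $ )
    ){3}
    $
/x;

# Hypothetical helper name, used only for this illustration.
sub has_three_floats {
    my ($line) = @_;
    return $line =~ $three_floats ? 1 : 0;
}

for my $line ('1 +1.0 -1.0', '.5 -.5 +5.', '1 2', '1 . 3', '1 2 3 4') {
    printf "%-12s %s\n", $line, has_three_floats($line) ? 'ok' : 'rejected';
}
```

With these samples the first two lines are accepted and the last three are rejected: too few fields, a lone dot, and too many fields.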

Related

exactly once from a set of characters perl using regex

How can I check, using a regexp in Perl, that exactly one character from a group of characters occurs? Suppose that from (abcde) I want to check that only one of these 5 characters has occurred, although it may occur multiple times. I have tried quantifiers, but they do not work for a set of characters.
You could use the following regex match:
/
^
[^a-e]*+
(?: a [^bcde]*+
| b [^acde]*+
| c [^abde]*+
| d [^abce]*+
| e [^abcd]*+
)
\z
/x
The following is a simpler pattern that might be less efficient:
/ ^ [^a-e]*+ ([a-e]) (?: \1|[^a-e] )*+ \z /x
A non-regex solution might be simpler.
# Count the number of instances of each letter.
my %chars;
++$chars{$_} for split //;
# Count how many of [a-e] are found.
my $count = 0;
++$count for grep $chars{$_}, qw( a b c d e );
$count == 1
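As a sketch, the backreference pattern and the counting approach can be wrapped in subs and compared side by side. The sub names and sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Regex approach: one letter from a-e, possibly repeated,
# mixed only with characters outside a-e.
sub one_letter_regex {
    my ($s) = @_;
    return $s =~ / ^ [^a-e]*+ ([a-e]) (?: \1 | [^a-e] )*+ \z /x ? 1 : 0;
}

# Counting approach: tally every character, then count how many
# of a-e occurred at least once.
sub one_letter_count {
    my ($s) = @_;
    my %chars;
    ++$chars{$_} for split //, $s;
    my $count = grep { $chars{$_} } qw( a b c d e );
    return $count == 1 ? 1 : 0;
}

for my $s (qw( xaxxa aaa ab axbx xyz )) {
    printf "%-6s regex=%d count=%d\n", $s, one_letter_regex($s), one_letter_count($s);
}
```

Both subs accept xaxxa and aaa and reject ab, axbx and xyz.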
You can use a regex to return a list of matches, then store the result in an array.
my @arr = "abcdeaa" =~ /a/g; print scalar(@arr), "\n";
prints 3
my @arr = "bcde" =~ /a/g; print scalar(@arr), "\n";
prints 0
scalar(@arr) returns the length of the array.

How to match string that contain exact 3 time occurrence of special character in perl

I have tried a few methods to match a word that contains exactly 3 slashes, but none of them work. Below is an example:
my @array = qw( abc/ab1/abc/abc a2/b1/c3/d4/ee w/5/a s/t );
foreach my $string (@array) {
if ( $string =~ /^\/{3}/ ) {
print " yes, word with 3 / found !\n";
print "$string\n";
}
else {
print " no word contain 3 / found\n";
}
}
These are the matches I tried, but none of them work:
$string =~ /^\/{3}/;
$string =~ /^(\w+\/\w+\/\w+\/\w+)/;
$string =~ /^(.*\/.*\/.*\/.*)/;
Is there any other way I can match this type of string and print it?
Match a / globally and compare the number of matches with 3
if ( ( () = m{/}g ) == 3 ) { say "Matched 3 times" }
where the =()= operator is a play on context, forcing list context on its right side but returning the number of elements of that list when scalar context is provided on its left side.
If you are uncomfortable with such a syntax stretch then assign to an array
if ( ( my @m = m{/}g ) == 3 ) { say "Matched 3 times" }
where the subsequent comparison evaluates it in the scalar context.
You are trying to match three consecutive / and your string doesn't have that.
The pattern you need (with whitespace added) is
^ [^/]* / [^/]* / [^/]* / [^/]* \z
or
^ [^/]* (?: / [^/]* ){3} \z
Your second attempt was close, but using ^ without \z made it so you checked for string starting with your pattern.
Solutions:
say for grep { m{^ [^/]* (?: / [^/]* ){3} \z}x } @array;
or
say for grep { ( () = m{/}g ) == 3 } @array;
or
say for grep { tr{/}{} == 3 } @array;
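A small sketch comparing the three approaches on the array from the question. The sub names by_pattern, by_count and by_tr are assumptions introduced for this illustration.

```perl
use strict;
use warnings;

my @array = qw( abc/ab1/abc/abc a2/b1/c3/d4/ee w/5/a s/t );

# Anchored pattern: exactly three slashes separated by non-slashes.
sub by_pattern {
    my ($s) = @_;
    return $s =~ m{^ [^/]* (?: / [^/]* ){3} \z}x ? 1 : 0;
}

# Count global matches by assigning through an empty list.
sub by_count {
    my ($s) = @_;
    my $n = () = $s =~ m{/}g;
    return $n == 3 ? 1 : 0;
}

# tr with an empty replacement list counts characters without changing them.
sub by_tr {
    my ($s) = @_;
    return ( $s =~ tr{/}{} ) == 3 ? 1 : 0;
}

for my $string (@array) {
    printf "%-16s %d %d %d\n", $string,
        by_pattern($string), by_count($string), by_tr($string);
}
```

Only abc/ab1/abc/abc contains exactly three slashes, so it is the only string for which all three subs return 1.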
You need to match
a slash
surrounded by some non-slashes ([^\/]*)
repeating the match exactly three times
and enclosing the whole triple in start-of-line and end-of-line anchors:
$string =~ /^(?:[^\/]*\/[^\/]*){3}$/;
if ( $string =~ /\/.*\/.*\// and $string !~ /\/.*\/.*\/.*\// )

Perl regex: Substitution of everything but the pattern

In Perl, I would like to substitute a negated character class (everything but the pattern) with nothing, to keep only the expected string. Normally this approach should work, but in my case it doesn't:
$var =~ s/[^PATTERN]//g;
the original string:
$string = '<iframe src="https://foo.bar/embed/b74ed855-63c9-4795-b5d5-c79dd413d613?autoplay=1&context=cGF0aD0yMSwx</iframe>';
wished pattern to get: b74ed855-63c9-4795-b5d5-c79dd413d613
(5 hex number groups split with 4 dashes)
my code:
$pattern2keep = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";
(should match only : xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (5 hex number groups split with 4 dashes) , char length : 8-4-4-4-12 )
The following should substitute everything but the pattern by nothing, but in fact it does not.
$string =~ s/[^$pattern2keep]//g;
What am I doing wrong please? Thanks.
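A character class matches a single character equal to any one of the characters in the class. If the class begins with a caret then the class is negated, so it matches any one character that isn't any of the characters in the class.
If $pattern2keep is [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} then [^$pattern2keep] does not negate the whole pattern. After interpolation the first ] closes the class, so [^[0-9a-f] matches one character other than [, a digit, or a to f. The regex then requires eight such characters followed by -, the remaining hex groups, and a literal ]. That never occurs in your string, so nothing is substituted.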
You need to capture the substring, like this
use strict;
use warnings 'all';
use feature 'say';
my $string = '<iframe src="https://foo.bar/embed/b74ed855-63c9-4795-b5d5-c79dd413d613?autoplay=1&context=cGF0aD0yMSwx</iframe>';
my $pattern_to_keep = qr/ \p{hex}{8} (?: - \p{hex}{4} ){3} - \p{hex}{12} /x;
my $kept;
$kept = $1 if $string =~ /($pattern_to_keep)/;
say $kept // 'undef';
output
b74ed855-63c9-4795-b5d5-c79dd413d613

Perl regex negation

How do I negate this regular expression (without using !~)?
my $Line='pqr_abc_def_ghi_xyz';
if ($Line=~/(?:abc|def|ghi)/)
{
printf("abc|def|ghi is not present\n");
}
else
{
printf("abc|def|ghi is present\n");
}
Note: abc, def or ghi could be preceded or followed by other text.
if ( $Line =~ /^(?!.*(?:abc|def|ghi))/s ) {
I.e., it is not possible to match that pattern anywhere after the start of the string.
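A minimal sketch of the lookahead-based negation, assuming a hypothetical helper name lacks_all and a few sample strings chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when none of abc, def or ghi
# occurs anywhere in the string.
sub lacks_all {
    my ($line) = @_;
    return $line =~ /^(?!.*(?:abc|def|ghi))/s ? 1 : 0;
}

for my $line ('pqr_abc_def_ghi_xyz', 'pqr_xyz', "first\nsecond_ghi", '') {
    printf "%s\n", lacks_all($line) ? 'not present' : 'present';
}
```

The /s modifier lets . cross newlines, so a substring on a later line of a multi-line string is still detected.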
Another way, this might give you more control of the individual component substrings
# (?s)^(?:(?:(?!abc|def|ghi).)+|)$
(?s)
^
(?:
(?:
(?!
abc
| def
| ghi
)
.
)+
|
)
$
Another option could be to use unless instead of if:
unless ($Line=~/(?:abc|def|ghi)/){printf("abc|def|ghi is not present\n");}
else {printf("abc|def|ghi is present\n");}

Regular expression : validate numbers greater than 0, with or without leading zeros

I need a regular expression that will match strings like T001, T1, T012, T150, and so on up to T999.
I wrote [tT][0-9]?[0-9]?[0-9], but obviously it also matches T0, T00 and T000, which I don't want.
How can I force the last character to be non-zero if the previous one or two are zeros?
Quite easy using a negative lookahead: ^[tT](?!0{1,3}$)[0-9]{1,3}$
Explanation
^ # match begin of string
[tT] # match t or T
(?! # negative lookahead, check if there is no ...
0{1,3} # match 0, 00 or 000
$ # match end of string
) # end of lookahead
[0-9]{1,3} # match a digit one to three times
$ # match end of string
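The expression can be checked against the values from the question with a short Perl sketch. The helper name is_valid_t and the extra test values are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the regex explained above.
sub is_valid_t {
    my ($id) = @_;
    return $id =~ /^[tT](?!0{1,3}$)[0-9]{1,3}$/ ? 1 : 0;
}

for my $id (qw( T1 T001 T012 T150 T999 t7 T010 T0 T00 T000 T1000 T )) {
    printf "%-6s %s\n", $id, is_valid_t($id) ? 'valid' : 'invalid';
}
```

T0, T00, T000, T1000 and a bare T are reported as invalid. The other values are valid, including T010, because the lookahead only rejects a number made up entirely of zeros.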
I would not use regexp for that.
<?php
function tValue($str) {
if (intval(substr($str, 1)) !== 0) {
// T value is greater than 0
return $str;
} else {
// convert T<any number of 0> to T<any number-1 of 0>1
$str[strlen($str) - 1] = '1';
return $str;
}
}
// output: T150
echo tValue('T150'), PHP_EOL;
// output: T00001
echo tValue('T00000'), PHP_EOL;
// output: T1
echo tValue('T0'), PHP_EOL;
// output: T555
echo tValue('T555'), PHP_EOL;
Codepad: http://codepad.org/hqZpo8K9