Perl: Why is this pattern matching?


Why does the pattern match below succeed? Am I missing something?
use strict;
use warnings;
my $str = "pattern";
if ($str =~ /[0-9]*/) {
    print "Contains\n";
}

The * quantifier matches 0 or more times, and "pattern" contains exactly zero digits, so /[0-9]*/ succeeds by matching the empty string.
If you need at least one digit, use + instead, which matches 1 or more times.
Quoting from perldoc perlre:
Quantifiers
The following standard quantifiers are recognized:
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
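A minimal sketch of the difference between the two quantifiers, written in JavaScript because its regex syntax for *, + and [0-9] behaves like Perl's in this case. The helper names matchesStar and matchesPlus are hypothetical.

```javascript
// Hypothetical helpers comparing * and + on the same input.
function matchesStar(s) {
  // [0-9]* can match the empty string, so this succeeds on any input.
  return /[0-9]*/.test(s);
}

function matchesPlus(s) {
  // [0-9]+ requires at least one digit somewhere in the string.
  return /[0-9]+/.test(s);
}

for (const s of ["pattern", "pattern42"]) {
  console.log(s, matchesStar(s), matchesPlus(s));
}
// pattern true false
// pattern42 true true
```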

Using * as a quantifier means zero or more instances. In this case it matches zero digits at the position just before the p of the target string.
To match at least one digit, use the + quantifier instead.
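To see where that zero-length match happens, a small JavaScript sketch can inspect the match object; exec reports the start offset and the matched text, which is empty here.

```javascript
// Where does /[0-9]*/ match in "pattern", and how much does it consume?
const starMatch = /[0-9]*/.exec("pattern");
console.log(starMatch.index);     // 0 (just before the "p")
console.log(starMatch[0].length); // 0 (zero digits consumed)

// With +, at least one digit is required, so exec finds nothing.
const plusMatch = /[0-9]+/.exec("pattern");
console.log(plusMatch); // null
```

In Perl, the start offset of the last successful match is available as $-[0], which is also 0 here.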

Related

Select file by filename pattern

In my directory I have the following filename patterns:
1. Pattern: 2018_09_01_Filename.java
2. Pattern: kit-deploy-190718_Filename.java
Now I'm trying to select every file in the directory that matches the first pattern (the date can differ, but it is always year_month_day). But I can't get any further.
My idea was to take the basename of each file, extract its first ten characters, and then check whether they match a pattern:
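use File::Basename;  # provides basename()
my $filename = basename($file);
my $chars = substr($filename, 0, 10);
# This literal only matches the exact date 2000_01_01, not any year_month_day
if ($chars =~ m/2000_01_01/) {
    print "match\n";
}
else {
    print "Don't match\n";
}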

You just need a regex that matches your needs, for example:
#!/usr/bin/perl
use strict;
use warnings;
my $filename = "2018_09_01_Filename.java";
if ($filename =~ m/\D?20\d{2}_\d{2}_\d{2}\D?/) {
    print "match\n";
}
Explanation
\D? matches any character that's not a digit
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
20 matches the characters 20 literally (case sensitive)
\d{2} matches a digit
{2} Quantifier — Matches exactly 2 times
_ matches the character _ literally
\d{2} matches a digit
{2} Quantifier — Matches exactly 2 times
_ matches the character _ literally
\d{2} matches a digit
{2} Quantifier — Matches exactly 2 times
\D? matches any character that's not a digit
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
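As a sketch, the check can be applied to a whole directory listing, shown here in JavaScript. The file list is hypothetical (in Perl it would come from readdir or glob), and anchoredRe is an anchored variant added for comparison, not the answer's regex: it only accepts names that start with year_month_day_.

```javascript
// Hypothetical directory listing.
const files = [
  "2018_09_01_Filename.java",
  "kit-deploy-190718_Filename.java",
  "2019_12_31_Other.java",
];

// Regex from the answer above (unanchored).
const answerRe = /\D?20\d{2}_\d{2}_\d{2}\D?/;
// Anchored variant: 4 digits, 2 digits, 2 digits, each followed by "_", at the start.
const anchoredRe = /^\d{4}_\d{2}_\d{2}_/;

const byAnswer = files.filter((f) => answerRe.test(f));
const byAnchored = files.filter((f) => anchoredRe.test(f));

console.log(byAnswer.join(", "));   // 2018_09_01_Filename.java, 2019_12_31_Other.java
console.log(byAnchored.join(", ")); // 2018_09_01_Filename.java, 2019_12_31_Other.java
```

Both select the same names here. The difference shows up on a name that merely contains a date in the middle, such as a hypothetical kit-deploy-2018_09_01_x.java: the unanchored regex accepts it, the anchored one does not.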

Regex (Do not include digit.digit)

This is the sample text I'm running my regex on:
DuraFlexHose Water 1/2" hex 300mm 30.00
I want to match everything and stop just before the 30.00.
So what I have in mind is something like [^\d*\.\d*]*, but that's not working. What regex would help me achieve this?

/.*(?=\d{2}\.\d{2})/
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?=\d{2}\.\d{2}) Positive Lookahead - Assert that the regex below can be matched
\d{2} match a digit [0-9]
Quantifier: {2} Exactly 2 times
\. matches the character . literally
\d{2} match a digit [0-9]
Quantifier: {2} Exactly 2 times
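A quick JavaScript check of this pattern on the question's line (JavaScript's . likewise excludes newlines by default). It shows that the trailing space is kept, and, for comparison, what happens with a three-digit price such as 300.00.

```javascript
// Sample line from the question.
const line = 'DuraFlexHose Water 1/2" hex 300mm 30.00';

// Greedy .* first runs to the end of the line, then backtracks until the
// lookahead (?=\d{2}\.\d{2}) succeeds. The lookahead consumes no characters,
// so the price itself is not part of the match.
const prefix = line.match(/.*(?=\d{2}\.\d{2})/)[0];
console.log(JSON.stringify(prefix)); // "DuraFlexHose Water 1/2\" hex 300mm "

// With a three-digit price, backtracking first stops where "00.00" follows,
// so the leading "3" of 300.00 stays in the match.
const prefix300 = 'DuraFlexHose Water 1/2" hex 300mm 300.00'.match(/.*(?=\d{2}\.\d{2})/)[0];
console.log(JSON.stringify(prefix300)); // "DuraFlexHose Water 1/2\" hex 300mm 3"
```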

If you cannot use a CSV parser and are limited to regex, I'd suggest two regexes.
This one grabs every character from the beginning up to the first occurrence of optional whitespace + digit(s) + . + digit(s):
^([\s\S]*?)\s*\d+\.\d+
If the float value is at the end of the string, add a $ anchor (end of string):
^([\s\S]*?)\s*\d+\.\d+$
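A small JavaScript sketch of what the $ anchor changes, using a hypothetical line with two prices: without it the lazy group stops at the first float, with it the float must end the string.

```javascript
// Hypothetical line containing two prices.
const twoPrices = "Pipe 12.50 extra 30.00";

// Without $, the lazy group stops before the first float.
const toFirst = twoPrices.match(/^([\s\S]*?)\s*\d+\.\d+/)[1];
// With $, the float must end the string, so the group grows up to the last one.
const toLast = twoPrices.match(/^([\s\S]*?)\s*\d+\.\d+$/)[1];

console.log(toFirst); // Pipe
console.log(toLast);  // Pipe 12.50 extra
```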
Note that [\s\S] matches any character, including a newline.
Regex breakdown:
^ - Start of string
([\s\S]*?) - (capture group 1) any characters, 0 or more, as few as possible (otherwise, the 3 from 30.45 would be captured)
\s* - 0 or more whitespace characters, as many as possible (so that Group 1 is trimmed)
\d+\.\d+ - 1 or more digits followed by a period followed by 1 or more digits
$ - end of string.
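The "otherwise, the 3 from 30.45 would be captured" remark can be checked directly. This JavaScript sketch compares the lazy group with a greedy one on a hypothetical input.

```javascript
const item = "Item 30.45";

// Lazy group: stops as early as possible, leaving all of 30.45 to \d+\.\d+.
const lazy = item.match(/^([\s\S]*?)\s*\d+\.\d+/)[1];
// Greedy group: grabs as much as possible and gives back only what \d+\.\d+
// strictly needs, so \d+ ends up with just "0" and the "3" stays in group 1.
const greedy = item.match(/^([\s\S]*)\s*\d+\.\d+/)[1];

console.log(JSON.stringify(lazy));   // "Item"
console.log(JSON.stringify(greedy)); // "Item 3"
```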
If you plan to match any floats, like -.05, you will need to replace \d+\.\d+ with [+-]?\d*\.?\d+.
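A JavaScript sketch of that general float pattern plugged into the same capture. The input "Washer -.05" is hypothetical. Because both the digits before the dot and the dot itself are optional, the pattern also accepts plain integers, which matters on the question's line.

```javascript
// General float pattern from the answer, plugged into the same capture.
const generalRe = /^([\s\S]*?)\s*[+-]?\d*\.?\d+/;

console.log(generalRe.exec("Washer -.05")[1]); // Washer

// Because the dot is optional, plain integers match too: on the question's
// line the capture now stops before the "1" of 1/2.
console.log(generalRe.exec('DuraFlexHose Water 1/2" hex 300mm 30.00')[1]); // DuraFlexHose Water
```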
Here is how it can be used:
const str = 'DuraFlexHose Water 1/2" hex 300mm 300.00';
const res = str.match(/^([\s\S]*?)\s*\d+\.\d+/);
if (res !== null) {
    console.log(res[1]); // DuraFlexHose Water 1/2" hex 300mm
}

egrep matching patterns containing 3 repeated digits, not consecutive

I want to use egrep to match lines containing the same digit 3 times, not necessarily consecutively, i.e. "3 33", "55 5", "666" or "a6b6c6d". I have an initial idea.
I tried:
egrep '1[^1]*1[^1]*1' test
This will recognize strings like 1abd1df31.
However, I'd rather not enumerate every digit from 0 to 9. How can I generalize this using a back reference?
Thanks in advance!
NOTE: the three digits must be identical, i.e. 3aa2aa1aa should not match.

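This will do it (Perl-compatible syntax; POSIX egrep supports neither lookahead nor lazy quantifiers, so use Perl or GNU grep -P):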
/(?=.*?(\d))(?:(?:.*?\1){3})/
EXPLANATION:
(?=.*?(\d))(?:(?:.*?\1){3})
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*?(\d))»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below and capture its match into backreference number 1 «(\d)»
Match a single digit 0..9 «\d»
Match the regular expression below «(?:(?:.*?\1){3})»
Match the regular expression below «(?:.*?\1){3}»
Exactly 3 times «{3}»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the same text as most recently matched by capturing group number 1 «\1»
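Since this is Perl-style syntax, a quick way to sanity-check it is JavaScript, whose engine supports the same lookahead, lazy quantifier and backreference constructs. Because the regex is unanchored, the engine retries at each start position, so the lookahead eventually captures each digit in turn. The sample strings come from the question.

```javascript
// The answer's pattern: capture a digit in the lookahead, then require that
// same digit (\1) three times, allowing anything in between.
const threeSame = /(?=.*?(\d))(?:(?:.*?\1){3})/;

const samples = ["3 33", "55 5", "666", "a6b6c6d", "3 4 3 baz 3", "3aa2aa1aa"];
for (const s of samples) {
  console.log(JSON.stringify(s), threeSame.test(s));
}
// Only "3aa2aa1aa" prints false.
```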

This works for simple cases:
egrep '^[^0-9]*([0-9])[^0-9]*\1[^0-9]*\1[^0-9]*$'
Explanation:
[^0-9]* zero or more non-digits
([0-9]) one digit captured with parens
\1 back-reference to the captured digit
[^0-9]* (the later copies) zero or more non-digits between and after the repeated digit
^ and $ beginning and end of line
Caveat programmor:
It matches 3 foo 3 bar 3 but fails for 3 4 3 baz 3. In other words, no other digits are allowed in the line besides the three repeated ones.
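The caveat is easy to reproduce. This sketch runs the same pattern in JavaScript, where [0-9], the group and \1 behave as in egrep for these inputs.

```javascript
// The egrep pattern from this answer, tested in JavaScript.
const onlyThree = /^[^0-9]*([0-9])[^0-9]*\1[^0-9]*\1[^0-9]*$/;

console.log(onlyThree.test("3 foo 3 bar 3")); // true
console.log(onlyThree.test("3 4 3 baz 3"));   // false: the 4 is not allowed
```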
Try this Perl one-liner to match the tricky cases with multiple digit types.
perl -ne '$i=$_;%a=();$a{$_}++for(split//,$i);for(0..9){if($a{$_}>=3){print $i;last}}'
For each line $i, it builds a hash %a keyed by each character of the line, storing occurrence counts. Then I check the digits 0 to 9 for an occurrence count of 3 or more; if one is found, line $i is printed.
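The same counting idea, sketched in JavaScript with a Map instead of a Perl hash. The function name is hypothetical, and treating a count of 3 or more as a match (so a line like 6666 qualifies) is an assumption about what "3 repeated digits" should mean.

```javascript
// Hypothetical helper: true if any single digit occurs at least 3 times in the line.
function hasDigitThreeTimes(line) {
  const counts = new Map();
  for (const ch of line) {
    if (ch >= "0" && ch <= "9") {
      counts.set(ch, (counts.get(ch) || 0) + 1);
    }
  }
  for (const n of counts.values()) {
    if (n >= 3) return true; // assumption: "at least 3", not "exactly 3"
  }
  return false;
}

console.log(hasDigitThreeTimes("3 4 3 baz 3")); // true
console.log(hasDigitThreeTimes("3aa2aa1aa"));   // false
```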

Why is my regex not matching blocks of numbers?

Pretty basic question, so I'll keep it short and sweet.
My current regex is \d* ((\d){1,6} works, but is messy). I want to grab all groups of numbers, e.g. 12345, 857.
How do I do it?

\d* matches any number of digits, including 0. Your string starts with 0 digits. Hey, a match!
Use \d+.
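A JavaScript sketch of the difference on a hypothetical input: with the g flag, \d* also returns the empty matches it finds between digit groups, while \d+ returns only the groups.

```javascript
const text = "abc 12345 def 857";

// \d* also matches the empty string, so a global search yields many "" entries.
const withStar = text.match(/\d*/g);
// \d+ needs at least one digit, so only the real digit groups come back.
const withPlus = text.match(/\d+/g);

console.log(withStar.filter((s) => s !== "").join(", ")); // 12345, 857
console.log(withStar.includes(""));                      // true
console.log(withPlus.join(", "));                        // 12345, 857
```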

You want either \d+ or \d{1,} to match/capture your groups of digits.
Regular expression quantifiers are as follows:
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
As for grabbing the last group of digits in the following strings:
google.com/185/586
google.com/389/754
Use a lookbehind assertion: (?<=\d\/)(\d+). This captures 586 and 754.
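A JavaScript check of that lookbehind, assuming an engine with ES2018 lookbehind support (Perl also accepts it, since \d\/ has a fixed width).

```javascript
const urls = ["google.com/185/586", "google.com/389/754"];

// (?<=\d\/) is a lookbehind: it requires "digit + slash" right before the
// match without consuming it, which only the last group satisfies here.
const lastGroups = urls.map((u) => u.match(/(?<=\d\/)(\d+)/)[1]);

console.log(lastGroups.join(" ")); // 586 754
```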

What does \d+ mean in a regular expression?

What does \d+ mean in a regular expression?

\d is a digit (a character in the range [0-9]), and + means one or more times. Thus, \d+ means match one or more digits.
For example, the string "42" is matched by the pattern \d+.
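A few quick JavaScript checks of \d+, using the example values from these answers plus a couple of hypothetical non-matching inputs.

```javascript
const oneOrMoreDigits = /\d+/;

for (const s of ["42", "12", "1", "abc", ""]) {
  console.log(JSON.stringify(s), oneOrMoreDigits.test(s));
}
// "42", "12" and "1" print true; "abc" and "" print false.

// Unanchored, \d+ finds the first run of digits inside a longer string.
console.log("order 42 shipped".match(oneOrMoreDigits)[0]); // 42
```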
You can also find explanations for pieces of regular expressions like this using a tool like Regex101 (online, free) or Regex Coach (downloadable for Windows, free) that will let you enter a regular expression and sample text, then indicate what (if anything) matches the regex. They also try to explain, in words, what the regular expression does.

\d is called a character class and matches digits. For ASCII input it is equivalent to [0-9] (in Perl, \d also matches other Unicode digits unless the /a modifier is used).
+ matches 1 or more occurrences of the preceding item.
So \d+ means match 1 or more digits.

\d means 'digit'. + means '1 or more times'. So \d+ means one or more digits. It will match 12 and 1.

\d is a digit and + is 1 or more, so \d+ matches a sequence of 1 or more digits.
