Select file by filename pattern - regex

In my directory I have files that follow one of two filename patterns:
1. Pattern
2018_09_01_Filename.java
or
2. Pattern
kit-deploy-190718_Filename.java
Now I'm trying to select every file in the directory that matches the first pattern (the date can differ, but it's always year_month_day). But I'm not getting any further.
I was thinking that I could take the basename of the file and extract the first ten characters, then check whether they match a pattern.
my $filename = basename($file);
my $chars = substr($filename, 0 , 10);
if ($chars =~ m/2000_01_01/) {
print "match\n";
}
else {
print "Don't match";
}

You just need a regex that matches your needs, for example:
#!/usr/bin/perl
my $filename = "2018_09_01_Filename.java";
if ($filename =~ m/\D?20\d{2}_\d{2}_\d{2}\D?/) {
print "match\n";
}
Explanation
\D? matches any character that's not a digit
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
20 matches the characters 20 literally (case sensitive)
\d{2} matches a digit
{2} Quantifier — Matches exactly 2 times
_ matches the character _ literally
\d{2} matches a digit
{2} Quantifier — Matches exactly 2 times
_ matches the character _ literally
\d{2} matches a digit
{2} Quantifier — Matches exactly 2 times
\D? matches any character that's not a digit
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
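As a rough sketch of the same idea applied to a whole list of names, here is a Python version (Python rather than Perl is a choice made for this illustration). It assumes the date prefix always sits at the very start of the basename, so the pattern is anchored with ^ instead of using the optional \D? from the answer. The names DATE_PREFIX and select_dated are hypothetical.

```python
import re

# Anchored at the start: four digits, two digits, two digits, each followed by "_".
# This is a stricter variant of the answer's pattern, not the answer's exact regex.
DATE_PREFIX = re.compile(r"^\d{4}_\d{2}_\d{2}_")


def select_dated(filenames):
    """Return only the names that begin with a year_month_day_ prefix."""
    return [name for name in filenames if DATE_PREFIX.match(name)]


names = ["2018_09_01_Filename.java", "kit-deploy-190718_Filename.java"]
print(select_dated(names))  # ['2018_09_01_Filename.java']
```

Anchoring matters here because an unanchored pattern would also accept a name that merely contains a date somewhere in the middle.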

Related

RegEx targeted replace with Named Captures

Given
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
I can use
$line -match '^\{(?<type>[a-z]+)(-\[(?<target>(C|F|CF))\])?(\[(?<tab>\d+)\])?\}_(?<string>.*)'
And $matches['tab'] will correctly have a value of 3. However, if I then want to increment that value without also affecting the [3] in the string section, things get more complicated. I can use $tabIndex = $line.indexOf("[$tab]") to get the index of the first occurrence, and I can also use $newLine = ([regex]"\[$tab\]").Replace($line, '[4]', 1) to replace only the first occurrence. But I wonder, is there a way to get at this more directly? It's not strictly necessary, as I will only ever want to replace things within the initial {}_, which has a very consistent form, so replacing the first instance works. I'm just wondering if I am missing out on a more elegant solution, which might also be needed in a different situation.
I would change the regex a bit, because mixing Named captures with Numbered captures is not recommended, so it becomes this:
'^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)'
You could then use it like below to replace the tab value:
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$newTabValue = 12345
$line -replace '^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)', "{`${type}-[`${target}][$newTabValue]}_`${string}"
The result of this will be:
{initError-[cf][12345]}_Invalid nodes(s): [3]
Regex details:
^ Assert position at the beginning of the string
\{ Match the character “{” literally
(?<type> Match the regular expression below and capture its match into backreference with name “type”
[a-z] Match a single character in the range between “a” and “z”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?: Match the regular expression below
- Match the character “-” literally
\[ Match the character “[” literally
(?<target> Match the regular expression below and capture its match into backreference with name “target”
[CF] Match a single character present in the list “CF”
{1,2} Between one and 2 times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
(?: Match the regular expression below
\[ Match the character “[” literally
(?<tab> Match the regular expression below and capture its match into backreference with name “tab”
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
\} Match the character “}” literally
_ Match the character “_” literally
(?<string> Match the regular expression below and capture its match into backreference with name “string”
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
An alternative way of increasing the first number in the brackets is using the -Split operator to access the number you want to change:
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$NewLine = $line -split "(\d+)"
$NewLine[1] = [int]$newLine[1] + 1
-join $NewLine
Output:
{initError-[cf][4]}_Invalid nodes(s): [3]
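Another way to target only the text captured by a named group is to use the group's offsets within the match and splice the new value in at that position. The sketch below shows this in Python rather than PowerShell, so it illustrates the technique and is not a drop-in replacement. The function name bump_tab is hypothetical, and re.IGNORECASE stands in for the case-insensitivity of PowerShell's -match.

```python
import re

# Same pattern as above, written with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(
    r"^\{(?P<type>[a-z]+)(?:-\[(?P<target>[CF]{1,2})\])?"
    r"(?:\[(?P<tab>\d+)\])?\}_(?P<string>.*)",
    re.IGNORECASE,
)


def bump_tab(line, delta=1):
    """Increment only the number captured by the 'tab' group."""
    m = PATTERN.match(line)
    if m is None or m.group("tab") is None:
        return line  # nothing to change
    start, end = m.span("tab")  # offsets of the captured digits
    new_value = str(int(m.group("tab")) + delta)
    return line[:start] + new_value + line[end:]


print(bump_tab("{initError-[cf][3]}_Invalid nodes(s): [3]"))
# {initError-[cf][4]}_Invalid nodes(s): [3]
```

Because the splice uses the group's own start and end positions, the [3] in the trailing message is never touched, no matter how many times the same number appears.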

Check filename with certain start string and end string?

I need to use Perl to check whether a file name matches a given format.
example: test1_ab_pls_20170418.csv
1. test1_ab_pls_ is fixed, and the file name will start with it.
2. 20170418 is date, those will be numbers
3. .csv is ending string
I've tried regular expression like
$oldfile=~ m/^(test1_ab_pls_)\d(.csv)$/
but it failed. How can I modify it?
You need to add a quantifier to the \d; {8} matches exactly 8 digits in a row. The . should also be escaped, otherwise it matches any character rather than a literal dot.
$oldfile=~ m/^(test1_ab_pls_)\d{8}(\.csv)$/
See perlre for more details on regex.
This should do it
\w{4}\d\_\w{2}\_\w{3}\_\d{8}\.(?:csv|CSV)
Demo
https://regex101.com/r/JVKZYP/3
\w{4} matches any word character (equal to [a-zA-Z0-9_]) {4} Matches exactly 4 times
\d matches a digit (equal to [0-9])
\_ matches the character _ literally (case sensitive)
\d{8} matches a digit (equal to [0-9]) {8} Matches exactly 8 times
\.(?:csv|CSV) matches a literal . followed by either csv or CSV (a character class such as [.csv|.CSV] would match only a single character from that list, not the whole extension)
Or fix yours: ^test1_ab_pls_\d{8}\.csv$
Or another match https://regex101.com/r/cAKUQN/1
\w{4}\d\_\w{2}\_\w{3}\_(20\d{2})(\d{2})(\d{2})\.(?:csv|CSV)
For an exact date: (2017)(04)(18)
It's not pretty, but this is another way.
if ( index( $oldfile, 'test1_ab_pls_' ) == 0
&& rindex( $oldfile, '.csv' ) == length($oldfile) - 4 )
{ print "It matches!" }
I benchmarked it, and it's faster than Fashim's regex for positive matches, but slower for negative matches.
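To make the effect of the quantifier and the escaped dot concrete, here is a small Python sketch of the same check (Python is used only for illustration; the question itself is about Perl). The names NAME_RE and is_valid_name are hypothetical, and fullmatch plays the role of the ^ and $ anchors.

```python
import re

# Fixed prefix, exactly eight digits, then a literal ".csv".
NAME_RE = re.compile(r"test1_ab_pls_\d{8}\.csv")


def is_valid_name(filename):
    """True only if the whole name has the form test1_ab_pls_<8 digits>.csv."""
    return NAME_RE.fullmatch(filename) is not None


print(is_valid_name("test1_ab_pls_20170418.csv"))  # True
print(is_valid_name("test1_ab_pls_2017041.csv"))   # False: only seven digits
print(is_valid_name("test1_ab_pls_20170418xcsv"))  # False: the dot is literal
```

Note that this only checks for eight digits. It does not verify that they form a real calendar date.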

A regex to find number only but not followed by test in a String

I have a string as "Test me for find this [test$12345] and dont find [test$SS7890]this". I have to find only 12345 and not 7890.
Description
This regex will:
match the digits between [test$ and ]
The regex:
(?<=\[test\$)([0-9]+(?=]))
Live Example
https://regex101.com/r/bY1kH8/3
Explained
(?<=\[test\$) Positive Lookbehind - Assert that the regex below can be matched
\[ matches the character [ literally
test matches the characters test literally (case sensitive)
\$ matches the character $ literally
1st Capturing group ([0-9]+(?=]))
[0-9]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
(?=]) Positive Lookahead - Assert that the regex below can be matched
] matches the character ] literally
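As a quick check of the lookbehind and lookahead, the following Python sketch runs the pattern against the sample string from the question. The function name find_test_numbers is hypothetical, and the outer capturing group is dropped because the lookarounds already leave only the digits in the match.

```python
import re

# Digits that come directly after "[test$" and directly before "]".
BRACKET_DIGITS = re.compile(r"(?<=\[test\$)[0-9]+(?=\])")


def find_test_numbers(text):
    """Return every run of digits enclosed as [test$<digits>]."""
    return BRACKET_DIGITS.findall(text)


sample = "Test me for find this [test$12345] and dont find [test$SS7890]this"
print(find_test_numbers(sample))  # ['12345']
```

The second bracket is skipped because the digits 7890 are preceded by SS rather than by [test$ directly.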

Regex (Do not include digit.digit)

This is a sample text I'm running my regex on:
DuraFlexHose Water 1/2" hex 300mm 30.00
I want to include everything and stop at the 30.00
So what I have in mind is something like [^\d*\.\d*]* but that's not working. What is the regex that would help me achieve this?
/.*(?=\d{2}\.\d{2})/
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?=\d{2}\.\d{2}) Positive Lookahead - Assert that the regex below can be matched
\d{2} match a digit [0-9]
Quantifier: {2} Exactly 2 times
\. matches the character . literally
\d{2} match a digit [0-9]
Quantifier: {2} Exactly 2 times
If you cannot use any CSV parser and are only limited to regex, I'd suggest 2 regexps.
This one can be used to grab every character from the beginning up to the first pattern of optional spaces + digits(s) + . + digit(s):
^([\s\S]*?)\s*\d+\.\d+
In case the float value is at the end of string, use a $ anchor (the end of string):
^([\s\S]*?)\s*\d+\.\d+$
Note that [\s\S] matches any symbol, even a newline.
Regex breakdown:
^ - Start of string
([\s\S]*?) - (Capture group 1) any symbols, 0 or more, as few as possible (otherwise, the 3 from 30.45 would be captured)
\s* - 0 or more whitespace, as many as possible (so as to trim Group 1)
\d+\.\d+ - 1 or more digits followed by a period followed by 1 or more digits
$ - end of string.
If you plan to match any floats, like -.05, you will need to replace \d+\.\d+ with [+-]?\d*\.?\d+.
Here is how it can be used:
var str = 'DuraFlexHose Water 1/2" hex 300mm 300.00';
var res = str.match(/^([\s\S]*?)\s*\d+\.\d+/);
if (res !== null) {
document.write(res[1]);
}
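For comparison, the same lazy-capture approach can be sketched in Python. The function name text_before_price is hypothetical, and the input is the sample line from the question.

```python
import re

# Lazily capture everything up to the first "<digits>.<digits>",
# letting \s* absorb the whitespace in front of the number.
LEADING_TEXT = re.compile(r"^([\s\S]*?)\s*\d+\.\d+")


def text_before_price(line):
    """Return the text that precedes the first decimal number, or None."""
    m = LEADING_TEXT.match(line)
    return m.group(1) if m else None


print(text_before_price('DuraFlexHose Water 1/2" hex 300mm 30.00'))
# DuraFlexHose Water 1/2" hex 300mm
```

The 300 in 300mm is not mistaken for the price because it is not followed by a period and more digits.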

Regex to match URL end-of-line or "/" character

I have a URL, and I'm trying to match it to a regular expression to pull out some groups. The problem I'm having is that the URL can either end or continue with a "/" and more URL text. I'd like to match URLs like this:
http://server/xyz/2008-10-08-4
http://server/xyz/2008-10-08-4/
http://server/xyz/2008-10-08-4/123/more
But not match something like this:
http://server/xyz/2008-10-08-4-1
So, I thought my best bet was something like this:
/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)[/$]
where the character class at the end contained either the "/" or the end-of-line. The character class doesn't seem to be happy with the "$" in there though. How can I best discriminate between these URLs while still pulling back the correct groups?
To match either / or end of content, use (/|\z)
This only applies if you are not using multi-line matching (i.e. you're matching a single URL, not a newline-delimited list of URLs).
To put that with an updated version of what you had:
/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\z)
Note that I've changed the start to be a non-greedy match for non-whitespace ( \S+? ) rather than matching anything and everything ( .+ )
You've got a couple regexes now which will do what you want, so that's adequately covered.
What hasn't been mentioned is why your attempt won't work: inside a character class, $ (as well as . and /, and ^ unless it is the first character) has no special meaning, so [/$] matches either a literal / or a literal $ rather than terminating the regex (/) or matching end-of-line ($).
/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$
1st Capturing Group (.+)
.+ matches any character (except for line terminators)
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (\d{4}-\d{2}-\d{2})
\d{4} matches a digit (equal to [0-9])
{4} Quantifier — Matches exactly 4 times
- matches the character - literally (case sensitive)
\d{2} matches a digit (equal to [0-9])
{2} Quantifier — Matches exactly 2 times
- matches the character - literally (case sensitive)
\d{2} matches a digit (equal to [0-9])
{2} Quantifier — Matches exactly 2 times
- matches the character - literally (case sensitive)
3rd Capturing Group (\d+)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
4th Capturing Group (/.*)?
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
/ matches the character / literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string
In Ruby and Bash, you can use $ inside parentheses.
/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|$)
(This solution is similar to Pete Boughton's, but preserves the usage of $, which means end of line, rather than using \z, which means end of string.)
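The alternation approach can be tried against the four URLs from the question with a short Python sketch. Two things here are assumptions rather than part of the answers above: the leading literal / is replaced by a ^ anchor so that the first group captures the whole prefix, and the final group is made non-capturing. Python's re module spells end-of-string as \Z rather than \z, so $ is used instead. The names URL_RE and parse_url are hypothetical.

```python
import re

# prefix / date - number, followed by either "/" or the end of the input.
URL_RE = re.compile(r"^(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(?:/|$)")


def parse_url(url):
    """Return (prefix, date, number) or None if the URL does not match."""
    m = URL_RE.match(url)
    return m.groups() if m else None


for url in (
    "http://server/xyz/2008-10-08-4",
    "http://server/xyz/2008-10-08-4/",
    "http://server/xyz/2008-10-08-4/123/more",
    "http://server/xyz/2008-10-08-4-1",
):
    print(url, "->", parse_url(url))
```

The first three URLs yield the groups http://server/xyz, 2008-10-08 and 4, while the last one yields None because the 4 is followed by - rather than / or the end of the input.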