Check filename with certain start string and end string? - regex

I need to use Perl to check whether a file name matches a format.
Example: test1_ab_pls_20170418.csv
1. test1_ab_pls_ is fixed, and the file name always starts with it.
2. 20170418 is a date, so it is always digits.
3. .csv is the ending string.
I've tried a regular expression like
$oldfile=~ m/^(test1_ab_pls_)\d(.csv)$/
but it failed. How can I modify it?

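You need to add a quantifier to the \d; {8} matches exactly 8 digits in a row. Without it, \d matches a single digit only. You should also escape the dot, since an unescaped . matches any character.
$oldfile =~ m/^(test1_ab_pls_)\d{8}(\.csv)$/
See perlre for more details on regular expressions.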
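For comparison, here is a minimal JavaScript sketch of the same check with the dot escaped. The function name isValidExportName is hypothetical and not part of the question; the pattern is the corrected one from this answer.

```javascript
// Hypothetical helper: true when the name is test1_ab_pls_ + 8 digits + .csv
function isValidExportName(name) {
  // ^ and $ anchor the whole string; \. matches a literal dot
  return /^test1_ab_pls_\d{8}\.csv$/.test(name);
}

console.log(isValidExportName("test1_ab_pls_20170418.csv")); // true
console.log(isValidExportName("test1_ab_pls_2017.csv"));     // false (only 4 digits)
console.log(isValidExportName("test1_ab_pls_20170418xcsv")); // false (no literal dot)
```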

This should do it
^\w{4}\d_\w{2}_\w{3}_\d{8}\.(?:csv|CSV)$
Demo
https://regex101.com/r/JVKZYP/3
\w{4} matches any word character (equal to [a-zA-Z0-9_]) exactly 4 times
\d matches a digit (equal to [0-9])
_ matches the character _ literally (no escaping needed)
\d{8} matches a digit (equal to [0-9]) exactly 8 times
\.(?:csv|CSV) matches a literal dot followed by either csv or CSV. Note that a character class such as [.csv|.CSV] would not work here: it matches one character from the set . c s v | C S V, not the whole extension.
Or fix yours: ^test1_ab_pls_\d{8}\.csv$ (a character class like [test1_ab_pls_]+ would accept those characters in any order, so the prefix is written literally)
Or another match https://regex101.com/r/cAKUQN/1
^\w{4}\d_\w{2}_\w{3}_(20\d{2})(\d{2})(\d{2})\.(?:csv|CSV)$
For an exact date, use the literal groups (2017)(04)(18).
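The capture groups for year, month and day can also be used to reject impossible dates, which a digit-only pattern cannot do. The following is a sketch in JavaScript, assuming the fixed test1_ab_pls_ prefix from the question; parseDatedName is a hypothetical name.

```javascript
// Hypothetical helper: returns {year, month, day} for a valid name, else null
function parseDatedName(name) {
  // The i flag accepts both .csv and .CSV
  const m = /^test1_ab_pls_(\d{4})(\d{2})(\d{2})\.csv$/i.exec(name);
  if (m === null) return null;
  const [year, month, day] = m.slice(1).map(Number);
  // Date.UTC silently rolls over out-of-range values (Feb 30 becomes Mar 2),
  // so the date is valid only if it survives the round trip unchanged
  const d = new Date(Date.UTC(year, month - 1, day));
  const valid =
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day;
  return valid ? { year, month, day } : null;
}

console.log(parseDatedName("test1_ab_pls_20170418.csv")); // { year: 2017, month: 4, day: 18 }
console.log(parseDatedName("test1_ab_pls_20170230.csv")); // null
```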

It's not pretty, but this is another way. Note that it checks only the prefix and the suffix, not the digits in between.
if ( index( $oldfile, 'test1_ab_pls_' ) == 0
&& rindex( $oldfile, '.csv' ) == length($oldfile) - 4 )
{ print "It matches!" }
I benchmarked it, and it's faster than Fashim's regex for positive matches, but slower for negative matches.
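The same prefix-and-suffix idea can be sketched in JavaScript with the built-in startsWith and endsWith string methods. The helper name hasPrefixAndSuffix is hypothetical, and like the index/rindex version it does not validate what lies between the two ends.

```javascript
// Hypothetical helper: checks the two fixed ends of the name without a regex
function hasPrefixAndSuffix(name, prefix, suffix) {
  // The length guard stops the prefix and suffix from overlapping in short names
  return (
    name.length >= prefix.length + suffix.length &&
    name.startsWith(prefix) &&
    name.endsWith(suffix)
  );
}

console.log(hasPrefixAndSuffix("test1_ab_pls_20170418.csv", "test1_ab_pls_", ".csv")); // true
console.log(hasPrefixAndSuffix("test1_ab_pls_20170418.txt", "test1_ab_pls_", ".csv")); // false
```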

Related

Select file by filename pattern

In my directory I have the following filename patterns:
1. Pattern
2018_09_01_Filename.java
or
2. Pattern
kit-deploy-190718_Filename.java
Now I'm trying to select every file in the directory that matches the first pattern (the date can differ, but it's always year_month_day), but I'm stuck.
I was thinking I could take the basename of the file and extract its first ten characters, then check whether they match a pattern.
my $filename = basename($file);
my $chars = substr($filename, 0 , 10);
if ($chars =~ m/2000_01_01/) {
print "match\n";
}
else {
print "Don't match";
}
You just need a regex that matches your needs, for example:
#!/usr/bin/perl
my $filename = "2018_09_01_Filename.java";
if ($filename =~ m/\D?20\d{2}_\d{2}_\d{2}\D?/) {
print "match\n";
}
Explanation
\D? matches any character that's not a digit
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
20 matches the characters 20 literally (case sensitive)
\d{2} matches a digit
{2} Quantifier — Matches exactly 2 times
_ matches the character _ literally
\d{2} matches a digit
{2} Quantifier — Matches exactly 2 times
_ matches the character _ literally
\d{2} matches a digit
{2} Quantifier — Matches exactly 2 times
\D? matches any character that's not a digit
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
Check the demo
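To select only the files of the first pattern from a list, the date prefix can be anchored to the start of the name. This is a JavaScript sketch rather than the Perl from the answer; selectDated and the sample array are assumptions for illustration, and the pattern accepts any four-digit year, not just 20xx.

```javascript
// Anchored at the start: four digits, underscore, two digits, underscore, two digits, underscore
const datePrefix = /^\d{4}_\d{2}_\d{2}_/;

// Hypothetical helper: keeps only names that begin with year_month_day_
function selectDated(names) {
  return names.filter((name) => datePrefix.test(name));
}

const sample = ["2018_09_01_Filename.java", "kit-deploy-190718_Filename.java"];
console.log(selectDated(sample)); // [ '2018_09_01_Filename.java' ]
```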

How to exclude comma (,) in regex?

I have a scenario where I only want to accept digits (0-9) and a dot (.). For that I used this regex:
[0-9.]$
This regex accepts 0-9 and . (dot for decimal). But when I write something like this
1,1
It also accepts comma (,). How can I avoid this?
Since you are looking for a way to parse numbers (you said the dot is for decimals), you probably don't want the string to start or end with a dot, and it should accept at most one dot. If that is the case, try:
^(\d+\.?\d+|\d)$
where:
\d+ stands for any digit (one or more)
\.? stands for zero or one of literal dot
\d stands for any digit (just one)
Or maybe you'd like to accept strings starting with a dot, which are commonly read as having 0 as the integer part; in that case you can use ^\d*\.?\d+$.
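This regex [0-9.]$ consists of a character class that matches a digit or a dot at the end of a line $. Nothing constrains the characters before it, which is why 1,1 is accepted: only the final 1 is checked.
If you only want to match a single digit or a dot you could add ^ to assert the position at the start of a line: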
^[0-9.]$
If you want to match one or more digits, a dot and one or more digits you could use:
^[0-9]+\.[0-9]+$
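The difference between checking the last character and checking the whole string can be seen in a short JavaScript sketch. The names lastCharOnly, wholeNumber and isDecimal are hypothetical; the second pattern is the ^\d*\.?\d+$ variant mentioned in this thread.

```javascript
// Checks only the final character, so anything may precede it
const lastCharOnly = /[0-9.]$/;
// Checks the whole string: optional integer part, optional dot, at least one digit
const wholeNumber = /^\d*\.?\d+$/;

// Hypothetical helper wrapping the anchored pattern
function isDecimal(input) {
  return wholeNumber.test(input);
}

console.log(lastCharOnly.test("1,1")); // true, the comma slips through
console.log(isDecimal("1,1"));         // false
console.log(isDecimal("1.1"));         // true
console.log(isDecimal(".5"));          // true
```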
This regex may help you:
/^[0-9.]+$/
It accepts only the digits 0 to 9 and the dot (.). The ^ and $ anchors are needed so that a string like 1,1 is rejected as a whole.
Explanation:
Match a single character present in the list below [0-9.]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
0-9 a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
. matches the character . literally (case sensitive)

Removing all text except a specific string in Regex

I have tried to find the solution of this problem, but I still can't get the correct answer. Therefore, I decided to ask you all here for help.
I have some text :
CommentTimestamps:true,showVODCommentTimestamps:false,enableVODStreamingComments:false,enablePinLiveComments:false,enableFacecastAnimatedComments:false,permalink:"1",isViewerTheOwner:false,isLiveAudio:false,mentionsinput:{inputComponent:{__m:"LegacyMentionsInput.react"}},monitorHeight:false,viewoptionstypeobjects:null,viewoptionstypeobjectsorder:null,addcommentautoflip:true,autoplayLiveVODComments:true,disableCSSHiding:true,feedbackMode:"none",instanceid:"u_0_w",lazyFetch:true,numLazyComments:2,pagesize:50,postViewCount:"78,762",shortenTimestamp:true,showaddcomment:true,showshares:true,totalPosts:1,viewCount:"78,762",viewCountReduced:"78K"},{comments:[],pinnedcomments:[],profiles:{},actions:[],commentlists:{comments:{"1":{filtered:{range:{offset:32,length:0},values:[],count:32,clienthasall:false}}},replies:null},featuredcommentlists:{comments:null,replies:null},featuredcommentids:null,servertime:1492916773,feedb.........`
What I want to get is only : postViewCount:"78,762"
I have tried using [^(postViewCount\b.......)] but it is not what I want to get.
This should do it
(postViewCount:\"\d{2}\,\d{3}\")
https://regex101.com/r/9JENH0/1
postViewCount: matches the characters postViewCount: literally (case sensitive)
\" matches the character " literally (case sensitive)
\d{2} matches a digit (equal to [0-9]) {2} Quantifier — Matches exactly 2 times
\, matches the character , literally (case sensitive)
If the count can have a different number of digits (for example one million or more), use (postViewCount:"(?:.*?)") instead.
Regex: postViewCount:"[^"]+"
1. postViewCount:" matches the literal text postViewCount:"
2. [^"]+ matches everything up to the next ", and the final " matches the closing quote
Try matching
.*(postViewCount:"[0-9,]*").*
and replace it with the captured group, that is \1
Regex demo
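Instead of replacing everything around the target, the pair can be pulled out directly with a capture group. This is a JavaScript sketch using the postViewCount:"[^"]+" pattern from this thread; extractPostViewCount is a hypothetical name, and the input is a shortened piece of the text from the question.

```javascript
// Hypothetical helper: returns the matched pair and its numeric value, or null
function extractPostViewCount(text) {
  const m = /postViewCount:"([^"]+)"/.exec(text);
  if (m === null) return null;
  return {
    pair: m[0],                              // e.g. postViewCount:"78,762"
    count: Number(m[1].replace(/,/g, "")),   // thousands separators removed
  };
}

const snippet = 'pagesize:50,postViewCount:"78,762",shortenTimestamp:true';
console.log(extractPostViewCount(snippet));
// { pair: 'postViewCount:"78,762"', count: 78762 }
```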

Regex (Do not include digit.digit)

This is a sample text I'm running my regex on:
DuraFlexHose Water 1/2" hex 300mm 30.00
I want to include everything and stop at the 30.00.
So what I have in mind is something like [^\d*\.\d*]* but that's not working. What is the pattern that would help me achieve this?
/.*(?=\d{2}\.\d{2})/
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?=\d{2}\.\d{2}) Positive Lookahead - Assert that the regex below can be matched
\d{2} match a digit [0-9]
Quantifier: {2} Exactly 2 times
\. matches the character . literally
\d{2} match a digit [0-9]
Quantifier: {2} Exactly 2 times
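A short JavaScript sketch of this lookahead shows what it returns, including one caveat: because .* is greedy and the lookahead requires exactly two digits before the dot, a three-digit price leaves its first digit in the match. The helper name beforePrice is hypothetical.

```javascript
// Hypothetical helper: text matched by the greedy .* before a dd.dd price
function beforePrice(line) {
  const m = line.match(/.*(?=\d{2}\.\d{2})/);
  return m === null ? null : m[0];
}

// JSON.stringify makes the trailing space visible
console.log(JSON.stringify(beforePrice('DuraFlexHose Water 1/2" hex 300mm 30.00')));
// "DuraFlexHose Water 1/2\" hex 300mm "
console.log(JSON.stringify(beforePrice('DuraFlexHose Water 1/2" hex 300mm 300.00')));
// "DuraFlexHose Water 1/2\" hex 300mm 3"
```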
If you cannot use any CSV parser and are only limited to regex, I'd suggest 2 regexps.
This one can be used to grab every character from the beginning up to the first pattern of optional spaces + digit(s) + . + digit(s):
^([\s\S]*?)\s*\d+\.\d+
In case the float value is at the end of string, use a $ anchor (the end of string):
^([\s\S]*?)\s*\d+\.\d+$
Note that [\s\S] matches any symbol, even a newline.
Regex breakdown:
^ - Start of string
([\s\S]*?) - (Capture group 1) any symbols, 0 or more, as few as possible (otherwise, the 3 from 30.45 would be captured)
\s* - 0 or more whitespace, as many as possible (so as to trim Group 1)
\d+\.\d+ - 1 or more digits followed by a period followed by 1 or more digits
$ - end of string.
If you plan to match any floats, like -.05, you will need to replace \d+\.\d+ with [+-]?\d*\.?\d+.
Here is how it can be used:
var str = 'DuraFlexHose Water 1/2" hex 300mm 300.00';
// Group 1 holds everything before the optional whitespace and the first float
var res = str.match(/^([\s\S]*?)\s*\d+\.\d+/);
if (res !== null) {
  document.write(res[1]);
}
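The signed-float variant mentioned above can be sketched the same way. This version keeps the $ anchor so that the optional dot does not let the pattern stop at an earlier integer such as the 1 in 1/2; textBeforeNumber is a hypothetical name.

```javascript
// Hypothetical helper: text before a trailing number such as 30.00, -.05 or +7
function textBeforeNumber(str) {
  // [+-]?\d*\.?\d+ accepts an optional sign, optional integer part and optional dot
  const m = /^([\s\S]*?)\s*[+-]?\d*\.?\d+$/.exec(str);
  return m === null ? null : m[1];
}

console.log(textBeforeNumber('DuraFlexHose Water 1/2" hex 300mm 30.00'));
// DuraFlexHose Water 1/2" hex 300mm
console.log(textBeforeNumber("Discount -.05")); // Discount
```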

regex: extract text blocks, defined beginning, undefined end

I have text like this:
Date: 01.02.2015 //<-stable format
something
something more
some random more
Date: 02.02.2015
something random
i dont know
So I have many such blocks. Each one starts with Date... and ends where the next Date... starts.
The text in the lines of a block could be anything, but not in the Date... format.
I need an array at the end, with such blocks:
array[0] = "Date: 01.02.2015
something
something more
some random more"
array[1] = "Date: 02.02.2015
something random
i dont know"
For now I add a unique splitter before each Date... and then split by the splitter.
Question: is it possible to get such blocks using only regex?
(I use VBA to parse the text, with the RegExp object.)
Instead of splitting, just match using
\bDate:\s\d{1,2}\.\d{1,2}\.\d{4}[\s\S]*?(?=\nDate:|$)
See demo.
https://regex101.com/r/uF4oY4/77
Syntax explanation (from the linked site):
\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
Date: matches the characters Date: literally (case sensitive)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\d{1,2} matches a digit (equal to [0-9]) between 1 and 2 times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally (case sensitive)
\d{1,2} matches a digit (equal to [0-9]) between 1 and 2 times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally (case sensitive)
\d{4} matches a digit (equal to [0-9]) exactly 4 times
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy); it applies to the preceding bracket expression [\s\S]
(?= Positive Lookahead - Assert that the following Regex matches
\nDate: Option 1
\n matches a line-feed (newline) character (ASCII 10)
Date: matches the characters Date: literally (case sensitive)
$ Option 2 - $ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
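As an illustration of the match-instead-of-split approach, here is a sketch in JavaScript rather than VBA. The function name splitDateBlocks is hypothetical, and the lookahead uses \r?\n as an assumption so that Windows line endings also work. The g flag is what returns every block rather than only the first.

```javascript
// Hypothetical helper: returns one array entry per Date: block
function splitDateBlocks(text) {
  // Lazy [\s\S]*? stops right before the next line starting with Date: or at the end of the text
  const re = /\bDate:\s\d{1,2}\.\d{1,2}\.\d{4}[\s\S]*?(?=\r?\nDate:|$)/g;
  return text.match(re) || [];
}

const input =
  "Date: 01.02.2015\nsomething\nsomething more\nsome random more\n" +
  "Date: 02.02.2015\nsomething random\ni dont know";
const blocks = splitDateBlocks(input);
console.log(blocks.length); // 2
console.log(blocks[1]);     // the second block, starting with Date: 02.02.2015
```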