Regex to parse only 15-minute files and skip 60-minute files

We have the following file name formats.
60 min:
A20210217.0300-0000-0400-0000_GBM053.xml.gz
15 min:
A20210217.0300-0000-0315-0000_GBM053.xml.gz, A20210217.0315-0000-0330-0000_GBM053.xml.gz, A20210217.0330-0000-0345-0000_GBM053.xml.gz, A20210217.0345-0000-0400-0000_GBM053.xml.gz
I tried the regex below, but it is not working:
!(^A[0-9]{8}.[0-9]{2}[0]{2}-[0-9]{4}-[0-9]{2}[0]{2}-[0-9]{4}_.*.xml(|\.gz)$)

The ! at the start of the pattern matches a literal !, which does not occur in the example data. If it was meant as a delimiter, it should also be at the end.
You could match 15, 30 or 45 as the minutes and use an alternation so that those values are accepted in either the first or the third part of the hyphenated string.
^A\d{8}\.(?:\d\d(?:[14]5|30)(?:-\d{4}){3}|\d{4}-\d{4}-\d\d(?:[14]5|30)-\d{4})_.*\.xml\.gz$
The pattern matches
^ Start of string
A\d{8}\. Match A and 8 digits followed by a .
(?: Non capture group for the alternation to match either
\d\d(?:[14]5|30) Match 2 digits and either 15 or 45 or 30
(?:-\d{4}){3} Match 3 times - and 4 digits
| Or
\d{4}-\d{4}- Match 2 times 4 digits and -
\d\d(?:[14]5|30)-\d{4} Match 2 digits and either 15 or 45 or 30 followed by 4 digits
) Close the non-capturing group
_.*\.xml\.gz Match _, 0+ times any char except a newline and .xml.gz
$ End of string
Regex demo
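As a quick check outside the demo, here is a minimal Python sketch that applies the pattern above to the file names from the question. The helper name is_15_min_file is made up for illustration, and the sketch assumes that 60-minute files always start and end on the hour, as in the sample data.

```python
import re

# Pattern from the answer above: the start or the end time must fall on :15, :30 or :45.
QUARTER_HOUR = re.compile(
    r"^A\d{8}\.(?:\d\d(?:[14]5|30)(?:-\d{4}){3}|\d{4}-\d{4}-\d\d(?:[14]5|30)-\d{4})_.*\.xml\.gz$"
)

def is_15_min_file(name: str) -> bool:
    """Hypothetical helper: True when the name looks like a 15-minute file."""
    return QUARTER_HOUR.match(name) is not None

names = [
    "A20210217.0300-0000-0400-0000_GBM053.xml.gz",  # 60-minute file
    "A20210217.0300-0000-0315-0000_GBM053.xml.gz",
    "A20210217.0315-0000-0330-0000_GBM053.xml.gz",
    "A20210217.0330-0000-0345-0000_GBM053.xml.gz",
    "A20210217.0345-0000-0400-0000_GBM053.xml.gz",
]

# Keep only the names the pattern accepts.
fifteen = [n for n in names if is_15_min_file(n)]
print(fifteen)
```

The 60-minute name is rejected because neither its start time (0300) nor its end time (0400) has 15, 30 or 45 in the minutes position.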

https://regex101.com/r/KqB81T/2
^A\d{8}\.(\d{2}(?:[14]5|30)-0000-\d{4}-0000|\d{4}-0000-\d{2}(?:[14]5|30)-0000)_.*\.xml(|\.gz)$
Breakdown of the structure:
Entries whose start time ends in 15, 30 or 45 are matched by: \d{2}(?:[14]5|30)-0000-\d{4}-0000
Entries whose end time ends in 15, 30 or 45 are matched by: \d{4}-0000-\d{2}(?:[14]5|30)-0000
Combine the two with an alternation (the union of the two match sets): (FIRST_MATCH|SECOND_MATCH). Also make sure there is no stray character or space at the end (between gz and $).
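To illustrate the composition, here is a rough Python sketch that builds the full pattern from the two parts. The constant names are made up, and (?:\.gz)? is used as an equivalent of the (|\.gz) in the pattern above.

```python
import re

# Hypothetical names for the two building blocks of the alternation.
START_ON_QUARTER = r"\d{2}(?:[14]5|30)-0000-\d{4}-0000"
END_ON_QUARTER = r"\d{4}-0000-\d{2}(?:[14]5|30)-0000"

# Union of the two parts, wrapped in the common prefix and suffix.
pattern = re.compile(
    r"^A\d{8}\.(?:" + START_ON_QUARTER + "|" + END_ON_QUARTER + r")_.*\.xml(?:\.gz)?$"
)

ok_gz = pattern.match("A20210217.0315-0000-0330-0000_GBM053.xml.gz") is not None
ok_xml = pattern.match("A20210217.0300-0000-0315-0000_GBM053.xml") is not None
hourly = pattern.match("A20210217.0300-0000-0400-0000_GBM053.xml.gz") is not None
# A stray space after gz makes the match fail.
trailing = pattern.match("A20210217.0315-0000-0330-0000_GBM053.xml.gz ") is not None
print(ok_gz, ok_xml, hourly, trailing)
```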
Let me be the first to say: Welcome to SO, Muskan Garg Bansal!

Write regex patterns for matching single digit or double digit where tens place value is 2 or 4

Below is my regex for matching 2-digit numbers where the tens digit is 2 or 4, and it is working fine.
^(?=[2,4])\d{1,2}$
As soon as I added an alternative for a single digit to the regex above, it started matching single digits as well as every 2-digit number.
^(?=\d|[2,4])\d{1,2}$
I want the sample input below to be matched.
0
1
2
3
24
44
48
29
28
Below not to be matched.
99
11
33
55
77
Also, it would be a great help to know why my regex is not working.
You get a difference in matches because the positive lookahead asserts that what you specify must be directly to the right. In the first pattern that is either 2, 4 or a comma, and in the second pattern it is any single digit.
You don't have a comma in your example data, so in that case you can match an optional 2 or 4 using just [24]? followed by a digit without any lookarounds.
^[24]?\d$
See a regex demo.
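For what it's worth, here is a small Python sketch that runs this pattern against the sample input from the question. The list names are only for illustration.

```python
import re

# Optional tens digit 2 or 4, followed by one mandatory digit.
pattern = re.compile(r"^[24]?\d$")

should_match = ["0", "1", "2", "3", "24", "44", "48", "29", "28"]
should_not_match = ["99", "11", "33", "55", "77"]

# Collect every sample the pattern accepts.
matched = [s for s in should_match + should_not_match if pattern.match(s)]
print(matched)
```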
Try this: ^(\d|[2,4]\d)$
Test regex here: https://regex101.com/r/aZo7fK/1
^(\d|[2,4]\d)$
^ matches the start of string
(\d|[2,4]\d) matches either a single digit(0-9) or a two digit number which starts with either 2 or 4
$ matches the end of the string
This matches either a single digit(0-9) number or a two digit number which starts with either 2 or 4.
I suggest
^[2,4]?[0-9]$
pattern; where
^ - anchor, start of the text
[2,4]? - optional 2 or 4 digit for tens (the comma inside the class is also matched literally; [24]? avoids that)
[0-9] - mandatory digit 0..9 for units
$ - anchor, end of the text
Edit: Now, let's have a look at your current patterns; the first is
^(?=[2,4])\d{1,2}$
Here
(?=[2,4]) - look ahead for 2 or 4
\d{1,2} - one or two digits
as we can see 3 doesn't match: look ahead fails to find 2 or 4. As for your second attempt
^(?=\d|[2,4])\d{1,2}$
pattern, where
(?=\d|[2,4]) - look ahead for ANY digit (note, that |[2,4] is redundant)
\d{1,2} - one or two digits
the pattern matches too many texts; technically it matches any one or two digit numbers, e.g. for:
79
we have
(?=\d|[2,4]) - look ahead - succeeds with 7
\d{1,2} - one or two digits - succeeds with 79
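The behaviour of the two original patterns can be reproduced with a short Python sketch. The variable names are made up for illustration.

```python
import re

# First attempt: the lookahead requires 2, 4 or a comma as the first character.
first = re.compile(r"^(?=[2,4])\d{1,2}$")
# Second attempt: the lookahead is satisfied by any digit.
second = re.compile(r"^(?=\d|[2,4])\d{1,2}$")

first_on_3 = first.match("3") is not None     # lookahead fails on 3
first_on_24 = first.match("24") is not None   # lookahead sees 2
second_on_79 = second.match("79") is not None  # lookahead succeeds with 7
second_on_3 = second.match("3") is not None
print(first_on_3, first_on_24, second_on_79, second_on_3)
```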

Regex to not allow duplicate wild cards

I want to make regex which can pass the following cases:
02:12
10:23
00.23
0.23
.02
:88
Here is what I have tried: ^([0-9:. ])*[.: ]+$
But it allows duplicate wildcards (":", "." and space), and I'm also unable to limit the digits to 1-2 on each side of the wildcard. Any help would be great. Thanks.
The pattern you tried only matches digits on the left side, and the duplicates are matched because of the quantifiers.
If you want to allow 1 or 2 digits on both sides and make the digits on the left optional:
^[0-9]{0,2}[.:][0-9]{1,2}$
^ Start of string
[0-9]{0,2} Match 0, 1 or 2 times a digit 0-9
[.:] Match either . or :
[0-9]{1,2} Match 1 or 2 times a digit 0-9
$ End of string
Regex demo
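A quick Python sketch to check the pattern against the cases from the question. The rejected examples are my own additions, chosen to show a duplicated separator and too many digits.

```python
import re

pattern = re.compile(r"^[0-9]{0,2}[.:][0-9]{1,2}$")

accepted = ["02:12", "10:23", "00.23", "0.23", ".02", ":88"]
# Assumed counter-examples: duplicate separator, 3 digits on one side, missing right side.
rejected = ["02::12", "02..12", "123:45", "12:345", "12:"]

results = {s: pattern.match(s) is not None for s in accepted + rejected}
print(results)
```

Note that this pattern only allows . and : as separators, so an input such as "12 34" is rejected as well.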

How would I find values in a file, but only on lines that don't start with #?

I've got a document that looks something like this:
# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)
In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.
I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.
I've read over answers such as
Regular Expressions: Is there an AND operator?
Regex: Find a character anywhere in a document but only on lines that begin with a specific word
and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works; the best I can get is a lengthy version of ^[^#].*
So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?
Your regex ^[^#].* uses a negated character class that matches any single character other than # at the start of the string ^, and after that matches any character zero or more times.
This would, for example, also match t test
What you might do is use an alternation to either match a whole line that starts with a #, using ^#.*$, or capture one or more digits in a group, using (\d+)
Your digits are then in capture group 1. You could change the (\d+) to, for example, a character class ([\w+.]+) to match more than only digits.
(?:^#.*$|(\d+))
Details
(?: Non capturing group
^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the line $
| Or
(\d+) capture one or more digits in a group
) Close non capturing group
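Here is a minimal Python sketch of this approach, assuming multiline mode so that ^ and $ work per line. Comment lines are consumed by the first alternative and leave group 1 empty, so only matches with a filled group 1 are kept.

```python
import re

text = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
"""

# re.MULTILINE makes ^ and $ match at the start and end of every line.
pattern = re.compile(r"(?:^#.*$|(\d+))", re.MULTILINE)

# Keep only the matches where group 1 took part, i.e. digits on non-comment lines.
numbers = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(numbers)
```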
I think a much simpler method would be to first replace the comment lines with an empty string using this regex:
^#.*
And then you can just match all the numbers with this:
-?\d+ (-? is for negative)
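In Python, this two-step idea could look roughly like the sketch below. It assumes multiline mode for the first step so that ^ matches at the start of each line.

```python
import re

text = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
"""

# Step 1: blank out every line that starts with #.
without_comments = re.sub(r"^#.*", "", text, flags=re.MULTILINE)

# Step 2: collect all (optionally negative) integers from what is left.
numbers = [int(n) for n in re.findall(r"-?\d+", without_comments)]
print(numbers)
```

Stripping the comments first matters here: run directly on the original text, -?\d+ would also pick up 8934 and the pieces of the date.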

How can I replace this expression in chain regex (notepad++)?

I have this text:
14 two 25 three 12 four 40 five 10
I want to obtain "14 two 14 25 three 14 25 12 four 14 25 12 40 five 14 25 12 40 10"
For example, when I replace "14 two " with "14 two 14 ", the next search starts after the inserted 14; I can't make it start after "two".
Is there any other way to do this?
For example, by using a group that is not included in the match (a group before the match) in the replacement?
Please help me.
This should do the trick for you:
Regex: ((?:\s?\d+\s?)+)((?:[a-zA-Z](?![^a-zA-Z]+\1))+)
Replacement: $1$2 $1
You will need to click on the "Replace All" button repeatedly for this to work (it cannot be done in one shot; it has to be repeated for as long as a match can be found. Online PHP example)
Explanation:
\s: Match a single space character
?: the previous expression must be matched 0 or 1 time.
\s?: Match a space character 0 or 1 time.
\d: Match a digit character (the equivalent of [0-9]).
+: The previous expression must be matched at least one time (up to infinity).
\d+: Match as many digit characters as possible (but at least one).
(): Capture group
(?:): Non-capturing group
((?:\s?\d+\s?)+): Match an optional space character followed by one or more digit characters followed by an optional space character. The expression is surrounded by a non-capturing group followed by a plus. That means the regex will try to match as many combinations of space and digit characters as it can (so you can end up with something like '14 25 12 40').
The capture group is meant to keep the value so it can be reused in the replacement. You cannot simply add the plus at the end of the capture group without the non-capturing group inside it, because it would only remember the last digits captured ('12' instead of the whole '14 25 12' used to build '14 25 12 40').
[a-zA-Z]: Match any English letters in any case (lower, upper).
\1: Reference to what has been captured in the first group.
(?!): Negative lookahead.
[^]: Negated character class, so [^a-zA-Z] means match anything that is not an English letter.
((?:[a-zA-Z](?![^a-zA-Z]+\1))+): The negative lookahead is meant to make sure that we don't always end up matching the first "14 two" in the input text. Without it, we would end up in an infinite loop giving results such as "14 two 14 14 14 14 14 14 25 three 12 four 40 five 10" (the "14" before "25" being repeated until you reach the timeout).
Basically, for every English letter we match, we look ahead to assert that the content of the first capture group (for example "14") is not present in our digit sequence.
For the replacement, $1$2 $1 means put the content of the capture group 1 and 2, add a space and put the content of the capture group 1 once more.
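If the text can be processed outside Notepad++, the same result can be produced without repeated replacements. The sketch below is a different technique from the regex above, not a translation of it: it walks over the tokens and repeats every number seen so far after each word. The function name expand is made up, and the sketch assumes the input consists only of integers and English words separated by spaces.

```python
import re

def expand(text: str) -> str:
    """Hypothetical helper: after each word, repeat all numbers seen so far."""
    seen = []  # numbers encountered up to the current token
    out = []   # tokens of the result
    for token in re.findall(r"\d+|[a-zA-Z]+", text):
        out.append(token)
        if token.isdigit():
            seen.append(token)
        else:
            # A word: append every number collected before it.
            out.extend(seen)
    return " ".join(out)

result = expand("14 two 25 three 12 four 40 five 10")
print(result)
```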

Regex matching multiple inputs

I am trying to do a smart input field for UK style weight input, e.g. "6 stone and 3 lb" or "6 st 11 pound", capturing the 2 numbers in groups.
For now I got: ([0-9]{1,2}).*?([0-9]{1,2}).*
The problem is that it splits "12 stone" into 2 groups, 1 and 2, instead of capturing just 12. Is it possible to make a regex which captures correctly in both cases?
You need to make the first part possessive so it never gets backtracked into.
([0-9]{1,2}+).*?([0-9]{1,2})
Because . matches everything, including digits, try this:
/(\d{1,2})\D+(\d{1,2})?/
Something like this?
\b(\d+)\b.*?\b(\d+)\b
Groups 1 and 2 will have your numbers in either case.
Explanation :
"
\b # Assert position at a word boundary
( # Match the regular expression below and capture its match into backreference number 1
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b # Assert position at a word boundary
. # Match any single character that is not a line break character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\b # Assert position at a word boundary
( # Match the regular expression below and capture its match into backreference number 2
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b # Assert position at a word boundary
"
This works, then look at capture groups 1 and 3:
([0-9]{1,2})[^0-9]+(([0-9]{1,2})?.+)?
The idea is to make a number and text mandatory, but make a second number and text optional.
Here is my suggestion for a regex to match both variants you showed:
(?<stone>\d+\s(?:stone|st))(?:\s(and)?\s?)(?<pound>\d+\s(?:pound|lb))
The requirements are a bit vague at the moment, but this works:
/([0-9]{1,2})(?:[^0-9]+([0-9]{1,2}).*)?/
for this data:
6 stone and 3 lb
6 st 11 pound
12 stone
12 st and 11lbs
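A quick way to see what this pattern captures for the data above is a small Python sketch like the following (the / delimiters are dropped, since Python's re module does not use them).

```python
import re

# Second number and everything around it are optional as one unit.
pattern = re.compile(r"([0-9]{1,2})(?:[^0-9]+([0-9]{1,2}).*)?")

samples = ["6 stone and 3 lb", "6 st 11 pound", "12 stone", "12 st and 11lbs"]

# Map each sample to its two capture groups; a missing second number is None.
results = {s: pattern.match(s).groups() for s in samples}
for sample, groups in results.items():
    print(sample, groups)
```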
Seeing as everyone is having a go, here's mine:
(\d+)(?:\D+(\d+))?
It's definitely the most concise so far. This will match one or two groups of digits anywhere:
"12": ("12", null)
"12st": ("12", null)
"12 st": ("12", null)
"12st 34 lb": ("12", "34")
"cabbage 12st 34 lb": ("12", "34")
"12 potato 34 moo": ("12", "34")
The next step would be making it catch the name of the units that were used.
Edit: as pointed out above, we don't know what language you're using, and not all regex functionality is available in all implementations. However, as far as I know, \d for digits and \D for non-digits are fairly universal.
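As one concrete case, here is a small Python sketch, assuming the pattern in which the separator and the second number are optional as a whole, i.e. (\d+)(?:\D+(\d+))?. The helper name weights is made up, and Python reports a missing second group as None rather than null.

```python
import re

# The second number, together with the non-digits before it, is optional.
pattern = re.compile(r"(\d+)(?:\D+(\d+))?")

def weights(text: str):
    """Hypothetical helper: return the two captured groups, or None if no digits are found."""
    m = pattern.search(text)  # search, so leading text such as "cabbage " is skipped
    return m.groups() if m else None

for sample in ["12", "12st", "12 st", "12st 34 lb", "cabbage 12st 34 lb", "12 potato 34 moo"]:
    print(repr(sample), weights(sample))
```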