I'm trying to validate a user-entered video timestamp in the format hours:minutes:seconds using a regex. I'm assuming the hours component can be arbitrarily long, so the final format is basically hhhh:mm:ss, where there can be any number of h. This is what I have so far:
(([0-9]+:)?([0-5][0-9]:))?([0-5][0-9])
where
(
([0-9]+:)? # hh: optionally with an arbitrary number of h
([0-5][0-9]:) # mm: , with mm from 00 to 59
)? # hh:mm optionally
([0-5][0-9]) # ss , with ss from 00 to 59
I believe this is almost there, but it doesn't handle cases like 1:31 or just 1. To account for this, I made the first digit of the mm and ss blocks optional:
(([0-9]+:)?([0-5]?[0-9]:))?([0-5]?[0-9])
But now the final seconds block starts matching values like 111, and values like 1:1:12 are matched too, which I don't want (it should be 1:01:12). How can I modify this so that m:ss and s are valid, whereas h:m:ss, m:s and sss are not?
I am new to regular expressions, so apologies in advance if I'm doing something stupid. Any help is appreciated. Thanks.
You can match either 1 or more digits followed by an optional :mm:ss part, or match mm:ss.
To also match 6:12 and not 1:1:12 make only the first digit optional in the second part of the pattern.
^(?:\d+(?::[0-5][0-9]:[0-5][0-9])?|[0-5]?[0-9]:[0-5][0-9])$
^ Start of string
(?: Non capture group
\d+ Match 1+ digits
(?::[0-5][0-9]:[0-5][0-9])? Match an optional :mm:ss part, both in range 00 - 59
| Or
[0-5]?[0-9]:[0-5][0-9] Match m:ss or mm:ss both in range 00-59 where the first m is optional
) Close non capture group
$ End of string
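To see how the pattern behaves on the inputs from the question, here is a minimal Python sketch. The names ANSWER_RE, STRICT_RE and is_valid are made up for illustration. STRICT_RE is my own assumption and not part of the answer: it only allows hours together with :mm:ss, in case a bare run of digits such as 111 should be rejected.

```python
import re

# Pattern copied from the answer above.
ANSWER_RE = re.compile(
    r"^(?:\d+(?::[0-5][0-9]:[0-5][0-9])?|[0-5]?[0-9]:[0-5][0-9])$"
)

# Assumption, not from the answer: hours are only allowed together with
# :mm:ss, so a bare run of digits is limited to a seconds value of 0-59.
STRICT_RE = re.compile(
    r"^(?:\d+:[0-5][0-9]:[0-5][0-9]|[0-5]?[0-9]:[0-5][0-9]|[0-5]?[0-9])$"
)


def is_valid(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(text) is not None


# Sample inputs taken from, or modelled on, the question.
for sample in ["1", "1:31", "1:01:12", "1234:00:59", "1:1:12", "1:5", "111"]:
    print(sample, is_valid(ANSWER_RE, sample), is_valid(STRICT_RE, sample))
```

With the pattern from the answer, 1:31 and 1:01:12 are accepted and 1:1:12 is rejected, but a bare 111 still matches through the \d+ branch. The stricter variant rejects 111 while still accepting a lone 1.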
Doesn't adding the positional anchors (^ and $) solve your problem?
^(([0-9]+:)?([0-5][0-9]:))?([0-5][0-9])$
Check here: https://regex101.com/r/fRZf2R/1
We have the file formats below.
60min -->
A20210217.0300-0000-0400-0000_GBM053.xml.gz
15min -->
A20210217.0300-0000-0315-0000_GBM053.xml.gz, A20210217.0315-0000-0330-0000_GBM053.xml.gz, A20210217.0330-0000-0345-0000_GBM053.xml.gz, A20210217.0345-0000-0400-0000_GBM053.xml.gz
I tried the regex below, but it is not working:
!(^A[0-9]{8}.[0-9]{2}[0]{2}-[0-9]{4}-[0-9]{2}[0]{2}-[0-9]{4}_.*.xml(|\.gz)$)
The ! at the start of the pattern matches a ! literally which is not there in the example data. If it was meant as a delimiter, it should also be at the end.
You could make the last two digits of a four-digit block match 15, 30 or 45, and use an alternation so that those values are accepted in either the first or the third part of the hyphenated string.
^A\d{8}\.(?:\d\d(?:[14]5|30)(?:-\d{4}){3}|\d{4}-\d{4}-\d\d(?:[14]5|30)-\d{4})_.*\.xml\.gz$
The pattern matches
^ Start of string
A\d{8}\. Match A and 8 digits followed by a .
(?: Non capture group for the alternation to match either
\d\d(?:[14]5|30) Match 2 digits and either 15 or 45 or 30
(?:-\d{4}){3} Match 3 times - and 4 digits
| Or
\d{4}-\d{4}- Match 2 times 4 digits and -
\d\d(?:[14]5|30)-\d{4} Match 2 digits and either 15 or 45 or 30 followed by 4 digits
) Close non capture group
_.*\.xml\.gz Match _, 0+ times any char except a newline and .xml.gz
$ End of string
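As a quick check, the pattern can be run against the five file names from the question. This is only a sketch in Python; the name QUARTER_HOUR_RE is made up for illustration.

```python
import re

# Pattern copied from the answer above, split over two lines for readability.
QUARTER_HOUR_RE = re.compile(
    r"^A\d{8}\.(?:\d\d(?:[14]5|30)(?:-\d{4}){3}"
    r"|\d{4}-\d{4}-\d\d(?:[14]5|30)-\d{4})_.*\.xml\.gz$"
)

# File names taken from the question: one 60min file, four 15min files.
names = [
    "A20210217.0300-0000-0400-0000_GBM053.xml.gz",
    "A20210217.0300-0000-0315-0000_GBM053.xml.gz",
    "A20210217.0315-0000-0330-0000_GBM053.xml.gz",
    "A20210217.0330-0000-0345-0000_GBM053.xml.gz",
    "A20210217.0345-0000-0400-0000_GBM053.xml.gz",
]

# Keep only the names the pattern accepts.
matched = [name for name in names if QUARTER_HOUR_RE.search(name)]
for name in matched:
    print(name)
```

Only the four 15min names are kept. The 60min name has 00 in both minute positions, so neither branch of the alternation applies.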
Regex demo
https://regex101.com/r/KqB81T/2
^A\d{8}\.(\d{2}(?:[14]5|30)-0000-\d{4}-0000|\d{4}-0000-\d{2}(?:[14]5|30)-0000)_.*\.xml(|\.gz)$
Break down structure:
First two entries are matched: \d{2}(?:[14]5|30)-0000-\d{4}-0000
Last two entries are matched: \d{4}-0000-\d{2}(?:[14]5|30)-0000
Combine the two (a union of the two sets of matches): (FIRST_MATCH|SECOND_MATCH). Also make sure there is no extra character or space at the end (between gz and $).
Let me be the first to say: Welcome to SO, Muskan Garg Bansal!
I want to extract the mobile phones from candidates' CVs.
The mobile phone format I want to extract is 69xxxxxxxx.
The mobile phone formats I come across in the CVs are:
69 xxx xxxxx
0030 69xxxxxxxx
+3069xxxxxxxx
69/xxxx/xxxx
The following formula works great but it extracts the first 10 digits detected and not the one that starts with 69.
=IFERROR(REGEXEXTRACT(TO_TEXT(SPLIT(REGEXREPLACE(I252;"\(|\)|\-| "; ""); CHAR(10))); "\d{10}"))
You may use
=IFERROR(REGEXEXTRACT(TO_TEXT(SPLIT(REGEXREPLACE(I252;"[-/() ]+"; ""); CHAR(10))); "(?:\+|00)?(?:30)?(69\d{8})"))
See the regex demo and the Google Sheets screenshot below:
The regex matches
(?:\+|00)? - an optional + or 00
(?:30)? - an optional 30
( - start of the capturing group (only this value will be returned):
69 - 69 value
\d{8} - eight digits
) - end of the group.
You might consider appending \b at the end of the regex to avoid matching only the first eight digits of a longer run of digits.
Note that the separator cleaning regex is [-/() ]+ now, it matches 1 or more -, /, (, ) and spaces.
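Outside of Google Sheets, the same two steps (strip the separators, then capture the part that starts with 69) can be sketched in Python. The function name extract_mobile and the sample phone numbers are made up for illustration; only the two regexes come from the formula above.

```python
import re
from typing import Optional

# Same separator class as in the formula: -, /, (, ) and spaces.
SEPARATORS_RE = re.compile(r"[-/() ]+")
# Optional + or 00, optional 30, then the captured 69 plus eight digits.
MOBILE_RE = re.compile(r"(?:\+|00)?(?:30)?(69\d{8})")


def extract_mobile(line: str) -> Optional[str]:
    """Return the 69xxxxxxxx number found in a line, or None."""
    cleaned = SEPARATORS_RE.sub("", line)
    match = MOBILE_RE.search(cleaned)
    return match.group(1) if match else None


# Made-up sample lines in the formats listed in the question.
samples = [
    "69 123 45678",
    "0030 6912345678",
    "+306912345678",
    "69/1234/5678",
    "Tel: 2101234567, mobile 6912345678",
]
for sample in samples:
    print(sample, "->", extract_mobile(sample))
```

In the last sample a plain \d{10} would pick up the landline number first, whereas the capture group only returns the number that starts with 69.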
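One option is a regex lookbehind, although Google Sheets uses the RE2 engine, which does not support lookarounds, so this only works in engines such as PCRE or Python's re.
A regex lookbehind asserts that a pattern precedes the match, without including it in the result. With your example, the following matches the eight digits that come after 69 (the 69 itself is not part of the match):
(?<=69)\d{8}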
The picture below is taken from https://regex101.com/ (which is a super-useful tool when working with regexps).
Regex lookahead, lookbehind and atomic groups has some more examples of how lookaheads and lookbehinds work.
All you need is:
=ARRAYFORMULA(IFNA(REGEXREPLACE(REGEXEXTRACT(A1:A&""; "69.*"); "\s|/|\D+"; )))
or better:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(REGEXREPLACE(A1:A&""; "\D+"; ); "69.{8}")))
I've got a document that looks something like this:
# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)
In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.
I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.
I've read over answers such as
Regular Expressions: Is there an AND operator?
Regex: Find a character anywhere in a document but only on lines that begin with a specific word
and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works; the best I can get is a lengthy version of ^[^#].*
So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?
Your regex ^[^#].* uses a negated character class, which matches any character other than # at the start of the string ^, and then matches any character zero or more times.
This would, for example, also match a line such as t test
What you might do instead is use an alternation that either matches a whole line starting with # (^#.*$) or captures one or more digits in a group (\d+).
Your digits are then in capture group 1. You could change the (\d+) to, for example, a character class ([\w+.]+) to match more than only digits.
(?:^#.*$|(\d+))
Details
(?: Non capturing group
^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the line $ (this needs the multiline flag, so that ^ and $ match at line breaks)
| Or
(\d+) capture one or more digits in a group
) Close non capturing group
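A minimal Python sketch of this approach, using the sample document from the question. It assumes the multiline flag is enabled so that ^ and $ work per line; the variable names are made up for illustration.

```python
import re

# Sample document from the question.
doc = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
"""

# Either consume a whole comment line, or capture a run of digits in group 1.
pattern = re.compile(r"(?:^#.*$|(\d+))", re.MULTILINE)

# Comment lines match without setting group 1, so they are filtered out here.
numbers = [
    int(m.group(1)) for m in pattern.finditer(doc) if m.group(1) is not None
]
print(numbers)
```

The comment lines are consumed as whole matches, so the digits inside them (8934, 2018 and so on) never reach group 1.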
I think a much simpler method would be to first replace the comment lines with "" using this regex:
^#.*
And then you can just match all the numbers with this:
-?\d+ (-? is for negative)
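The two steps could look like this in Python. This is only a sketch, using the sample document from the question and assuming the multiline flag for the first regex; the variable names are made up.

```python
import re

# Sample document from the question.
doc = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
"""

# Step 1: blank out every line that starts with #.
without_comments = re.sub(r"^#.*", "", doc, flags=re.MULTILINE)

# Step 2: collect all (optionally negative) integers that are left.
numbers = [int(n) for n in re.findall(r"-?\d+", without_comments)]
print(numbers)
```

This costs an extra pass over the text, but each regex stays trivial to read.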
We're receiving a file from a customer, which we need to read in order to save some values into our ERP system.
The customer sends us a date in a week format such as 201814, which means the 14th week of the year 2018.
The date never appears in the same place in the file, so I think the only way to get it is to search the file contents with a regex.
My regex should probably meet the following conditions:
the length of the string is always 6 characters
all characters are numeric
the string always starts with 20
the last two digits have to be between 01 and 53
What would the perfect regex for this be? There are many other numeric-only values in the file, which is why I need to be so specific.
I know I can do the length condition like this {1,6} and I know that [0-9] matches all digits from zero to nine, but I can't see how I can restrict the last two digits to 01 to 53.
Can someone help me with my regex? Thanks a lot!
You may try this:
\b20\d{2}(?:0[1-9]|[1-4]\d|5[0-3])(?!\d)
Explanation:
\b word boundary, so the match cannot begin in the middle of a word or a longer number
20 literally, the match must start with 20
\d{2} followed by any two digits
(?: non capturing group starts here
0[1-9] means 01 to 09
or
[1-4]\d means 10 to 49
or
5[0-3] means 50-53
) end of non capturing group
(?!\d) negative lookahead to ensure the entire match is not followed by a digit. The regex is built so that you do not need to count 6 digits explicitly: if the value is not exactly 6 digits, the conditions above are not met.
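Here is a small Python sketch that searches a line of text with this pattern. The sample text is made up to include a few near misses; the names WEEK_RE and weeks are illustrative.

```python
import re

# Pattern copied from the answer above.
WEEK_RE = re.compile(r"\b20\d{2}(?:0[1-9]|[1-4]\d|5[0-3])(?!\d)")

# Made-up sample text: 2018140 is too long, 201854 and 201800 have an
# invalid week, and X201801 does not start at a word boundary.
text = "order 4711 ref 2018140 week 201814 qty 201854 due 201853 id X201801 n 201800"

weeks = WEEK_RE.findall(text)
print(weeks)
```

Only 201814 and 201853 are found in this sample; the other candidates fail either the week range, the word boundary or the lookahead.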
Use this: (20)\d{2}([1-4][0-9]|[0][1-9]|[5][0-3])
I am new to Stack Overflow and I need your help matching a payment invoice number, so that users can't enter a wrong one. It should follow this invoice pattern: 612 (fixed), one of 10/20/30/40/50, 001-064 (one value), 0000 (fixed), 01-64 (one value), 00 (fixed), and then 0001-9999.
For example, one invoice number would be 612 30 005 0000 55 00 1234, written without any spaces: 61230005000055001234
I can't figure out how to do this. Please help me if you can.
^612\s?[1-5]0\s?0(?:[0-5]\d|6[0-4])\s?0000\s?(?:[0-5]\d|6[0-4])\s?00\s?\d{4}$
Should do the job for you, assuming that spaces are optional, but in fixed position and only single ones.
^ is an anchor for the beginning of the string
612\s? matches 612 literally, followed by an optional space
[1-5]0\s? matches 1/2/3/4/5 followed by 0 and an optional space
0(?:[0-5]\d|6[0-4])\s? means 0 followed by either 0-5 and any digit, or 6 and 0-4, followed by an optional space
0000\s? matches 0000 literally, followed by an optional space
(?:[0-5]\d|6[0-4])\s? is either 0-5 and any digit, or 6 and 0-4, followed by an optional space
00\s? matches 00 literally, followed by an optional space
\d{4} means any 4 digits
$ is an anchor for the end of the string
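To try the pattern on the invoice number from the question, a minimal Python sketch could look like this. The helper name is_valid_invoice and the invalid samples are made up for illustration.

```python
import re

# Pattern copied from the answer above, split over two lines for readability.
INVOICE_RE = re.compile(
    r"^612\s?[1-5]0\s?0(?:[0-5]\d|6[0-4])\s?0000"
    r"\s?(?:[0-5]\d|6[0-4])\s?00\s?\d{4}$"
)


def is_valid_invoice(text: str) -> bool:
    """Return True if the whole string matches the invoice pattern."""
    return INVOICE_RE.fullmatch(text) is not None


samples = [
    "61230005000055001234",        # example from the question
    "612 30 005 0000 55 00 1234",  # same number with single spaces
    "61260005000055001234",        # made up: 60 is not one of 10..50
    "61230065000055001234",        # made up: 065 is above 064
    "61230" + "0" * 15,            # made up: every variable part is zero
]
for sample in samples:
    print(sample, is_valid_invoice(sample))
```

Note that the last sample is accepted: the pattern as written also allows 000, 00 and 0000 in the variable parts, so if those have to be rejected (the question asks for 001-064, 01-64 and 0001-9999), the alternations would need to be tightened.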
https://regex101.com/r/iU5jY5/3
612[1-5]00(?:[0-5][0-9]|6[0-4])0000(?:0[0-9]|[1-5][0-9]|6[0-4])00[0-9]{4}