Is it possible to match this with a regex? - regex

I have files with these filenames:
ZATR0008_2018.pdf
ZATR0018_2018.pdf
ZATR0218_2018.pdf
The 4 digits after ZATR are the issue number of the magazine.
With this regex:
([1-9][0-9]*)(?=_\d)
I can extract 8, 18 or 218, but I would like to keep a minimum of 2 digits and a maximum of 3 digits, so the results should be 08, 18 and 218.
How can I do that?

You may use
0*(\d{2,3})_\d
and grab Group 1 value. See the regex demo.
Details
0* - zero or more 0 chars
(\d{2,3}) - Group 1: two or three digits
_\d - a _ followed with a digit.
Here is a PCRE variation that grabs the value you need into a whole match:
0*\K\d{2,3}(?=_\d)
See another regex demo
Here, \K makes the regex engine omit the text matched so far (zeros) and then matches 2 to 3 digits that are followed with _ and a digit.
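As a quick check of the capture-group pattern, here is a minimal Python sketch. Python and the helper name issue_number are my assumptions for illustration, not part of the answer. The \K variant is left out because Python's built-in re module does not support \K.

```python
import re

# Capture-group pattern from the answer: skip leading zeros, keep 2-3 digits.
ISSUE_RE = re.compile(r"0*(\d{2,3})_\d")

def issue_number(filename):
    """Return the 2-3 digit issue number (hypothetical helper), or None."""
    m = ISSUE_RE.search(filename)
    return m.group(1) if m else None

for name in ("ZATR0008_2018.pdf", "ZATR0018_2018.pdf", "ZATR0218_2018.pdf"):
    print(name, "->", issue_number(name))
```

The value comes from Group 1, so the zeros consumed by 0* never end up in the result.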

(?:[1-9][0-9]?)?[0-9]{2}(?=_[0-9])
or perhaps:
(?:[1-9][0-9]+|[0-9]{2})(?=_[0-9])
(https://www.freeformatter.com/regex-tester.html, which you mention in another answer and which claims to use the XRegExp library, doesn't seem to backtrack into the (?:)? in my first suggestion where necessary. That makes it very different from any regex engine I've encountered before, and makes it prefer to match just the 18 of 218 even though that match starts later in the string. It does work with my second suggestion.)
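For comparison, here is how both suggestions behave in a conventional backtracking engine. This is only a sketch using Python's re module (my choice, not something the answer tested against); the variable names are illustrative.

```python
import re

names = ["ZATR0008_2018.pdf", "ZATR0018_2018.pdf", "ZATR0218_2018.pdf"]

# First suggestion: optional 1-2 digit prefix starting with non-zero, then 2 digits.
first = re.compile(r"(?:[1-9][0-9]?)?[0-9]{2}(?=_[0-9])")
# Second suggestion: non-zero digit plus more digits, or exactly 2 digits.
second = re.compile(r"(?:[1-9][0-9]+|[0-9]{2})(?=_[0-9])")

first_results = [first.search(n).group(0) for n in names]
second_results = [second.search(n).group(0) for n in names]

print(first_results)   # ['08', '18', '218']
print(second_results)  # ['08', '18', '218']
```

Here the engine does backtrack into the optional group, so 218 is found at the earliest possible start position.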

([1-9]\d{1,2})(?=_\d)
{x,y} matches the previous pattern, in this case \d, from x to y times. Together with the leading [1-9], that gives two to three digits in total.
Edit: from your own regex it looked as if you wanted the part of the number that starts with a non-zero digit. However, since your examples include leading 0s, maybe you really wanted:
(\d{2,3})(?=_\d)
This gives you the last 3 digits before the underscore (008, 018 and 218 for your examples), or 2 digits if there are only 2.

I propose:
^ZATR0*(\d{2,3})_\d+\.pdf$
demo code here. Result:
Match 1 Full match 0-17 ZATR0008_2018.pdf Group 1. 6-8 08
Match 2 Full match 18-35 ZATR0018_2018.pdf Group 1. 24-26 18
Match 3 Full match 36-53 ZATR0218_2018.pdf Group 1. 41-44 218
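The result listing above can be reproduced with a short Python sketch. It assumes the three filenames are joined with newlines and matched in multiline mode; the variable names are mine.

```python
import re

# The three filenames on separate lines, as in the demo input.
listing = "ZATR0008_2018.pdf\nZATR0018_2018.pdf\nZATR0218_2018.pdf"

# MULTILINE makes ^ and $ anchor at each line rather than the whole string.
pattern = re.compile(r"^ZATR0*(\d{2,3})_\d+\.pdf$", re.MULTILINE)

rows = [(m.span(), m.group(0), m.span(1), m.group(1))
        for m in pattern.finditer(listing)]

for full_span, full, group_span, group in rows:
    print(full_span, full, group_span, group)
```

Anchoring on the full filename means a stray name that merely contains digits and an underscore is rejected rather than partially matched.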

Related

Negate a character group to replace all other characters

I have the following string:
"Thu Dec 31 22:00:00 UYST 2009"
I want to replace everything except for the hours and minutes so I get the following result:
"22:00"
I am using this regex :
(^([0-9][0-9]:[0-9][0-9]))
But it's not matching anything.
This would be my line of actual code :
println("Thu Dec 31 22:00:00 UYST 2009".replace("(^([0-9][0-9]:[0-9][0-9]))".toRegex(),""))
Can someone help me to correct the regex?
The reason the one you have isn't working is that you are asserting that the line starts right before the hours and minutes, which isn't the case. This can be fixed by removing the assertion (^).
If you need the assertion to remain, there is another way. In most languages, you wouldn't be able to use a variable-length positive lookbehind here, but lucky for you, it looks like you can in Kotlin.
A positive lookbehind is basically just telling the pattern "this comes before what I'm looking for". It's denoted by a group beginning with ?<=. In this case, you can use something like (?<=^[\w ]+). This will match all word characters or spaces between the beginning of the line and where the pattern that comes after it is able to match. Prepending it to your expression would look something like (?<=^[\w ]+)([0-9][0-9]:[0-9][0-9]) (note you will have to escape the \w in order for it to be in a string and not be angry about it).
Side note, Yogesh_D is correct in saying that \d\d:\d\d is the same as your [0-9][0-9]:[0-9][0-9]. Using this, it would look more like (?<=^[\w ]+)\d\d:\d\d.
You may use various solutions, here are two:
val text = """Thu Dec 31 22:00:00 UYST 2009"""
val match = """\b(?:0?[1-9]|1\d|2[0-3]):[0-5]\d\b""".toRegex().find(text)
println(match?.value)
val match2 = """\b(\d{1,2}:\d{2}):\d{2}\b""".toRegex().find(text)
println(match2?.groupValues?.getOrNull(1))
Both return 22:00. See regex #1 demo and regex #2 demo.
The regex complexity should be selected based on how messy the input string is.
Details
\b - a word boundary
(?:0?[1-9]|1\d|2[0-3]) - an optional zero and then a non-zero digit, or 1 and any digit, or 2 and a digit from 0 to 3
: - a : char
[0-5]\d - 0, 1, 2, 3, 4 or 5 and then any one digit
\b - a word boundary.
If there is a match with this regex, you get it as a whole match, so you can access it via match?.value. Note that the hour alternation as written does not cover 00; if midnight hours must match, (?:[01]?\d|2[0-3]) would cover 0 to 23.
If you do not have to worry about any pre-validation when matching, you may simply match 3 colon-separated digit pairs and capture the first two, see the second regex:
\b - a word boundary
(\d{1,2}:\d{2}) - Group 1: one or two digits, : and two digits
:\d{2} - a : and two digits (not captured)
\b - a word boundary.
If there is a match, we need Group 1 value, hence match2?.groupValues?.getOrNull(1) is used.
I am not sure what language you are using, but why use negation when you can directly match the first digits in the hh:mm format?
Assuming the date string always contains an hh:mm part, this regex snippet should have the first group match the hh:mm.
https://regex101.com/r/aHdehZ/1
The regex to use is (\d\d:\d\d)
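A minimal sketch of this in Python (the language is my assumption, since the answer does not name one). The second variant is not from the answer: it shows one way to keep the "replace everything else" approach from the question by matching the whole string and substituting Group 1.

```python
import re

text = "Thu Dec 31 22:00:00 UYST 2009"

# Direct extraction with the capture group from the answer.
match = re.search(r"(\d\d:\d\d)", text)
extracted = match.group(1) if match else None

# Replacement-based variant (illustrative): match the whole string
# and keep only the first hh:mm via a backreference.
replaced = re.sub(r"^.*?(\d\d:\d\d).*$", r"\1", text)

print(extracted)  # 22:00
print(replaced)   # 22:00
```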

RegEx to check 24 hours time format fails

I have the following RegEx that is supposed to do 24 hours time format validation, which I'm trying out in https://rubular.com
/^[0-23]{2}:[0-59]{2}:[0-59]{2}$/
But the following times fail to match even though they look correct:
02:06:00
04:05:00
Why is this so?
In character classes, you're supposed to denote the range of characters allowed (in contrast to the numbers you want to match in your example). For minutes and seconds, this is relatively straight-forward - the following expression
[0-5][0-9]
...will match any numerical string from "00" to "59".
But for the hours, you need two separate expressions:
[01][0-9]|2[0-3]
...one to match "00" to "19" and one to match "20" to "23". Due to the alternative used (| character), these need to be grouped, which adds another bit of syntax (?:...). Finally we're just adding the anchors ^ and $ for beginning and end of string, which you already had where they belong.
^(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$
You can check this solution out at regex101, if you like.
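To try it outside an online tester, here is a small Python sketch (the language and the helper name is_valid_time are assumptions for illustration). It also runs the pattern from the question to show where it fails.

```python
import re

# Corrected pattern from the answer.
TIME_RE = re.compile(r"^(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$")
# Pattern from the question, kept for comparison.
ORIGINAL_RE = re.compile(r"^[0-23]{2}:[0-59]{2}:[0-59]{2}$")

def is_valid_time(s):
    """Hypothetical helper: True if s is HH:MM:SS in 24-hour format."""
    return TIME_RE.match(s) is not None

for sample in ("02:06:00", "04:05:00", "23:59:59", "24:00:00", "12:60:00"):
    print(sample, is_valid_time(sample), ORIGINAL_RE.match(sample) is not None)
```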
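Your problem is that you misunderstand character ranges: [0-23] doesn't mean "match any number from 0 to 23". It is a character class containing the range 0-2 plus the literal 3, so it matches a single character that is 0, 1, 2 or 3. Likewise, [0-59] matches a single character that is 0-5 or 9, which is why 04 and 06 fail.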
Try this pattern: (?:[01][0-9]|2[0-3])(?::[0-5][0-9]){2}
Explanation:
(?:...) - non-capturing group
[01][0-9]|2[0-3] - alternation: match either 0 or 1 followed by any digit from 0 to 9, OR 2 followed by 0, 1, 2 or 3 (a number from 00 to 23)
(?::[0-5][0-9]){2} - match : and [0-5][0-9] (basically number from 00-59) twice
Demo
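To see what the character classes from the question really contain, you can list the digits each one accepts. A sketch in Python (my choice of language; the variable names are illustrative):

```python
import re

digits = "0123456789"

# Each class matches ONE character; findall lists which digits qualify.
hours_class = re.findall(r"[0-23]", digits)
minutes_class = re.findall(r"[0-59]", digits)

print(hours_class)    # ['0', '1', '2', '3']
print(minutes_class)  # ['0', '1', '2', '3', '4', '5', '9']

# The suggested pattern, applied to the whole string with fullmatch.
fixed = re.compile(r"(?:[01][0-9]|2[0-3])(?::[0-5][0-9]){2}")
ok = [t for t in ("02:06:00", "04:05:00", "25:00:00") if fixed.fullmatch(t)]
print(ok)  # ['02:06:00', '04:05:00']
```

The pattern itself has no anchors, so fullmatch is used here to require that the entire string is a time.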
Use this: (([0-1]\d|[2][0-3])):(([0-5][0-9])):(([0-5][0-9]))
Online demo

Having issue identify a number pattern

I am new to RegEx and I am having a difficult time trying to detect a pattern.
I want to identify a number that is between 4000-4999 but at the same time must NOT be preceded or followed by another number, optionally separated by a space or a hyphen "-".
For example:
4567 (match)
I have 4999 roses (match)
1234567 days are gone (no match)
My water supply account is 123 4567 89 (no match)
Howdy, my cell number is 123-4567-89 (no match)
I tried the pattern below
(?<!(\d))\b4\d{3}\b(?!(\d))
but it still gives me a match for 123 4567 - I guess there is something special about \b?
Any advice will be highly appreciated.
Thanks,
Eric
You may use
(?<!\d[\s-]|\d)4\d{3}(?![\s-]?\d)
In .NET, ECMAScript 2018 compliant JavaScript environments, or the PyPI regex module, where lookbehind patterns can contain ?, *, + and {min,} quantifiers, you may shorten it to
(?<!\d[\s-]?)4\d{3}(?![\s-]?\d)
Or, in case alternation with alternatives of different lengths is not supported in a lookbehind (as in Boost or Python's built-in re), use
(?<!\d[\s-])(?<!\d)4\d{3}(?![\s-]?\d)
See the regex demo and regex demo 2 (and a .NET regex demo).
Details
(?<!\d[\s-]|\d) / (?<!\d[\s-]?) / (?<!\d[\s-])(?<!\d) - no digit and a whitespace/- and no digit immediately to the left of the current position is allowed
4\d{3} - 4 and any 3 digits
(?![\s-]?\d) - immediately to the right, no 1 or 0 occurrences of a whitespace/- followed with a digit is allowed.
NOTE The solutions above do not rely on word boundaries and may even match in between underscores and when glued to words. If you really want to avoid that, then you need to use word boundaries by all means, e.g. (?<!\d[\s-]|\d)\b4\d{3}\b(?![\s-]?\d).
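Here is a sketch that runs the question's five examples through the two-lookbehind variant, which is the form Python's built-in re module accepts. Python and the variable names are my assumptions for illustration.

```python
import re

# Two fixed-width lookbehinds instead of one alternation of mixed width.
PATTERN = re.compile(r"(?<!\d[\s-])(?<!\d)4\d{3}(?![\s-]?\d)")

samples = [
    "4567",
    "I have 4999 roses",
    "1234567 days are gone",
    "My water supply account is 123 4567 89",
    "Howdy, my cell number is 123-4567-89",
]

results = {s: PATTERN.findall(s) for s in samples}
for s, found in results.items():
    print(repr(s), "->", found)

# The pattern from the question still hits "123 4567 89": \b is satisfied
# by the spaces, and the lookarounds only inspect the adjacent character.
asker = re.compile(r"(?<!(\d))\b4\d{3}\b(?!(\d))")
asker_hit = asker.search("My water supply account is 123 4567 89")
print(asker_hit.group(0) if asker_hit else None)  # 4567
```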
How about using Positive Lookahead and Positive Lookbehind along with [^ ]? I think it can get you the desired results.
Pattern:
(?<=^|[^\d]{2})4[0-9]{3}(?=$|[^\d]{2})
Example: https://regex101.com/r/PYPeCk/2/

regex to match one and only one digit

I need to match a single digit, 1 through 9. For example, 3 should match but 34 should not.
I have tried:
\d
\d{1}
[1-9]
[1-9]{1}
[1-9]?
They all match 3 and 34. I am using regex for this because it is part of a much larger expression in which I am using alternation.
The problem with all of your examples, of course, is that they match the digit, but don't keep themselves from matching multiple digits next to each other.
In the following example:
Some text with a 3 and a 34 and what about b5 and 64b?
This regex will match only the lone 3. It uses word boundaries, a handy feature.
\b[1-9]\b
It gets more complicated if you want to match single digits inside words, like the 5 in my example, but you didn't specify if you'd want that, so I'll leave that out for now.
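A quick sketch of the word-boundary version in Python (my choice of language). The second pattern is not from the answer: it is one possible way to also catch a lone digit glued to letters, using digit lookarounds instead of \b, and is shown only as an assumption about what "inside words" might require.

```python
import re

text = "Some text with a 3 and a 34 and what about b5 and 64b?"

# Word boundaries: the digit must not touch any letter, digit or underscore.
lone_by_boundary = re.findall(r"\b[1-9]\b", text)

# Illustrative alternative: only forbid neighbouring DIGITS.
lone_by_lookaround = re.findall(r"(?<!\d)[1-9](?!\d)", text)

print(lone_by_boundary)    # ['3']
print(lone_by_lookaround)  # ['3', '5']
```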

Regex for Discover credit card

I have read through this question, but for Discover card, the starting digits are 6011, 622126-622925, 644-649, 65 instead of just 6011, 65. (Source)
For Discover cards, I picked up this regex from that question ^6(?:011|5[0-9]{2})[0-9]{12}$
I modified it to cover 6011, 644-649 & 65, but for 622126-622925, building the regex is hard because of my poor regex skills.
I have this regex so far 6(?:011|5[0-9]{2}|[4][4-9][0-9]|[2]{2}[1-9])[0-9]{2}$, but it only checks for 622[1-9]**.
How do I modify it so that it accepts only between 622126-622925 for 622*** case?
Here's your regex (demo):
^6(?:011\d{12}|5\d{14}|4[4-9]\d{13}|22(?:1(?:2[6-9]|[3-9]\d)|[2-8]\d{2}|9(?:[01]\d|2[0-5]))\d{10})$
Needless to say, I won't exactly call this pretty or easy to maintain. I would recommend parsing the number as an integer and using your programming language to do the checks.
You should also use the Luhn algorithm to check if the credit card number is valid, and while you could theoretically do this with regex, it would be many times worse than this.
Allow me to show you how I arrived at this monstrosity, step by step. First, here is how you match each of those ranges:
6011 # matches 6011
65 # matches 65
64[4-9] # matches 644-649
622(1(2[6-9]|[3-9]\d)|[2-8]\d{2}|9([01]\d|2[0-5]))
# matches 622126-622925
Now, you want to match the rest of the digits:
6011\d{12} # matches 6011 + 12 digits
65\d{14} # matches 65 + 14 digits
64[4-9]\d{13} # matches 644-649 + 13 digits
622(1(2[6-9]|[3-9]\d)|[2-8]\d{2}|9([01]\d|2[0-5]))\d{10}
# matches 622126-622925 + 10 digits
Now you can combine all four, and add start and end of line anchors:
^( # match start of string and open group
6011\d{12}| # matches 6011 + 12 digits
65\d{14}| # matches 65 + 14 digits
64[4-9]\d{13}| # matches 644-649 + 13 digits
622(1(2[6-9]|[3-9]\d)|[2-8]\d{2}|9([01]\d|2[0-5]))\d{10}
# matches 622126-622925 + 10 digits
)$ # close group and match end of string
The final product above is a slightly compacted version of the previous regex, and I also made groups non-capturing (that's what those ?: are for).
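The commented layout only works when the engine runs in free-spacing mode. As a sketch, here is the compact regex written that way in Python with re.VERBOSE and checked at the edges of the 622126-622925 range. Python, the name DISCOVER_RE and the zero-padded test numbers are my assumptions for illustration.

```python
import re

# Same alternatives as the compact regex, laid out with re.VERBOSE so that
# whitespace and # comments inside the pattern are ignored by the engine.
DISCOVER_RE = re.compile(r"""
    ^6(?:
        011\d{12}                   # 6011 + 12 digits
      | 5\d{14}                     # 65 + 14 digits
      | 4[4-9]\d{13}                # 644-649 + 13 digits
      | 22(?:                       # 622126-622925 + 10 digits
            1(?:2[6-9]|[3-9]\d)     #   126-199
          | [2-8]\d{2}              #   200-899
          | 9(?:[01]\d|2[0-5])      #   900-925
        )\d{10}
    )$
""", re.VERBOSE)

def matches(prefix):
    """Pad a prefix with zeros to 16 digits and test it (illustrative)."""
    return DISCOVER_RE.match(prefix.ljust(16, "0")) is not None

for p in ("622125", "622126", "622925", "622926", "6011", "643", "644", "65"):
    print(p, matches(p))
```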
Here are your options:
Hack your way through it and build a really complicated regex. Regexes are not suited for this sort of integer comparison, so what you come up with will necessarily be long, complicated and unmaintainable. See Regex for number check below a value and similar SO questions on this topic.
Use integer comparison in your code.
For reference, one such complicated regex (for the 622126-622925 prefix only) would be
62212[6-9]|6221[3-9]|622[2-8]|6229[01]|62292[0-5]
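The integer-comparison option could look roughly like the sketch below. Python, both function names and the fixed 16-digit length (taken from the regexes in this thread) are assumptions. The Luhn check is the one recommended in the other answer. The sample number is a commonly published test number, not a real card.

```python
def has_discover_prefix(number: str) -> bool:
    """Hypothetical helper: 16 ASCII digits starting with a Discover range."""
    if len(number) != 16 or not (number.isascii() and number.isdigit()):
        return False
    return (
        number.startswith("6011")
        or number.startswith("65")
        or 644 <= int(number[:3]) <= 649
        or 622126 <= int(number[:6]) <= 622925
    )

def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0

card = "6011111111111117"
print(has_discover_prefix(card) and luhn_ok(card))  # True
```

Each range is a plain comparison, so adding or changing a range does not require rewriting an alternation.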
Even though this question is 3 years old, I ran into the same task and would like to share a regex for 622126-622925 :)
^(622[1-9]\\d(?<!10|11|9[3-9])\\d(?<!12[0-5]|92[6-9])\\d{10})$
It uses zero-width negative lookbehinds to exclude the unwanted numbers. (The doubled backslashes are string-literal escapes; the regex itself uses \d.)
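A sketch of the same pattern in Python, written with single backslashes in a raw string and checked at the range boundaries. Python, the name LOOKBEHIND_RE and the zero-padded test numbers are my assumptions for illustration.

```python
import re

# After the 5th digit, reject 10, 11 and 93-99 as digits 4-5;
# after the 6th digit, reject 120-125 and 926-929 as digits 4-6.
LOOKBEHIND_RE = re.compile(
    r"^(622[1-9]\d(?<!10|11|9[3-9])\d(?<!12[0-5]|92[6-9])\d{10})$"
)

def in_range(prefix):
    """Pad a 6-digit prefix with zeros to 16 digits and test it (illustrative)."""
    return LOOKBEHIND_RE.match(prefix + "0" * 10) is not None

for p in ("622099", "622119", "622125", "622126", "622925", "622926", "622930"):
    print(p, in_range(p))
```

Every alternative inside each lookbehind has the same length, so this form also works in engines that require fixed-width lookbehinds.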