Regex to match number(s) or UUID - regex

I need a regex which loosely matches UUIDs and numbers. I expect my filename to be formatted like:
results_SOMETHING.csv
This SOMETHING should ideally be a number (a count of how many times a script has run) or a UUID.
This regex encompasses a huge set of filenames:
^results_?.*.csv$
and this one:
^results_?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}.csv$
matches only UUIDs. I want a regex whose range is somewhere in between. Mostly I don't want matches like result__123.csv.

Note: This doesn't directly answer the OP's question, but given the title, it will appear in searches.
Here's a proper regex to match a uuid based on this format without the hex character constraint:
(\w{8}(-\w{4}){3}-\w{12}?)
If you want it to match only hex characters, use:
/([a-f\d]{8}(-[a-f\d]{4}){3}-[a-f\d]{12}?)/i
(Note the / delimiters used in JavaScript and the /i flag to denote case-insensitivity; depending on your language, you may need to write this differently, but you definitely want to handle both lower and upper case letters).
If you're prepending results_ and appending .csv to it, that would look like:
^results_([a-z\d]{8}(-[a-z\d]{4}){3}-[a-z\d]{12}?)\.csv$
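In languages without /.../i literals the flag is passed separately. A minimal Python sketch of the hex-only variant, assuming the whole filename must be results_<uuid>.csv; the names UUID_FILE and is_uuid_result are made up for illustration:

```python
import re

# Hex-only UUID shape between "results_" and ".csv".
# re.IGNORECASE plays the role of the /i flag (it also applies to "results" and "csv").
UUID_FILE = re.compile(
    r"results_[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\.csv",
    re.IGNORECASE,
)

def is_uuid_result(filename: str) -> bool:
    """Return True if the entire filename is results_<uuid>.csv."""
    return UUID_FILE.fullmatch(filename) is not None

print(is_uuid_result("results_0A0B0C0D-884F-0099-AA95-1234567890AB.csv"))  # True
print(is_uuid_result("results_123.csv"))                                   # False
```

fullmatch is used here instead of ^...$ so that nothing before or after the name can slip through.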

-----EDITED / UPDATED-----
Based on the comments you left, there are some other patterns you want to match (this was not clear to me from the question). This makes it a little more challenging - to summarize my current understanding:
results.csv - match (NEW)
results_1A.csv - match (NEW)
results_ABC.csv - ? no match (I assume)
result__123.csv - no match
results_123.csv - match
Results_123.cvs - ? no match
results_0a0b0c0d-884f-0099-aa95-1234567890ab.csv - match
You will find the following modification works according to the above "specification":
results(?:_(?:[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}|(?=.*[0-9])[A-Z0-9]+))?\.csv
Breaking it down:
results matches characters "results" literally
(?:_ ….)? non-capturing group, repeated zero or one time:
"this is either there, or there is nothing"
[0-9a-f]{8}- exactly 8 characters from the group [0-9a-f]
followed by hyphen "-"
(?:[0-9a-f]{4}-){3} ditto but group of 4, and repeated three times
[0-9a-f]{12} ditto, but group of 12
| OR...
(?=.*[0-9]) lookahead: at least one digit somewhere after this point
[A-Z0-9]+ at least one capital letter or number
\.csv the literal string ".csv" (the '.' has to be escaped)
demonstration on regex101.com
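The "specification" list above can be checked mechanically. This is only a sketch in Python: the pattern is copied from the answer, fullmatch stands in for the ^ and $ anchors the pattern itself lacks, and the EXPECTED table and matches helper are illustrative names (the two "?" entries are taken as "no match", as the answer assumes):

```python
import re

PATTERN = re.compile(
    r"results(?:_(?:[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}"
    r"|(?=.*[0-9])[A-Z0-9]+))?\.csv"
)

# Filename -> whether the answer's specification says it should match.
EXPECTED = {
    "results.csv": True,
    "results_1A.csv": True,
    "results_ABC.csv": False,
    "result__123.csv": False,
    "results_123.csv": True,
    "Results_123.cvs": False,
    "results_0a0b0c0d-884f-0099-aa95-1234567890ab.csv": True,
}

def matches(filename: str) -> bool:
    """True if the whole filename matches the pattern."""
    return PATTERN.fullmatch(filename) is not None

for name, expected in EXPECTED.items():
    print(f"{name}: {matches(name)} (expected {expected})")
```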

Related

Regular Expression Stopping at Specified Value

I have to use a regular expression to parse values out of a SWIFT message, and there are some situations where the behaviour is not what I want.
Let's say I am after something with a particular pattern, in this case a BIC (6 letters, followed by 2 letters or digits, followed by an optional XXX or 3 digits):
([A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
This is fine, but now I want to look for these bank codes in particular fields. In SWIFT a field is denoted with : and has some numbers and sometimes a letter.
So I want to match a BIC value in field 52A.
I can do the following:
(52A:[A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
which would match 52A:AAAAAAAAXXX
My problem is that you can have things before and after this value, and the value itself might not exist in the field you want.
So I can wildcard the regex to allow for things before it, for example:
(52A:.*?[A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
matches 52A:somerubbishAAAAAAAAXXX
But if the pattern isn't present within this field, the regex continues to search for it in later fields, and this is where I have a problem.
For example, the above regex matches this: 52A:somerubbish:57D:AAAAAAAAXXX
Question
I need the regex to stop at the first field that follows (it might not always be 57D, but it will always follow the format [0-9]{2}[A-Z]{0,1}).
So the above example shouldn't return a match, as the pattern I am after is not contained in the 52A section.
Does anyone know how I can do this?
Change .*? to [^:]*?:
(52A:[^:]*?[A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
[^:] means "any character except :", which ensures the match doesn't run into the next field.
See live demo.
Also, unless your situation requires you to match your target as group 1, you don't need the outer brackets: the entire match (ie group 0) will be your target.
I suspect instead of [XXX0-9]{0,3} you want (XXX|\d{3})? (XXX or 3 digits, but optionally) or perhaps (XXX|\d{1,3})? (XXX or up to 3 digits, but optionally)
[XXX0-9]{0,3} (which is the same as [X0-9]{0,3}) is a character class: it matches an X or a digit, repeated 0 to 3 times, so mixed suffixes such as X1 are accepted too.
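To see the difference between the character class and the alternation, here is a small Python sketch. The test string AAAAAAAAX1 is an invented example, and CLASS_SUFFIX / ALT_SUFFIX are illustrative names:

```python
import re

# Character-class suffix from the question: 0-3 characters, each an X or a digit.
CLASS_SUFFIX = re.compile(r"[A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3}")
# Alternation suffix: the literal XXX, or exactly three digits, or nothing.
ALT_SUFFIX = re.compile(r"[A-Z]{6}[A-Z0-9]{2}(?:XXX|\d{3})?")

candidate = "AAAAAAAAX1"  # 8 valid characters followed by a mixed 2-character suffix
print(bool(CLASS_SUFFIX.fullmatch(candidate)))  # True
print(bool(ALT_SUFFIX.fullmatch(candidate)))    # False
```

Both versions accept AAAAAAAAXXX and AAAAAAAA123; they only disagree on suffixes that are neither XXX nor three digits.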
If the value itself can also contain a colon, you can match any character as "rubbish" as long as what is directly to the right is not the field format.
52A:(?:(?![0-9]{2}[A-Z]?:).)*[A-Z]{6}[A-Z0-9]{2}(?:[0-9]{3}|XXX)?
The pattern matches:
52A: Match literally
(?:(?![0-9]{2}[A-Z]?:).)* Match any character asserting not 2 digits, optional char A-Z and : directly to the right
[A-Z]{6}[A-Z0-9]{2} Match 6 chars A-Z and 2 chars A-Z or 0-9
(?:[0-9]{3}|XXX)? Optionally match 3 digits or XXX
See a regex demo.
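The two approaches ([^:] versus the lookahead that stops at the next field tag) only differ when the "rubbish" itself contains a colon. A rough Python sketch, assuming the BIC part is written with the (?:[0-9]{3}|XXX)? suffix in both cases; NO_COLON, TEMPERED and the third sample string are illustrative:

```python
import re

BIC = r"[A-Z]{6}[A-Z0-9]{2}(?:[0-9]{3}|XXX)?"

# Variant 1: nothing between "52A:" and the BIC may be a colon.
NO_COLON = re.compile(r"52A:[^:]*?" + BIC)
# Variant 2: any character is allowed, as long as a field tag such as "57D:"
# does not start at that position.
TEMPERED = re.compile(r"52A:(?:(?![0-9]{2}[A-Z]?:).)*" + BIC)

samples = [
    "52A:somerubbishAAAAAAAAXXX",       # BIC inside field 52A
    "52A:somerubbish:57D:AAAAAAAAXXX",  # BIC belongs to field 57D
    "52A:some:rubbishAAAAAAAAXXX",      # colon inside the 52A value
]

for text in samples:
    print(text, bool(NO_COLON.search(text)), bool(TEMPERED.search(text)))
# 52A:somerubbishAAAAAAAAXXX True True
# 52A:somerubbish:57D:AAAAAAAAXXX False False
# 52A:some:rubbishAAAAAAAAXXX False True
```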

PCRE regex capture group can be numbers or letters but can't be just numbers

I have seen a host of questions very similar to this but they're not able to quite collect what I'm looking for.
I have a search system for finding PDO named placeholders in an SQL string.
PDO placeholders can contain a-z, A-Z, 0-9, or _ and always begin with :. However, in some circumstances date and time values also appear, which naturally use : (12:35).
I need to check to find placeholders that match the PDO criteria but which are not just numeric.
Can I do this in a single Regex?
The regex I have developed at the moment is:
/:(?:[A-Z_]*)(?=[0-9]*)/gmi
But this cuts off when any digit is found, see the below example SQL:
SELECT name, horse, id, DATE_FORMAT(Nee.datetimed, '12:12:12 # %D') as del_time
FROM members WHERE biztype LIKE CONCAT('%',:bizb,'%')
AND (locate LIKE '%hos%' OR locate LIKE '%all%')
AND bizcat LIKE CONCAT('%',:catb7,'%') ORDER BY `status` DESC, RAND()
I need to catch :catb7 and :bizb but ignore the time values.
My regex above catches :bizb and :catb, but the second catch is incorrect as it chops off the 7.
/:(?:[A-Z_]*(?:[0-9]*))/gmi
Catches :12 and :12 which is incorrect.
/:(?:[A-Z_]*)(?=[0-9]*)/gmi
Catches : as well which is incorrect.
Various tweaks and changes to the capture groups haven't produced the correct result. I'm looking for:
:<any letter or number or underscore, any length, must contain at least one letter or underscore>
Valid catches:
:adbcd
:5fedg
:56_gt
:der
:9_6
INVALID catches:
:12
:1
:%D [MySQL date formatting]
You can use
\B:(?!\d+\b)\w+
See the regex demo.
Details:
\B - a non-word boundary position (start of string or a non-word char must appear immediately to the left of the current location)
: - a colon
(?!\d+\b) - a negative lookahead that fails the match if there are one or more digits followed with a word boundary immediately to the right of the current location
\w+ - one or more word chars (letters/digit/underscores)
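The question is about PCRE, but the same pattern can be tried in Python as a quick check. This is a sketch only: the SQL is a shortened version of the example from the question, PLACEHOLDER is an illustrative name, and re.ASCII is an added assumption so that \w and \d stay limited to ASCII letters, digits and underscore:

```python
import re

# \B: before the colon means "not directly after a word character",
# which rules out the colons inside 12:12:12.
PLACEHOLDER = re.compile(r"\B:(?!\d+\b)\w+", re.ASCII)

sql = (
    "SELECT name, DATE_FORMAT(Nee.datetimed, '12:12:12 # %D') AS del_time "
    "FROM members WHERE biztype LIKE CONCAT('%',:bizb,'%') "
    "AND bizcat LIKE CONCAT('%',:catb7,'%')"
)

print(PLACEHOLDER.findall(sql))  # [':bizb', ':catb7']
```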

Regex Match string having exact number of a char

I have some strings:
1:2:3:4:5
2:3:4:5
5:3:2
6:7:8:9:0
How can I find strings that have an exact number of colons?
For example, I need to find the strings that contain 4 colons.
Result:
1:2:3:4:5 and 6:7:8:9:0
Edit:
It doesn't matter what text is between the colons; it may look like this:
qwe:::qwe:
:998:qwe:3ee3:00
I have to specify the number of colons, using regexp_matches.
It is something like a filter to search for broken strings.
Thanks.
With N being the number you search for:
"^([^:]*:){N}[^:]*$"
Here is a test:
for s in ":::foo:bar" "foo:bar:::" "fo:o:ba::r" ; do echo "$s" | egrep "^([^:]*:){4}[^:]*$" ; done
Change 4 to 3 or 5 to see it not match.
Maybe postgresql needs specific flags or masking for some elements.
"^([^:]*:){N}[^:]*$"
"^ $"
# matching the whole String/Line, not just a part
"([^:]*:){N}[^:]*"
"( ){N}[^:]*"
# N repetitions of something, followed by an arbitrary number of non-colons (maybe zero)
"([^:]*:)"
# non-colons in arbitrary number (including zero), followed by a colon
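The same pattern with N as a parameter, as a Python sketch rather than PostgreSQL (the question asks about regexp_matches, so this only illustrates the regex itself). The function name has_exactly_n_colons is made up, and fullmatch replaces the ^ and $ anchors:

```python
import re

def has_exactly_n_colons(text: str, n: int) -> bool:
    """True if text contains exactly n colons, whatever lies between them."""
    # n repetitions of "non-colons then a colon", followed by trailing non-colons.
    pattern = rf"(?:[^:]*:){{{n}}}[^:]*"
    return re.fullmatch(pattern, text) is not None

rows = ["1:2:3:4:5", "2:3:4:5", "5:3:2", "6:7:8:9:0", "qwe:::qwe:", ":998:qwe:3ee3:00"]
print([row for row in rows if has_exactly_n_colons(row, 4)])
# ['1:2:3:4:5', '6:7:8:9:0', 'qwe:::qwe:', ':998:qwe:3ee3:00']
```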
You want to use the quantifier syntax {4}. The quantifier indicates that the preceding group must occur exactly n times in order to meet the matching criteria. To find the pattern of five digits separated by colons, something like the following would work.
((\d\:){4}\d)
I am assuming you may want any digit or word character but not whitespace or punctuation. In that case use the word character (\w).
(((\w)[\:]){4}(\w))
But depending on what you would like to do with that pattern you may need a different regular expression. If you wanted to capture and replace all the colons while leaving the digits intact your pattern would need to use string replacement or more advanced grouping.
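Any number of any characters, including at least 4 colons (the pattern is unanchored and . also matches :, so strings with more than 4 colons match as well):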
((.*:){4}.*)
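A quick way to see how the dot-based pattern differs from one built on [^:] is to feed both a string with five colons. A Python sketch with illustrative names (AT_LEAST_FOUR, EXACTLY_FOUR):

```python
import re

AT_LEAST_FOUR = re.compile(r"(.*:){4}.*")         # "." can also consume colons
EXACTLY_FOUR = re.compile(r"([^:]*:){4}[^:]*")    # no colon can hide between groups

text = "1:2:3:4:5:6"  # five colons
print(bool(AT_LEAST_FOUR.fullmatch(text)))  # True
print(bool(EXACTLY_FOUR.fullmatch(text)))   # False
```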

Regex with more than one OR/AND operator

I'm trying to match text that is:
a combination of numbers and letters, and might contain [:,.]
OR
a * character plus at least one number OR letter (not necessarily in this order)
Meaning my regex should match all these
Bf1305020008401 6798ubbii230693
Nettbank til: Troij iudh Betalt: 03.05.13
7509*30.04
*87589
but not these:
0205
252,25
Yes, regex alternation with | does not have the meaning in a character group (e.g. [a-z|0-9]) that it does elsewhere in a pattern. (Think of it as implied between characters & character ranges within a character group, making it redundant.)
Pattern
This pattern should do what you need:
^((?=^.{0,}[0-9])(?=^.{0,}[a-zA-Z])[0-9a-zA-Z :,.]{2,}|(?!^\*$)(?=^[0-9.a-zA-Z]{0,}\*[0-9.a-zA-Z]{0,})(?!^[0-9.a-zA-Z]{0,}\*[0-9.a-zA-Z]{0,}\*)[*0-9.a-zA-Z]{2,})$
It matches...
Bf1305020008401 6798ubbii230693
Nettbank til: Troij iudh Betalt: 03.05.13
7509*30.04
*87589
...and does not match...
0205
252,25
...as you require.
You can try the pattern with the inputs you specified in a regex fiddle.
Explanation
Some explanation for the 1st subpattern (on the left side of the |) matching your 1st set of match criteria:
(?=^.{0,}[0-9]) - Assert that a number appears in the string.
(?=^.{0,}[a-zA-Z]) - Assert that a letter also (i.e. AND) appears in the string.
[0-9a-zA-Z :,.]{2,} - "a combination of numbers and letters, and might contain [ :,.]" (assuming the aforementioned assertions)
Similarly, some explanation for the 2nd subpattern (on the right side of the |) matching your 2nd set of match criteria:
(?!^\*$) - Assert that the string is not just *.
(?=^[0-9.a-zA-Z]{0,}\*[0-9.a-zA-Z]{0,}) - Assert that the string contains *.
(?!^[0-9.a-zA-Z]{0,}\*[0-9.a-zA-Z]{0,}\*) - Assert that the string does not contain more than one *.
[*0-9.a-zA-Z]{2,} - "a * character + atleast one number OR letter (not necessarily in this order)" (assuming the aforementioned assertions)
There is probably room to sand & polish the pattern - especially the lookahead assertions for * in the second subpattern I suspect; but it works and conveys the strategy I employed of multiple lookahead assertions to constrain each of the two subpatterns to fit your requirements.
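For anyone who wants to try the pattern outside a fiddle, here is a Python sketch that runs it over the inputs from the question. The pattern is copied unchanged and only split across lines; SHOULD_MATCH and SHOULD_NOT_MATCH are illustrative names:

```python
import re

PATTERN = re.compile(
    r"^((?=^.{0,}[0-9])(?=^.{0,}[a-zA-Z])[0-9a-zA-Z :,.]{2,}"
    r"|(?!^\*$)(?=^[0-9.a-zA-Z]{0,}\*[0-9.a-zA-Z]{0,})"
    r"(?!^[0-9.a-zA-Z]{0,}\*[0-9.a-zA-Z]{0,}\*)[*0-9.a-zA-Z]{2,})$"
)

SHOULD_MATCH = [
    "Bf1305020008401 6798ubbii230693",
    "Nettbank til: Troij iudh Betalt: 03.05.13",
    "7509*30.04",
    "*87589",
]
SHOULD_NOT_MATCH = ["0205", "252,25"]

for line in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(repr(line), bool(PATTERN.search(line)))
```

The two lookaheads at the start of the first alternative are what implement the AND: each one scans the line independently from the same starting position.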
As you comment below, I think you do want a full line match, and by "number and letter" I take it to mean that both digits and letters must occur in the match.
And from "a * character + atleast one number OR letter" I suppose "*" occurs only once in the match.
Maybe you could try this one:
(^(?=.*[a-zA-Z]+)(?=.*[0-9]+)[0-9a-zA-Z :,.]+$)|(^[a-zA-Z0-9.]*\*[a-zA-Z0-9.]+$)|(^[a-zA-Z0-9.]+\*[a-zA-Z0-9.]*$)
It matches:
Bf1305020008401 6798ubbii230693
Nettbank til: Troij iudh Betalt: 03.05.13
7509*30.04
*87589
123456*
.*.
test123
123test
But won't match any of:
0205
252,25
*
123*345*789
rebound
test
123
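Because the pattern is three parenthesised alternatives joined by |, you can also ask which one fired. A Python sketch, where matched_alternative is a made-up helper and the group numbers 1 to 3 simply follow the order of the alternatives:

```python
import re

PATTERN = re.compile(
    r"(^(?=.*[a-zA-Z]+)(?=.*[0-9]+)[0-9a-zA-Z :,.]+$)"
    r"|(^[a-zA-Z0-9.]*\*[a-zA-Z0-9.]+$)"
    r"|(^[a-zA-Z0-9.]+\*[a-zA-Z0-9.]*$)"
)

def matched_alternative(text: str):
    """Return 1, 2 or 3 for the alternative that matched, or None for no match."""
    m = PATTERN.search(text)
    return m.lastindex if m else None

for line in ["test123", "7509*30.04", "123456*", "123*345*789", "0205"]:
    print(repr(line), matched_alternative(line))
# 'test123' 1
# '7509*30.04' 2
# '123456*' 3
# '123*345*789' None
# '0205' None
```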
Original:
This should work
(^[A-Za-z0-9 ]*(([A-Za-z]+[ ]*[0-9]+)|([0-9]+[ ]*[A-Za-z]+))[A-Za-z0-9 ]*$)|(^\*[A-Za-z0-9]+$)

Regular Expression to match set of arbitrary codes

I am looking for some help on creating a regular expression that would work with a unique input in our system. We already have some logic in our keypress event that will only allow digits, and will allow the letter A and the letter M. Now I need to come up with a RegEx that can match the input during the onblur event to ensure the format is correct.
I have some examples below of what would be valid. The letter A represents an age, so it is always followed by up to 3 digits. The letter M can only occur at the end of the string.
Valid Input
1-M
10-M
100-M
5-7
5-20
5-100
10-20
10-100
A5-7
A10-7
A100-7
A10-20
A5-A7
A10-A20
A10-A100
A100-A102
Invalid Input
a-a
a45
4
This matches all of the samples.
/A?\d{1,3}-A?\d{0,3}M?/
Not sure whether 10-A10M should be legal, or even whether M can appear together with numbers. If M can only appear without numbers:
/A?\d{1,3}-(A?\d{1,3}|M)/
Use the brute force method if you have a small amount of well defined patterns so you don't get bad corner-case matches:
^(\d+-M|\d+-\d+|A\d+-\d+|A\d+-A\d+)$
Here are the individual regexes broken out:
\d+-M <- matches anything like '1-M'
\d+-\d+ <- 5-7
A\d+-\d+ <- A5-7
A\d+-A\d+ <- A10-A20
/^[A]?[0-9]{1,3}-[A]?[0-9]{1,3}[M]?$/
Matches anything of the form:
A(optional)[1-3 numbers]-A(optional)[1-3 numbers]M(optional)
^A?\d+-(?:A?\d+|M)$
An optional A followed by one or more digits, a dash, and either another optional A and some digits or an M. The '(?: ... )' notation is a Perl 'non-capturing' set of parentheses around the alternatives; it means there will be no '$1' after the regex matches. Clearly, if you wanted to capture the various bits and pieces, you could - and would - do so, and the non-capturing clause might not be relevant any more.
(You could replace the '+' with '{1,3}' as JasonV did to limit the numbers to 3 digits.)
^A?\d{1,3}-(M|A?\d{1,3})$
^ -- the match must be done from the beginning
A? -- "A" is optional
\d{1,3} -- between one and 3 digits; [0-9]{1,3} also works
- -- A "-" character
(...|...) -- Either one of the two expressions
(M|...) -- Either "M" or...
(...|A?\d{1,3}) -- an optional "A" followed by at least one and at most three digits
$ -- the match should be done to the end
Some consequences of changing the format. If you do not put "^" at the beginning, the match may ignore an invalid beginning. For example, "MAAMA0-M" would be matched at "A0-M".
If, likewise, you leave $ out, the match may ignore an invalid trail. For example, "A0-MMMMAAMAM" would match "A0-M".
Using \d is usually preferred, as is \w for alphanumerics, \s for spaces, \D for non-digit, \W for non-alphanumeric or \S for non-space. But you must be careful that \d is not being treated as an escape sequence. You might need to write it \\d instead.
{x,y} means the last match must occur between x and y times.
? means the last match must occur once or not at all.
When using (), it is treated as one match. (ABC)? will match ABC or nothing at all.
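The effect of the anchors is easy to reproduce. The question is about a browser onblur check, so treat this Python version as a sketch of the regex behaviour only; ANCHORED, VALID and INVALID are illustrative names, and the raw strings (r"...") are what keep \d from being read as a string escape:

```python
import re

ANCHORED = re.compile(r"^A?\d{1,3}-(M|A?\d{1,3})$")

VALID = ["1-M", "10-M", "100-M", "5-7", "5-20", "5-100", "10-20", "10-100",
         "A5-7", "A10-7", "A100-7", "A10-20", "A5-A7", "A10-A20", "A10-A100",
         "A100-A102"]
INVALID = ["a-a", "a45", "4"]

print(all(ANCHORED.search(s) for s in VALID))    # True
print(any(ANCHORED.search(s) for s in INVALID))  # False

# Dropping an anchor lets a valid fragment match inside invalid input.
no_start = re.search(r"A?\d{1,3}-(M|A?\d{1,3})$", "MAAMA0-M")
no_end = re.search(r"^A?\d{1,3}-(M|A?\d{1,3})", "A0-MMMMAAMAM")
print(no_start.group(0), no_end.group(0))  # A0-M A0-M
```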
I’d use this regular expression:
^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$
This matches either:
<number>-M or <number>-<number>
A<number>-<number> or A<number>-A<number>
Additionally <number> must not begin with a 0.
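A short Python sketch of how this stricter pattern behaves. The test inputs other than 10-M and A10-A20 are invented for illustration, and NO_LEADING_ZERO is an illustrative name:

```python
import re

NO_LEADING_ZERO = re.compile(
    r"^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$"
)

for code in ["10-M", "A10-A20", "05-7", "A0-M", "5-A7"]:
    print(code, bool(NO_LEADING_ZERO.search(code)))
# 10-M True
# A10-A20 True
# 05-7 False
# A0-M False
# 5-A7 False
```

Note that 5-A7 is rejected here, whereas ^A?\d{1,3}-(M|A?\d{1,3})$ accepts it; the question does not say which behaviour is wanted.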