Regex match checksum with or without dashes - regex

To match a dash-less checksum I can do something like:
\b[0-9a-z]{32}\b
However, I'm seeing some checksums that also have dashes, such as:
d3bd55bf-062f-473b-9417-935f62c4c98a
While this is probably a fixed size, 8, then 4, then 4, then 4, then 12, I was wondering if I could do a regex where the number of non-dash digits adds up to 32. I think the answer is no, but hopefully some regex wizard can come up with something.
Here is a starting point for some sample inputs: https://regex101.com/r/K0IMKe/1.

You can use
\b[0-9a-z](?:-?[0-9a-z]){31}\b
See the regex demo.
It matches
\b - a word boundary
[0-9a-z] - a digit or a lowercase ASCII letter
(?:-?[0-9a-z]){31} - thirty-one repetitions of an optional - followed with a single digit or a lowercase ASCII letter
\b - a word boundary.
If you do not mind the match including a trailing - when that hyphen is immediately followed by a word character, you may also use
\b(?:[0-9a-z]-?){32}\b
See this regex demo. Here, (?:[0-9a-z]-?){32} will match thirty-two repetitions of a digit or lowercase ASCII letter followed with an optional hyphen.
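To see how the two patterns differ in practice, here is a small Python sketch. The names STRICT, LOOSE, find_checksums and the sample string are made up for illustration; only the two patterns come from the answer.

```python
import re

# First pattern: one alphanumeric, then 31 more, each optionally preceded by a dash.
STRICT = re.compile(r"\b[0-9a-z](?:-?[0-9a-z]){31}\b")
# Second pattern: 32 alphanumerics, each optionally followed by a dash.
LOOSE = re.compile(r"\b(?:[0-9a-z]-?){32}\b")

dashed = "d3bd55bf-062f-473b-9417-935f62c4c98a"
plain = dashed.replace("-", "")


def find_checksums(pattern, text):
    """Return every substring of text matched by the compiled pattern."""
    return [m.group(0) for m in pattern.finditer(text)]


# Hypothetical input: a dashed checksum, a dash-less one, and a 31-character one.
sample = f"id={dashed} other={plain} short={plain[:31]}"
print(find_checksums(STRICT, sample))

# The second pattern keeps a trailing dash when a word character follows it.
print(LOOSE.search(plain + "-x").group(0))
print(STRICT.search(plain + "-x").group(0))
```

The 31-character value is skipped by the first pattern, and only the second pattern includes the trailing dash in its match.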

If there can be multiple dashes, you can assert 32 to 36 chars using a positive lookahead.
^(?=[a-z0-9-]{32,36}$)[a-z0-9]+(?:-[a-z0-9]+)*$
^ Start of string
(?=[a-z0-9-]{32,36}$) Positive lookahead, assert what is at the right is 32 - 36 repetitions of the listed characters
[a-z0-9]+ Match 1+ times any of the listed
(?: Non capture group
-[a-z0-9]+ Match a - followed by 1+ times any of the listed (the string can not end with a hyphen)
)* Close the group and match 0+ times to also match the string without dashes
$ End of string
Regex demo
If you want to limit the number of dashes to between 0 and 4, you can change the quantifier * to {0,4}+ (a possessive quantifier, for engines that support one):
^(?=[a-z0-9-]{32,36}$)[a-z0-9]+(?:-[a-z0-9]+){0,4}+$
Regex demo
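A quick way to try the lookahead variant is the Python sketch below. It assumes the plain {0,4} quantifier instead of the possessive {0,4}+, since Python's re module only accepts possessive quantifiers from version 3.11 on; CHECKSUM and is_checksum are illustrative names. Note that the lookahead bounds the total length, dashes included, so a 36-character string without any dash is accepted too.

```python
import re

# Non-possessive {0,4} so the sketch also runs on Python versions before 3.11.
CHECKSUM = re.compile(r"^(?=[a-z0-9-]{32,36}$)[a-z0-9]+(?:-[a-z0-9]+){0,4}$")


def is_checksum(text):
    """True when text has 32-36 allowed characters and at most 4 inner dashes."""
    return CHECKSUM.match(text) is not None


dashed = "d3bd55bf-062f-473b-9417-935f62c4c98a"
print(is_checksum(dashed))                   # dashed form
print(is_checksum(dashed.replace("-", "")))  # dash-less form
print(is_checksum(dashed + "-"))             # trailing dash, 37 characters
print(is_checksum("a" * 36))                 # 36 characters, no dash at all
```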

Related

Regex for first eight letters and last number

Please help me compose a working regular expression.
Conditions:
There can be a maximum of 9 characters (from 1 to 9).
The first eight characters can only be uppercase letters.
The last character can only be a digit.
Examples:
Do not match:
S3
FT5
FGTU7
ERTYUOP9
ERTGHYUKM
Match:
E
ERT
RTYUKL
VBNDEFRW3
I tried using the following:
^[A-Z]{1,8}\d{0,1}$
but in this case, the FT5 example matches, although it shouldn't.
You may use an alternation based regex:
^(?:[A-Z]{1,8}|[A-Z]{8}\d)$
RegEx Demo
RegEx Details:
^: Start
(?:: Start non-capture group
[A-Z]{1,8}: Match 1 to 8 uppercase letters
|: OR
[A-Z]{8}\d: Match 8 uppercase letters followed by a digit
): End non-capture group
$: End
You might also rule out 1 to 7 uppercase chars followed by a digit using a negative lookahead:
^(?![A-Z]{1,7}\d)[A-Z]{1,8}\d?$
^ Start of string
(?![A-Z]{1,7}\d) Negative lookahead to assert not 1-7 uppercase chars and a digit
[A-Z]{1,8} Match 1-8 times an uppercase char
\d? Match an optional digit
$ End of string
Regex demo
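The alternation and the negative-lookahead patterns above can be checked against the examples from the question with a short Python sketch. The list names and the accepted helper are made up for the illustration.

```python
import re

ALTERNATION = re.compile(r"^(?:[A-Z]{1,8}|[A-Z]{8}\d)$")
LOOKAHEAD = re.compile(r"^(?![A-Z]{1,7}\d)[A-Z]{1,8}\d?$")

# Examples taken from the question.
should_match = ["E", "ERT", "RTYUKL", "VBNDEFRW3"]
should_fail = ["S3", "FT5", "FGTU7", "ERTYUOP9", "ERTGHYUKM"]


def accepted(pattern, words):
    """Return the words that the compiled pattern matches in full."""
    return [w for w in words if pattern.match(w)]


for pattern in (ALTERNATION, LOOKAHEAD):
    print(accepted(pattern, should_match + should_fail))
```

Both patterns should accept exactly the four strings listed under "Match".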
With a regex engine that supports possessive quantifiers, you can write:
^[A-Z]{1,7}+(?:[A-Z]\d?)?$
demo
The letter in the optional group can only succeed when the quantifier in [A-Z]{1,7}+ reaches the maximum and when a letter remains. The letter in the group can only be the 8th character.
For the .NET regex engine (which doesn't support possessive quantifiers), you can write this pattern using an atomic group:
^(?>[A-Z]{1,7})(?:[A-Z]\d?)?$
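Python's re module only gained possessive quantifiers and atomic groups in version 3.11. On older versions the same effect can be approximated with the common lookahead-plus-backreference trick, as in this sketch. This is a substitute technique rather than the pattern from the answer, and EMULATED and accepted are illustrative names.

```python
import re

# (?=([A-Z]{1,7}))\1 captures the longest run of up to 7 letters inside a
# lookahead and then consumes it through the backreference. The engine does
# not backtrack into a lookahead, so the run behaves like (?>[A-Z]{1,7}).
EMULATED = re.compile(r"^(?=([A-Z]{1,7}))\1(?:[A-Z]\d?)?$")

should_match = ["E", "ERT", "RTYUKL", "VBNDEFRW3"]
should_fail = ["S3", "FT5", "FGTU7", "ERTYUOP9", "ERTGHYUKM"]


def accepted(words):
    """Return the words that the emulated atomic pattern matches."""
    return [w for w in words if EMULATED.match(w)]


print(accepted(should_match + should_fail))
```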

RegEx: How to match a whole string with fixed-length region with negative look ahead conditions that are overriden afterwards?

The strings I parse with a regular expression contain a region of fixed length N where there can either be numbers or dashes. However, if a dash occurs, only dashes are allowed to follow for the rest of the region. After this region, numbers, dashes, and letters are allowed to occur.
Examples (N=5, starting at the beginning):
12345ABC
12345123
1234-1
1234--1
1----1AB
How can I correctly match this? I currently am stuck at something like (?:\d|-(?!\d)){5}[A-Z0-9\-]+ (for N=5), but I cannot make numbers work directly following my region if a dash is present, as the negative look ahead blocks the match.
Update
Strings that should not be matched (N=5)
1-2-3-A
----1AB
--1--1A
You could assert that the first 5 characters are either digits or - and make sure that there is no - before a digit in the first 5 chars.
^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$
^ Start of string
(?![\d-]{0,3}-\d) Make sure that in the first 5 chars there is no - before a digit
(?=[\d-]{5}) Assert at least 5 digits or -
[A-Z\d-]+ Match 1+ times any of the listed characters
$ End of string
Regex demo
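The first pattern generalizes to any region length N by deriving the two quantifiers from N. The Python sketch below shows this under the assumption that N is at least 2; build_pattern is a hypothetical helper, not part of the answer.

```python
import re


def build_pattern(n):
    """Build the lookahead-based pattern for a region of n characters (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # A dash directly followed by a digit is forbidden when the digit still
    # lies inside the region, i.e. when at most n - 2 characters precede the dash.
    return re.compile(rf"^(?![\d-]{{0,{n - 2}}}-\d)(?=[\d-]{{{n}}})[A-Z\d-]+$")


region5 = build_pattern(5)

# Examples taken from the question (N=5).
should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_fail = ["1-2-3-A", "----1AB", "--1--1A"]

print([s for s in should_match + should_fail if region5.match(s)])
```

For n = 5 the helper produces the same pattern as the answer.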
If atomic groups are available:
^(?=[\d-]{5})(?>\d+-*|-{5})[A-Z\d_]*$
^ Start of string
(?=[\d-]{5}) Assert at least 5 chars - or digit
(?> Atomic group
\d+-* Match 1+ digits and optional -
| or
-{5} match 5 times -
) Close atomic group
[A-Z\d_]* Match optional chars A-Z digit or _
$ End of string
Regex demo
Use a non-word-boundary assertion \B:
^[-\d](?:-|\B\d){4}[A-Z\d-]*$
A non-word boundary \B succeeds at a position between two word characters (\w, i.e. [A-Za-z0-9_]) or between two non-word characters (\W, i.e. [^A-Za-z0-9_]), and also between a non-word character and either end of the string.
With it, each \B\d always follows a digit and can never follow a dash.
demo
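The \B variant can be tried on the question's examples with a few lines of Python. This is a minimal sketch; NON_BOUNDARY and the two lists are illustrative names.

```python
import re

# \B\d only succeeds when the previous character is also a word character,
# so inside the 5-character region a digit can never follow a dash.
NON_BOUNDARY = re.compile(r"^[-\d](?:-|\B\d){4}[A-Z\d-]*$")

# Examples taken from the question (N=5).
should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_fail = ["1-2-3-A", "----1AB", "--1--1A"]

for text in should_match + should_fail:
    print(text, bool(NON_BOUNDARY.match(text)))
```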
Other way (if lookbehinds are allowed):
^\d*-*(?<=^.{5})[A-Z\d-]*$
demo

Regex for extracting digits in a string not in a word and not separated by a symbol?

I want to extract an ID from a search query but I don't know the length of the ID.
From this input I want to get the numbers that are not in the words and the numbers that are not separated by symbols.
12 11231390 good123e41 12he12o1 1391389 dajue1290a 12331 12-10 1.2 test12.0why 12+12 12*6 2d1139013 09`29 83919 1
Here I want to return
12 11231390 1391389 12331 83919 1
So far I've tried /\b[^\D]\d*[^\D]\b/gm but I get the numbers in between the symbols and I don't get the 1 at the end.
You could repeatedly match digits between whitespace boundaries. Using a word boundary \b would give you partial matches.
Note that [^\D] is the same as \d and would expect at least a single character.
Your pattern can be written as \b\d\d*\d\b and you can see that you don't get the 1 at the end as your pattern matches at least 2 digits.
(?<!\S)\d+(?:\s+\d+)*(?!\S)
The pattern matches:
(?<!\S) Negative lookbehind, assert a whitespace boundary to the left
\d+(?:\s+\d+)* Match 1+ digits and optionally repeat matching 1+ whitespace chars and 1+ digits.
(?!\S) Negative lookahead, assert a whitespace boundary to the right
Regex demo
If lookarounds are not supported, you could use a match with a capture group
(?:^|\s)(\d+(?:\s+\d+)*)(?:$|\s)
Regex demo
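Both patterns return runs of whitespace-separated numbers, so one run can hold several IDs. A small Python sketch of how the individual IDs could be pulled out, using the input from the question (extract_ids and the constant names are made up for the example):

```python
import re

query = ("12 11231390 good123e41 12he12o1 1391389 dajue1290a 12331 "
         "12-10 1.2 test12.0why 12+12 12*6 2d1139013 09`29 83919 1")

WITH_LOOKAROUNDS = re.compile(r"(?<!\S)\d+(?:\s+\d+)*(?!\S)")
WITH_GROUP = re.compile(r"(?:^|\s)(\d+(?:\s+\d+)*)(?:$|\s)")


def extract_ids(text):
    """Split every matched run of numbers into the individual numbers."""
    ids = []
    for run in WITH_LOOKAROUNDS.findall(text):
        ids.extend(run.split())
    return ids


print(extract_ids(query))
# findall returns the capture group for the second pattern.
print(WITH_GROUP.findall(query))
```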

Find if a text contains between 5 and 10 words written in uppercase

I am writing a regex that detects whether a text has between 5 and 10 uppercase words. At the moment, my regex correctly rejects a text with fewer than 5 uppercase words and matches when it has 5 or more.
The problem comes when the text has more than 10 uppercase words: it still matches.
How can I solve that?
(?:\b[A-Z]+\b.*){5,10}
This pattern (?:\b[A-Z]+\b.*){5,10} matches \b[A-Z]+\b and then .*, which matches any characters except a newline, so it can run over further uppercase words without counting them.
If the whole string should contain between 5 and 10 uppercased words with word boundaries, you might use a tempered greedy token repeated 5 to 10 times and make use of a negative lookahead to assert that no further uppercased word follows on the right:
^(?:(?:(?!\b[A-Z]+\b).)*\b[A-Z]+\b){5,10}(?!.*\b[A-Z]+\b)
Regex demo
Explanation
^ Start of string
(?: Non capturing group
(?: Non capturing group
(?!\b[A-Z]+\b). Negative lookahead, assert what is on the right is not \b[A-Z]+\b, then match any character except a newline using .
)* Close non capturing group and repeat 0+ times
\b[A-Z]+\b Match word boundary, 1+ times an uppercase A-Z and word boundary
){5,10} Close non capturing group and repeat 5 - 10 times
(?!.*\b[A-Z]+\b) Negative lookahead, assert what is on the right \b[A-Z]+\b is not present
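To check the pattern, it can be compared with a plain count of uppercase words. The Python sketch below does that on generated single-line texts; the helper names and the make_text test data are hypothetical.

```python
import re

BETWEEN_5_AND_10 = re.compile(
    r"^(?:(?:(?!\b[A-Z]+\b).)*\b[A-Z]+\b){5,10}(?!.*\b[A-Z]+\b)"
)
UPPER_WORD = re.compile(r"\b[A-Z]+\b")


def has_5_to_10_upper_words(text):
    """True when the single regex accepts the text."""
    return BETWEEN_5_AND_10.search(text) is not None


def count_upper_words(text):
    """Count uppercase words directly, as a cross-check for the regex."""
    return len(UPPER_WORD.findall(text))


def make_text(n):
    """Hypothetical test data: n uppercase words joined by lowercase filler."""
    return " and ".join(["WORD"] * n)


for n in (4, 5, 10, 11):
    text = make_text(n)
    print(n, has_5_to_10_upper_words(text), 5 <= count_upper_words(text) <= 10)
```

If a regex is not a hard requirement, the counting approach is usually easier to read and maintain.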

Regex - minimum and maximum of each symbol type

I need to check if the string contains from 0 to 3 spaces and 16 digits. How can I do this ? All that I come up with is only for checking the sum
^[0-9- ]{16,19}$
You actually should use
^(?=(?:[^ ]* ){0,3}[^ ]*$)(?=(?:[^0-9]*[0-9]){16}[^0-9]*$)[0-9- ]+$
See the regex demo at regex101.com.
Alternatively, the first space checking positive lookahead may be replaced with a negative one with reverse logic:
^(?!(?:[^ ]* ){4})(?=(?:[^0-9]*[0-9]){16}[^0-9]*$)[0-9- ]+$
See another demo
Both the regexps are written with the principle of contrast in mind, so as to fail the regex quicker if the lookahead pattern does not match.
Details:
^ - start of string
(?!(?:[^ ]* ){4}) - a negative lookahead failing the match if there are 4 sequences immediately to the right of the current location, of:
[^ ]* - 0+ chars other than a space
' ' - a literal space
(?=(?:[^0-9]*[0-9]){16}[^0-9]*$) - a positive lookahead requiring that the whole string should contain 16 sequences of 0+ non-digits ([^0-9]*) followed with 1 digit, and then 0+ chars other than a digit up to the end of the string
[0-9- ]+ - matches 1+ chars that are either digits, - or spaces
$ - end of string.
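Both lookahead-based patterns can be compared on a few inputs with the following Python sketch. The test strings and names are invented for the illustration.

```python
import re

POSITIVE = re.compile(
    r"^(?=(?:[^ ]* ){0,3}[^ ]*$)(?=(?:[^0-9]*[0-9]){16}[^0-9]*$)[0-9- ]+$"
)
NEGATIVE = re.compile(
    r"^(?!(?:[^ ]* ){4})(?=(?:[^0-9]*[0-9]){16}[^0-9]*$)[0-9- ]+$"
)

# Hypothetical inputs mapped to the expected outcome.
cases = {
    "1234 5678 9012 3456": True,    # 3 spaces, 16 digits
    "1234567890123456": True,       # no spaces, 16 digits
    "1234-5678-9012-3456": True,    # dashes are allowed by the character class
    "1 2 3 4 567890123456": False,  # 4 spaces
    "1234 5678 9012 345": False,    # 15 digits
    "1234 5678 9012 34567": False,  # 17 digits
}

for text, expected in cases.items():
    print(repr(text), bool(POSITIVE.match(text)), bool(NEGATIVE.match(text)), expected)
```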
You can use this regex based on lookaheads:
^[0-9](?!(?:[0-9]* ){4})(?=(?: *[0-9]){15}$)[0-9- ]+[0-9]$
RegEx Demo
^[0-9] and [0-9]$ ensure that the input starts and ends with a digit.
(?!(?:[0-9]* ){4}) is a negative lookahead that disallows 4 spaces (thus allowing 0 to 3 spaces).
(?=(?: *[0-9]){15}$) is a positive lookahead that requires exactly 15 more digits after the first one, each optionally preceded by spaces, up to the end of the input, for 16 digits in total.
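A short Python sketch for trying this pattern follows; DIGIT_EDGES and the inputs are illustrative. One thing it brings out: the positive lookahead only skips spaces, so although the character class [0-9- ] lists a dash, an input containing dashes is rejected by this pattern.

```python
import re

DIGIT_EDGES = re.compile(
    r"^[0-9](?!(?:[0-9]* ){4})(?=(?: *[0-9]){15}$)[0-9- ]+[0-9]$"
)

# Hypothetical inputs mapped to the expected outcome.
cases = {
    "1234 5678 9012 3456": True,    # 3 spaces, 16 digits
    "1234567890123456": True,       # no spaces, 16 digits
    " 1234567890123456": False,     # must start with a digit
    "1 2 3 4 567890123456": False,  # 4 spaces
    "1234-5678-9012-3456": False,   # lookahead does not skip dashes
}

for text, expected in cases.items():
    print(repr(text), bool(DIGIT_EDGES.match(text)), expected)
```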