Please help me compose a working regular expression.
Conditions:
There can be a maximum of 9 characters (from 1 to 9).
The first eight characters can only be uppercase letters.
The last character can only be a digit.
Examples:
Do not match:
S3
FT5
FGTU7
ERTYUOP9
ERTGHYUKM
Correspond to:
E
ERT
RTYUKL
VBNDEFRW3
I tried using the following:
^[A-Z]{1,8}\d{0,1}$
but in this case, the FT5 example matches, although it shouldn't.
You may use an alternation based regex:
^(?:[A-Z]{1,8}|[A-Z]{8}\d)$
RegEx Demo
RegEx Details:
^: Start
(?:: Start non-capture group
[A-Z]{1,8}: Match 1 to 8 uppercase letters
|: OR
[A-Z]{8}\d: Match 8 uppercase letters followed by a digit
): End non-capture group
$: End
You might also rule out the first 7 uppercase chars followed by a digit using a negative lookhead:
^(?![A-Z]{1,7}\d)[A-Z]{1,8}\d?$
^ Start of string
(?![A-Z]{1,7}\d) Negative lookahead to assert not 1-7 uppercase chars and a digit
[A-Z]{1,8} Match 1-8 times an uppercase char
\d? Match an optional digit
$ End of string
Regex demo
With a regex engine that supports possessive quantifiers, you can write:
^[A-Z]{1,7}+(?:[A-Z]\d?)?$
demo
The letter in the optional group can only succeed when the quantifier in [A-Z]{1,7}+ reaches the maximum and when a letter remains. The letter in the group can only be the 8th character.
For the .net regex engine (that doesn't support possessive quantifiers) you can write this pattern using an atomic group:
^(?>[A-Z]{1,7})(?:[A-Z]\d?)?$
Related
The strings I parse with a regular expression contain a region of fixed length N where there can either be numbers or dashes. However, if a dash occurs, only dashes are allowed to follow for the rest of the region. After this region, numbers, dashes, and letters are allowed to occur.
Examples (N=5, starting at the beginning):
12345ABC
12345123
1234-1
1234--1
1----1AB
How can I correctly match this? I currently am stuck at something like (?:\d|-(?!\d)){5}[A-Z0-9\-]+ (for N=5), but I cannot make numbers work directly following my region if a dash is present, as the negative look ahead blocks the match.
Update
Strings that should not be matched (N=5)
1-2-3-A
----1AB
--1--1A
You could assert that the first 5 characters are either digits or - and make sure that there is no - before a digit in the first 5 chars.
^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$
^ Start of string
(?![\d-]{0,3}-\d) Make sure that in the first 5 chars there is no - before a digit
(?=[\d-]{5}) Assert at least 5 digits or -
[A-Z\d-]+ Match 1+ times any of the listed characters
$ End of string
Regex demo
If atomic groups are available:
^(?=[\d-]{5})(?>\d+-*|-{5})[A-Z\d_]*$
^ Start of string
(?=[\d-]{5}) Assert at least 5 chars - or digit
(?> Atomic group
\d+-* Match 1+ digits and optional -
| or
-{5} match 5 times -
) Close atomic group
[A-Z\d_]* Match optional chars A-Z digit or _
$ End of string
Regex demo
Use a non-word-boundary assertion \B:
^[-\d](?:-|\B\d){4}[A-Z\d-]*$
A non word-boundary succeeds at a position between two word characters (from \w ie [A-Za-z0-9_]) or two non-word characters (from \W ie [^A-Za-z0-9_]). (and also between a non-word character and the limit of the string)
With it, each \B\d always follows a digit. (and can't follow a dash)
demo
Other way (if lookbehinds are allowed):
^\d*-*(?<=^.{5})[A-Z\d-]*$
demo
To match a dash-less checksum I can do something like:
\b[0-9a-z]{32}\b
However, I'm seeing some checksums that also have dashes, such as:
d3bd55bf-062f-473b-9417-935f62c4c98a
While this is probably a fixed size, 8, then 4, then 4, then 4, then 12, I was wondering if I could do a regex where the number of non-dash digits adds up to 32. I think the answer is no, but hopefully some regex wizard can come up with something.
Here is a starting point for some sample inputs: https://regex101.com/r/K0IMKe/1.
You can use
\b[0-9a-z](?:-?[0-9a-z]){31}\b
See the regex demo.
It matches
\b - a word boundary
[0-9a-z] - a digit or a lowercase ASCII letter
(?:-?[0-9a-z]){31} - thirty-one repetitions of an optional - followed with a single digit or a lowercase ASCII letter
\b - a word boundary.
If you do not mind having a trailing - if there is a word char after it, at the end of a match, you may also use
\b(?:[0-9a-z]-?){32}\b
See this regex demo. Here, (?:[0-9a-z]-?){32} will match thirty-two repetitions of a digit or lowercase ASCII letter followed with an optional hyphen.
If there can be multiple dashes, you can assert 32 to 36 chars using a positive lookahead.
^(?=[a-z0-9-]{32,36}$)[a-z0-9]+(?:-[a-z0-9]+)*$
^ Start of string
(?=[a-z0-9-]{32,36}$) Positive lookahead, assert what is at the right is 32 - 36 repetitions of the listed characters
[a-z0-9]+ Match 1+ times any of the listed
(?: Non capture group
-[a-z0-9]+ Match a - followed by 1+ times any of the listed (the string can not end with a hyphen)
)* Close the group and match 0+ times to also match the string without dashes
$ End of string
Regex demo
If you want to limit the amount of dashes to 0 -4 times, you can change the quantifier * to {0,4}+
^(?=[a-z0-9-]{32,36}$)[a-z0-9]+(?:-[a-z0-9]+){0,4}+$
Regex demo
I am doing a regex that detects me when a text has between 5 and 10 uppercase words. At the moment, my regex detects when the text has less than 5 words in capital letters, and when it has +5 matches.
The problem comes when you have more than 10, still giving match:
How can I solve that?
(?:\b[A-Z]+\b.*){5,10}
This pattern (?:\b[A-Z]+\b.*){5,10} matches \b[A-Z]+\b and then .* which will match all except a newline so not taking uppercase words into account.
If the whole string should contain between 5 and 10 uppercased words with word boundaries, you might use a temporary greedy token repeated 5 - 10 times and make use of a negative lookahead to assert what is on the right is not an uppercased word:
^(?:(?:(?!\b[A-Z]+\b).)*\b[A-Z]+\b){5,10}(?!.*\b[A-Z]+\b)
Regex demo
Explanation
^ Start of string
(?: Non capturing group
(?: Non capturing group
(?!\b[A-Z]+\b). Negative lookahead, assert what is on the right is not \b[A-Z]+\b, then match any character except a newline using .
)* Close non capturing group and repeat 0+ times
\b[A-Z]+\b Match word boundary, 1+ times an uppercase A-Z and word boundary
){5,10} Close non capturing group and repeat 5 - 10 times
(?!.*\b[A-Z]+\b) Negative lookahead, assert what is on the right \b[A-Z]+\b is not present
I am trying to recognize these types of phone number inputs:
0172665476
+6265476393
+62-65476393
+62-654-76393
+62 65476393
While my regex: (?:\d+\s*)+ can recognize the 1st 2 sample values, it recognizes the last 3 sample values as multiple matches in each line, instead of recognizing the number as a whole.
How can I modify this to support multiple dashes and/or spaces and still recognize it as 1 whole number instead of multiple matches?
You may use this regex:
^\+?\d+(?:[\s-]\d+)*\b
RegEx Details:
^\+?: Match optional + at start
\d+: match 1+ digits
(?:[\s-]\d+)*: Match 0 or more groups that start with whitespace or - followed by 1+ digits
$: End (Replaced by word boundary as if there are trailing spaces, that match would be missed.)
This should work:
(?:[\d +-]+)+
This would work as per your reqt: (If there are trailing spaces, this regex will ignore.)
Regex: '^(?:[\d +-]+)\b'
Another option could be to use an alternation to match either 10 digits without a leading plus sign or match the pattern with a +, and optional space or hyphen:
(?:\d{10}|\+\d{2}[- ]?\d{3}-?\d{5})\b
That will match:
(?: Non capturing group
\d{10} Match 10 digits
| Or
\+\d{2}[-\s]?\d{3}-?\d{5} Match +, 2 digits, optional whitespace char or -, 3 digits, optional -, 5 digits
)\b Close non capturing group and word boundary
Regex demo
If your language supports negative lookbehinds you could prepend (?<!\S) which checks that what comes before is not a non-whitespace character.
I'm trying a regex fro Alpha Numeric of length 7 (with positions 1,3,4 as characters and positions 2,5,6,7 as digits).
[a-zA-Z]|[0-9]|[a-zA-Z]|[a-zA-Z]|[0-9]|[0-9]|[0-9]
Can someone help me?
The sequence "character, digit, character, character, digit, digit, digit" is expressed in regex as
[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}
If you're working in PCRE (with say, PHP):
^([a-zA-Z])([0-9])(?1){2}(?2){3}$
Breakdown:
^ - from the start of the string
([a-zA-Z]) - match and capture a single character in the ranges given: a-z, A-Z
([0-9]) - match and capture a single character in the ranges given: 0-9
(?1){2} - redo the regex in the first group twice (recursive subpattern)
(?2){3} - redo the regex in the second group 3 times (recursive subpattern)
$ - the end of the string
If you want to match this in the middle of a sentence, exchange ^ and $ for \b - which will match a word boundary
See the demo
If you're not using PCRE:
^[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}$
Which does the same thing, but has some copy-paste involved