This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I need a regular expression to validate strings with the prefix 'CON' followed by an optional space followed by 8 digits.
I've tried various expressions, I got tangled up and now I'm lost.
^(CON+s\?d{8})$
\bCON\b\S?D{8}
Syntax is off a bit
^(CON\s?\d{8})
( starts a capturing group
CON is exactly matched
\s matches any white space character and the ? makes it optional
\d{8} matches 8 digits
) ends the capturing group
You were pretty close to start with. Hope this helps :)
Keeping in mind that if there is no space, then there shouldn't be 8 more digits:
^CON(\ \d{8})?
If the string you are looking for can be part of a larger string (note that in this case it may be preceded or followed by anything, even other digits):
CON\s?\d{8}
If the string must match in full, use ^$ to designate that:
^CON\s?\d{8}$
You can add variations to it: if, say, you want it to begin/end with a word boundary, use \b to indicate that. If you want it to end in a non-digit, use \D+ at the end, instead of $.
Finally, if you want the string to end with an EOL or a non-digit, you may use an expression like this:
CON\s?\d{8}(\D+|$) or the same with a non-capturing group: CON\s?\d{8}(?:\D+|$)
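As a rough way to check these variants, here is a minimal Python sketch. Python's re module is an assumption (the question does not name a flavor), and the helper name is_valid_con and the sample strings are made up for illustration.

```python
import re

# 'CON', one optional whitespace character, then exactly 8 digits.
CON_FULL = re.compile(r"CON\s?\d{8}")

# Unanchored variant: after the 8 digits, require non-digits or end of string.
CON_PART = re.compile(r"CON\s?\d{8}(?:\D+|$)")


def is_valid_con(text: str) -> bool:
    """Return True only if the whole string has the CON + 8 digits shape."""
    # fullmatch() plays the role of ^...$ here.
    return CON_FULL.fullmatch(text) is not None


for sample in ["CON12345678", "CON 12345678", "CON  12345678", "CON1234567"]:
    print(repr(sample), is_valid_con(sample))

# search() finds the code inside a longer string.
print(CON_PART.search("ref: CON 12345678.") is not None)
```

In Python, $ also matches just before a trailing newline, which is why the sketch uses fullmatch() rather than ^...$.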
I am using the .NET flavor of regex.
Suppose I have the string 123456789AB
and I want to match AB (could be any two capital letters) only if the numeric part of the string (123456789) has 5 and 8 in it.
So what I came up with was
(?=5)(?=8)([A-Z]{2})
But this is not working.
After some trial and error on RegexStorm, I got to
(?=(.*5))(?=(.*8))[A-Z]{2}
What I am expecting is that it will start matching from the start of the string, as a lookahead does not consume any characters.
But the part "[A-Z]{2}" does not move ahead to match AB in the input string.
My question is: why is that so?
I know replacing it with .*[A-Z]{2} will make it move ahead, but then the match contains the entire string.
What is the solution in this case, other than putting the letter part ([A-Z]{2}) in a separate group and then capturing only that group?
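Lookaheads check for a pattern match immediately to the right of the current position in the string. (?=(.*5))(?=(.*8)) matches a location that is immediately followed by any 0 or more chars other than line break chars, as many as possible, and then 5; then, at the same position, a similar check is performed that requires 8 after any zero or more chars. Since lookaheads consume nothing, [A-Z]{2} is then tried at that very same location. In 123456789AB the lookaheads succeed only at locations that still have both 5 and 8 to their right, and none of those locations is immediately followed by two uppercase letters, so the match fails.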
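To see this concretely, here is a small sketch in Python rather than .NET (an assumption made for illustration; lookaheads behave the same way in both flavors for this pattern). It lists the positions where the two lookaheads succeed and the positions where [A-Z]{2} succeeds. The variable names are made up.

```python
import re

text = "123456789AB"

# The pattern from the question (capturing groups dropped for brevity).
pattern = re.compile(r"(?=.*5)(?=.*8)[A-Z]{2}")
no_match = pattern.search(text)

lookaheads = re.compile(r"(?=.*5)(?=.*8)")
letters = re.compile(r"[A-Z]{2}")

# Pattern.match(string, pos) tries to match starting exactly at index pos.
lookaheads_ok = [i for i in range(len(text)) if lookaheads.match(text, i)]
letters_ok = [i for i in range(len(text)) if letters.match(text, i)]

print(no_match)        # None
print(lookaheads_ok)   # [0, 1, 2, 3, 4]
print(letters_ok)      # [9]
```

The two lists share no index, so there is no position where both parts of the pattern succeed at once.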
You may use as many lookbehinds as there are required substrings before the two letters:
(?s)(?<=5.*?)(?<=8.*?)[A-Z]{2}
Details
(?s) - makes the . match newline characters, too
(?<=5.*?) - a location that is immediately preceded with 5 and then 0 or more chars as few as possible
(?<=8.*?) - a location that is immediately preceded with 8 and then 0 or more chars as few as possible
[A-Z]{2} - two ASCII uppercase letters.
An alternative would be to "unfold" what you expect to match using exclusionary character classes and alternation of match order. Not pretty, but pretty fast:
(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I'm trying to develop a regex with the following rules:
it should accept solely numbers,
if the string contains any letters or any other special characters, the whole string should be rejected,
regarding spaces, there should only be one consecutive number group, which can be surrounded by spaces,
if there are more than one consecutive number group, with spaces in between the groups, that whole string should be rejected.
Example Cases:
accepted:
1234
[SPACE][SPACE]111[SPACE]
[SPACE]111[SPACE][SPACE]
declined:
1a234
aa1234aa
1234a
12#4
[SPACE]11[SPACE]111
[SPACE]11[SPACE]111#
So far, I've come up with this ([0-9]+[^\s]*) which can be seen here.
What modifications do I have to do to achieve the scenario I want above?
Use this:
^\s*\d+\s*$
All we need to do is accept one or more digits bounded by zero or more spaces on either side.
EDIT:
Just add a capturing group around the digits to use them later:
^\s*(\d+)\s*$
The pattern you tried, ([0-9]+[^\s]*), matches 1+ digits followed by 0+ non-whitespace characters: the negated character class [^\s] matches any character except a whitespace char, so it would also match the trailing aa in 1234aa.
It can match multiple times in the same string as there are no anchors asserting the start ^ and the end $ of the string.
If you want to match spaces, instead of matching \s which could also match newlines, you could match a single space and repeat that 0+ times on the left and on the right side.
^ *[0-9]+ *$
If you only need the digits, you could use a capturing group
^ *([0-9]+) *$
^\s*[0-9]+\s*$
Notice that I've used [0-9] instead of \d.
[0-9] accepts only Western Arabic digits.
\d may accept any Unicode digit, such as Eastern Arabic or Thai digits (١, ٢, ٣, ๑, ๒, ๓, etc.); at least this is the case in XSD regex when it validates an XML file.
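The example cases from the question can be run against the space-only pattern with a short Python sketch. Python is an assumption here, since the question does not name a language, and extract_number is a made-up helper name. The last two lines show the [0-9] versus \d difference as it appears in Python 3, where \d matches Unicode decimal digits in str patterns.

```python
import re

# Optional spaces, one group of ASCII digits, optional spaces.
SINGLE_NUMBER = re.compile(r"^ *([0-9]+) *$")


def extract_number(text):
    """Return the digit group, or None if the string is rejected."""
    m = SINGLE_NUMBER.fullmatch(text)
    return m.group(1) if m else None


accepted = ["1234", "  111 ", " 111  "]
declined = ["1a234", "aa1234aa", "1234a", "12#4", " 11 111", " 11 111#"]

for s in accepted + declined:
    print(repr(s), extract_number(s))

# Eastern Arabic digits 1, 2, 3 written as escapes.
eastern = "\u0661\u0662\u0663"
digit_class_matches = re.fullmatch(r"\d+", eastern) is not None
ascii_range_matches = re.fullmatch(r"[0-9]+", eastern) is not None
```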
This question already has an answer here:
Restricting character length in a regular expression
(1 answer)
Closed 4 years ago.
I would like to match 1 or more capital letters, [A-Z]+ followed by 0 or more numbers, [0-9]* but the entire string needs to be less than or equal to 8 characters in total.
No matter what regex I come up with the total length seems to be ignored. Here is what I've tried.
^[A-Z]+[0-9]*{1,8}$ //Range ignored, will not work on regex101.com but will on rubular.com/
^([A-Z]+[0-9]*){1,8}$ //Range ignored
^(([A-Z]+[0-9]*){1,8})$ //Range ignored
Is this not possible in regex? Do I just need to do the range check in the language I'm writing in? That's fine but I thought it would be cleaner to keep in all in regex syntax. Thanks
The behaviour is expected. When you write the following pattern:
^([A-Z]+[0-9]*){1,8}$
The {1,8} quantifier tells the regex to repeat the preceding element, in this case the capturing group, between one and eight times. Because + and * inside the group are unbounded, a single repetition of the group can already match a string of any length, so the quantifier does not limit the total length.
You need to use a lookahead to obtain the desired behaviour:
^(?=.{1,8}$)[A-Z]+[0-9]*$
^ Assert beginning of string.
(?=.{1,8}$) Ensure that the string that follows is between one and eight characters in length.
[A-Z]+[0-9]* Match one or more uppercase letters followed by zero or more digits.
$ Asserts position end of string.
The regex ^([A-Z]+[0-9]*){1,8}$ would match [A-Z]+[0-9]* 1 - 8 times. That would match for example a repetition of 8 times A1A1A1A1A1A1A1A1 but not a repetition of 9 times A1A1A1A1A1A1A1A1A1
You might use a positive lookahead (?=[A-Z0-9]{1,8}$) to assert the length of the string:
^(?=[A-Z0-9]{1,8}$)[A-Z]+[0-9]*$
That would match
^ From the start of the string
(?=[A-Z0-9]{1,8}$) Positive lookahead to assert that what follows matches any of the characters in the character class [A-Z0-9] 1 - 8 times and assert the end of the string.
[A-Z]+[0-9]*$ Match an uppercase character one or more times, followed by a digit zero or more times, and assert the end of the string.
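A minimal Python sketch of the lookahead approach, next to the plain length check in the host language that the question also asks about. Python is an assumption, and the names is_code and is_code_with_len are made up for illustration.

```python
import re

# The lookahead caps the total length at 8 before the real pattern runs.
BOUNDED = re.compile(r"^(?=.{1,8}$)[A-Z]+[0-9]*$")

# The attempt from the question: {1,8} repeats the group, not the characters.
REPEATED_GROUP = re.compile(r"^([A-Z]+[0-9]*){1,8}$")


def is_code(text: str) -> bool:
    return BOUNDED.fullmatch(text) is not None


def is_code_with_len(text: str) -> bool:
    # Same rule, with the length test done outside the regex.
    return len(text) <= 8 and re.fullmatch(r"[A-Z]+[0-9]*", text) is not None


for s in ["ABC123", "ABCDEFGH", "ABCDEFGH1", "A1A1", "123", ""]:
    print(repr(s), is_code(s), is_code_with_len(s))

# A 12-letter string still matches the repeated-group attempt.
print(REPEATED_GROUP.match("ABCDEFGHIJKL") is not None)
```

Both helpers give the same answers on these samples, so which one to use is mostly a matter of taste.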
This question already has answers here:
Regex how to match an optional character
(5 answers)
Closed 6 years ago.
I just want to write a regular expression for 4 digits, a '.', 4 digits, and an optional 'A'.
Ex: 1111.2345A where A is optional.
^[0-9]{4}[\.][0-9]{4}$
This regex will match 1111.2345, but how do I add the optional 'A' at the end?
Use ? after a character class to make it optional:
[A-Za-z]?
This will match zero or one letter (lower or upper case).
You can check for a character zero or one times with this:
'[A]{0,1}'
Put that at the end of your pattern and it will try to match the character 'A' zero or one times. You may also use the symbol ? to match zero or one times. It's all about preference.
To get a single, optional A at the end, append A? to your regular expression:
^[0-9]{4}[\.][0-9]{4}A?$
By the way, instead of [0-9] you could use \d, which stands for 'digit':
^\d{4}\.\d{4}A?$
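A quick check of the optional trailing A, sketched in Python (an assumption, as the question names no language; has_valid_shape is a made-up helper name). It uses [0-9] rather than \d so that only ASCII digits are accepted.

```python
import re

# 4 digits, a literal dot, 4 digits, then an optional 'A'.
PATTERN = re.compile(r"^[0-9]{4}\.[0-9]{4}A?$")


def has_valid_shape(text: str) -> bool:
    return PATTERN.fullmatch(text) is not None


for s in ["1111.2345", "1111.2345A", "1111.2345AA", "1111.2345B", "111.2345"]:
    print(s, has_valid_shape(s))
```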
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
Does this regular expression mean at least one of the following characters that isn't a-z?
(?=.*(?:[a-z]))
It's part of the following expression:
/^(?=[A-Za-z0-9\'\s\d\.]{2,50}$)(?=.*(?:[a-z]))[a-zA-Z0-9]+[A-Za-z0-9\'\s\.]+$/m
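No, (?=.*(?:[a-z])) is a positive lookahead: it asserts that, from the current position, any characters (other than line breaks) can follow, as long as they are followed by a lowercase letter. In other words, the rest of the line must contain at least one lowercase letter.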
This regex means:
/^(?=[A-Za-z0-9\'\s\d\.]{2,50}$)(?=.*(?:[a-z]))[a-zA-Z0-9]+[A-Za-z0-9\'\s\.]+$/m
Match a line that is 2 to 50 characters long and consists only of alphanumerics, single quotes, whitespace or dots; that contains at least one lowercase letter; that starts with one or more alphanumerics; and that continues with one or more alphanumerics, whitespace, single quotes or dots up to the end of the line.
Actually, this can be improved as:
/^(?=[A-Za-z\d'\s.]{2,50}$)(?=.*[a-z])[a-zA-Z\d]+[A-Za-z\d'\s.]+$/m
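To see what each part of the simplified pattern contributes, here is a small Python sketch. Python is an assumption (the /.../m delimiters suggest another flavor), re.MULTILINE stands in for the m flag, and the helper name matches and the sample strings are made up.

```python
import re

NAME = re.compile(
    r"^(?=[A-Za-z\d'\s.]{2,50}$)"  # whole line: 2 to 50 allowed characters
    r"(?=.*[a-z])"                 # at least one lowercase letter somewhere
    r"[a-zA-Z\d]+"                 # starts with one or more alphanumerics
    r"[A-Za-z\d'\s.]+$",           # then one or more allowed characters
    re.MULTILINE,
)


def matches(line: str) -> bool:
    return NAME.search(line) is not None


samples = ["John O'Neil", "J. R. Smith", "ab", "JOHN SMITH", "a", "'ab", "ab!"]
for s in samples:
    print(repr(s), matches(s))
```

"JOHN SMITH" fails only because of the second lookahead, which is the part the question asks about.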