I am trying to write some Regex that will match lines with exactly 12 letters (case-insensitive).
For instance, I want it to match 123124ab234cdef234gh1111ijkL (12 letters), but not abcdefgh1111ijk (11 letters) or abcdefgh1111ijkLM (13 letters). My thought was to do a nested lookahead twelve times:
(?=(.*[A-Za-z])(?=(.*[A-Za-z])(?=(.*[A-Za-z])(?=(.*[A-Za-z]).....))))
But this doesn't work. Neither does a simple twelve-letter match, because the letters do not have to be contiguous:
[A-Za-z]{12}
Any help would be greatly appreciated. Thanks!
Here is a way:
^([^a-zA-Z]*[a-zA-Z]){12}[^a-zA-Z]*$
A quick break down:
^ # match the start of the input
( # start group 1
[^a-zA-Z]* # match zero or more non-letter chars
[a-zA-Z] # match one letter
){12} # end group 1 and match exactly 12 times
[^a-zA-Z]* # match zero or more non-letter chars
$ # match the end of the input
Note that [a-zA-Z] only matches the ASCII letters! The char 'É' will not be matched by it. Therefore, [^a-zA-Z] does match 'É'.
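As a quick check, the pattern above can be run against the three strings from the question. This is a minimal Python sketch: the helper name has_exactly_12_letters is made up for illustration, and the last sample is an added example of the ASCII-only caveat.

```python
import re

# Pattern from the answer: 12 repetitions of "optional non-letters, then one
# ASCII letter", followed by optional trailing non-letters.
TWELVE_LETTERS = re.compile(r"^([^a-zA-Z]*[a-zA-Z]){12}[^a-zA-Z]*$")


def has_exactly_12_letters(line: str) -> bool:
    """Hypothetical helper: True if the line holds exactly 12 ASCII letters."""
    return TWELVE_LETTERS.match(line) is not None


for sample in (
    "123124ab234cdef234gh1111ijkL",  # 12 letters
    "abcdefgh1111ijk",               # 11 letters
    "abcdefgh1111ijkLM",             # 13 letters
    "abcdefghijklÉ",                 # 12 ASCII letters plus 'É'
):
    print(sample, has_exactly_12_letters(sample))
```

The last sample matches because 'É' falls into [^a-zA-Z] and is therefore treated as a non-letter.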
Related
I need a regex for a combination of numbers and uppercase letters, and maybe lowercase letters and /,- characters, which contains at least 4 characters.
It should also contain at least 2 uppercase letters or one number.
I tried this:
barcode_regex = r"(?=(?:.+[A-Z]))(?=(?:.+[0-9]))([a-zA-Z0-9/-]{4,})"
For example, it should match cases such as the following:
ametFXUT0
G197-6STK
adipiscXWWFHH
A654/9023847
HYJ/54GFJ
hgdy67h
You could use a single lookahead to assert at least 4 characters, and then match either a single digit or 2 uppercase chars in the allowed ranges.
^(?=.{4})(?:[A-Za-z/,-]*\d|(?:[a-z\d/,-]*[A-Z]){2})[A-Za-z\d/,-]*$
Explanation
^ Start of string
(?=.{4}) Assert 4 characters
(?: Non capture group
[A-Za-z/,-]*\d Match optional allowed characters without a digit, then match a digit
| Or
(?:[a-z\d/,-]*[A-Z]){2} Match 2 times optional allowed characters without an uppercase char, then match an uppercase char
) Close non capture group
[A-Za-z\d/,-]* Match optional allowed characters
$ End of string
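To see the pattern in action, here is a small Python sketch that runs it over the six example strings from the question. The function name is_barcode and the three negative samples are illustrative assumptions, not part of the original post.

```python
import re

# Pattern from the answer: at least 4 characters, and either one digit or
# two uppercase letters, using only the allowed characters.
BARCODE = re.compile(
    r"^(?=.{4})(?:[A-Za-z/,-]*\d|(?:[a-z\d/,-]*[A-Z]){2})[A-Za-z\d/,-]*$"
)


def is_barcode(text: str) -> bool:
    """Hypothetical helper wrapping the pattern."""
    return BARCODE.match(text) is not None


# Examples listed in the question.
should_match = [
    "ametFXUT0", "G197-6STK", "adipiscXWWFHH",
    "A654/9023847", "HYJ/54GFJ", "hgdy67h",
]
# Assumed counter-examples: too short, one uppercase and no digit,
# and a character outside the allowed set.
should_fail = ["ab1", "abcdE", "AB1$"]

for text in should_match + should_fail:
    print(text, is_barcode(text))
```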
You could use two lookaheads combined via an alternation to check for 2 uppercase or 1 number:
^(?:(?=.*[A-Z].*[A-Z])|(?=.*\d))[A-Za-z0-9/-]+$
This regex pattern says to:
^
(?:
(?=.*[A-Z].*[A-Z]) assert that 2 or more uppercase are present
| OR
(?=.*\d) assert that at least one digit is present
)
[A-Za-z0-9/-]+ match any alphanumeric content (plus forward slash or dash)
$
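A short Python sketch of this alternation-of-lookaheads pattern follows. One thing worth checking is the length rule: as written, the pattern uses + and so does not by itself enforce the four-character minimum from the question. The second pattern below, with {4,}, is an assumed adjustment for illustration and is not part of the answer.

```python
import re

# Pattern from the answer, unchanged.
TWO_UPPER_OR_DIGIT = re.compile(
    r"^(?:(?=.*[A-Z].*[A-Z])|(?=.*\d))[A-Za-z0-9/-]+$"
)

# Assumed variant (not in the answer): {4,} instead of + so that strings
# shorter than 4 characters are rejected.
WITH_MIN_LENGTH = re.compile(
    r"^(?:(?=.*[A-Z].*[A-Z])|(?=.*\d))[A-Za-z0-9/-]{4,}$"
)

samples = [
    "ametFXUT0", "G197-6STK", "adipiscXWWFHH",
    "A654/9023847", "HYJ/54GFJ", "hgdy67h",
    "A1",     # has a digit but is only 2 characters long
    "abcdE",  # one uppercase letter, no digit
]

for text in samples:
    print(
        text,
        bool(TWO_UPPER_OR_DIGIT.match(text)),
        bool(WITH_MIN_LENGTH.match(text)),
    )
```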
I am writing a regex that detects when a text has between 5 and 10 uppercase words. At the moment, my regex correctly rejects text with fewer than 5 uppercase words and matches when there are 5 or more.
The problem is that it still matches when there are more than 10 uppercase words.
How can I solve that?
(?:\b[A-Z]+\b.*){5,10}
The pattern (?:\b[A-Z]+\b.*){5,10} matches \b[A-Z]+\b followed by .*, which matches any characters except a newline. Because .* can also consume further uppercase words, those words are not counted, so the upper limit of 10 is never enforced.
If the whole string should contain between 5 and 10 uppercase words delimited by word boundaries, you could use a tempered greedy token to skip everything up to the next uppercase word, repeat that 5 to 10 times, and add a negative lookahead asserting that no further uppercase word occurs to the right:
^(?:(?:(?!\b[A-Z]+\b).)*\b[A-Z]+\b){5,10}(?!.*\b[A-Z]+\b)
Explanation
^ Start of string
(?: Non capturing group
(?: Non capturing group
(?!\b[A-Z]+\b). Negative lookahead, assert what is on the right is not \b[A-Z]+\b, then match any character except a newline using .
)* Close non capturing group and repeat 0+ times
\b[A-Z]+\b Match word boundary, 1+ times an uppercase A-Z and word boundary
){5,10} Close non capturing group and repeat 5 - 10 times
(?!.*\b[A-Z]+\b) Negative lookahead, assert that \b[A-Z]+\b does not occur anywhere to the right
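The counting behaviour can be checked with a small Python sketch. The helper make_text is hypothetical and only builds test input with a known number of uppercase words; the function name has_5_to_10_upper_words is likewise made up for illustration.

```python
import re

# Pattern from the answer: skip non-uppercase-word characters, match one
# uppercase word, repeat 5 to 10 times, then forbid any further one.
UPPER_WORDS_5_TO_10 = re.compile(
    r"^(?:(?:(?!\b[A-Z]+\b).)*\b[A-Z]+\b){5,10}(?!.*\b[A-Z]+\b)"
)


def has_5_to_10_upper_words(text: str) -> bool:
    # match() only tries at the start of the string, in line with the ^.
    return UPPER_WORDS_5_TO_10.match(text) is not None


def make_text(n: int) -> str:
    """Hypothetical helper: n uppercase words joined by lowercase filler."""
    return " and ".join(["WORD"] * n)


for n in (4, 5, 10, 11):
    print(n, has_5_to_10_upper_words(make_text(n)))
```

Since . does not match a newline by default, text spanning several lines would presumably need the re.DOTALL flag; that is outside what the answer covers.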
$ cat t1.txt:
ABCD_EFG_HIJK
ABCD_HJIJ_IJKL
What could be the regex for the above two lines?
Even for just one of the lines.
Or
The scenario is: 4 characters followed by an underscore, followed by characters (any number), followed by an underscore, followed by characters (any number), then again an underscore and characters, and so on, ending with characters.
4characters_(minimum of 1 character)_(minimum of 1 character)_(ends with minimum of 1 character).
Note: It starts with 4 characters.
After the edit, the question is to find a regex that matches a string that starts with 4 characters, followed by at least 1 group consisting of '_' followed by at least 1 character.
[A-Z]{4}(_[A-Z]+)+
explanation:
[A-Z]{4} # exactly 4 characters from A-Z
( # group 1 start
_[A-Z]+ # "_" followed by 1 or more characters from A-Z
)+ # group 1 end. Repeat group 1 one or more times.
You can play with it at regex101
In the above regex I've chosen capitals as the characters, since this is suggested by the question. However, the set could be widened, e.g. to all letters, which would change the regex to:
[a-zA-Z]{4}(_[a-zA-Z]+)+
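Both variants can be tried against the lines from t1.txt with a few lines of Python. This is only a sketch: the patterns in the answer are unanchored, so it uses fullmatch to require that the whole line fits, which is an assumption about the intended use. The helper name whole_line_matches is made up.

```python
import re

# The two patterns from the answer.
CAPS_ONLY = re.compile(r"[A-Z]{4}(_[A-Z]+)+")
ANY_LETTERS = re.compile(r"[a-zA-Z]{4}(_[a-zA-Z]+)+")


def whole_line_matches(pattern: re.Pattern, line: str) -> bool:
    """Hypothetical helper: True only if the entire line fits the pattern."""
    return pattern.fullmatch(line) is not None


for line in ("ABCD_EFG_HIJK", "ABCD_HJIJ_IJKL", "ABC_EFG", "ABCD_", "abcd_efg"):
    print(
        line,
        whole_line_matches(CAPS_ONLY, line),
        whole_line_matches(ANY_LETTERS, line),
    )
```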
If by "any number of characters" you mean at least one character, this is the most accurate answer: /^[A-Za-z0-9]{4}_([A-Za-z0-9]+_)+[A-Za-z0-9]+$/g.
If you want, you can try this solution on a regex testing website such as regexr.com.
EDIT: If you want to allow only capital letters, then you should remove a-z and 0-9 from the square brackets.
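For comparison, here is a Python sketch of this anchored pattern. The /.../g wrapper in the answer is regex-literal syntax from another language, so only the part between the slashes is passed to re.compile. The name STRICT is a made-up label. Note that this pattern needs at least two underscores, so a string like ABCD_EFG is rejected.

```python
import re

# Body of the pattern from the answer, without the surrounding /.../g.
STRICT = re.compile(r"^[A-Za-z0-9]{4}_([A-Za-z0-9]+_)+[A-Za-z0-9]+$")

for line in ("ABCD_EFG_HIJK", "ABCD_HJIJ_IJKL", "ABCD_E1_2", "ABCD_EFG", "ABC_EFG_HIJK"):
    print(line, STRICT.match(line) is not None)
```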
Another option:
[^_\n]+_[^_\n]+_[^_\n]+
Match everything except new line \n and _ between underscores.
I am trying to create a regex to match any string that does not contain special characters but can contain either one . or _, and this should not be at the beginning or at the end of the string. I also want to keep the length of the string between 8 and 20 characters. The regex I am using now is the following:
"^[a-zA-Z][a-zA-Z._]*[a-zA-Z0-9]+$"
I don't have much expertise with regex. Is there any way I can get a solution for my issue?
You can use this regex:
^[0-9a-zA-Z](?!(?:.*?[._]){2})[._a-zA-Z0-9]{6,18}[0-9a-zA-Z]$
Regex Breakup:
^ # match start of input
[0-9a-zA-Z] # match a digit or letter
(?!(?:.*?[._]){2}) # negative lookahead to disallow use of _ or . more than once
[._a-zA-Z0-9]{6,18} # match a digit or letter or dot or _ 6 to 18 times
[0-9a-zA-Z] # match a digit or letter
$ # match end of input
PS: {6,18} is used in the middle part to make the total length of the string between 8 and 20.
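The length arithmetic (1 + 6..18 + 1 = 8..20) and the single-separator rule can be verified with a short Python sketch. The name is_valid and the sample strings are assumptions chosen for illustration.

```python
import re

# Pattern from the answer: alphanumeric first and last character, at most
# one '.' or '_' overall, total length 8 to 20.
USERNAME = re.compile(
    r"^[0-9a-zA-Z](?!(?:.*?[._]){2})[._a-zA-Z0-9]{6,18}[0-9a-zA-Z]$"
)


def is_valid(name: str) -> bool:
    """Hypothetical helper wrapping the pattern."""
    return USERNAME.match(name) is not None


samples = [
    "john.doe1",   # one dot, 9 characters
    "john_doe1",   # one underscore
    "john.doe_1",  # two separators
    ".johndoe1",   # starts with a separator
    "johndoe1.",   # ends with a separator
    "abc.def",     # only 7 characters
    "a" * 20,      # upper length limit
    "a" * 21,      # one character too long
]

for name in samples:
    print(name, is_valid(name))
```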
I want to match all the strings satisfying the following rules:
should consist of lower-case letters and digits and dashes
should start with a letter or a number
should end with a letter or number
total string length should be at least 3 and at most 20 characters
dot . is optional, there shouldn't be two or more consecutive dots .
dash - is optional, there shouldn't be two or more consecutive dashes -
dot . and dash - shouldn't be consecutive // the string aaa.-aaabbb is invalid
underscore not allowed
I have come up with this regex:
^[a-z0-9]([a-z0-9]+\.?\-?[a-z0-9]+){1,18}[a-z0-9]$
[a-z0-9] //should start/end with a letter or a number
([a-z0-9]+\.?\-?[a-z0-9]+){1,18} //other rules
However, it fails in some scenarios, such as:
abcdefghijklmnopqrstuvwxyz //should fail total number of chars greater than 20
aaa.-aaabbb //should fail as dot '.' and dash '-' are consecutive
Can anyone please help me in correcting this regex?
You can achieve this with a lookahead assertion:
^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$
Explanation:
^ # Start of string
(?! # Assert that the following can't be matched:
.* # Any number of characters
[.-]{2} # followed by .. or -- or .- or -.
) # End of lookahead
[a-z0-9] # Match lowercase letter/digit
[a-z0-9.-]{1,18} # Match 1-18 of the allowed characters
[a-z0-9] # Match lowercase letter/digit
$ # End of string
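The two failing cases from the question, plus a few extra strings, can be run through this pattern with a minimal Python sketch. The function name is_valid_name and the additional samples are illustrative assumptions.

```python
import re

# Pattern from the answer: a leading lookahead rejects any two consecutive
# characters from [.-]; the rest enforces the alphabet and the 3-20 length.
NAME = re.compile(r"^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$")


def is_valid_name(text: str) -> bool:
    """Hypothetical helper wrapping the pattern."""
    return NAME.match(text) is not None


samples = [
    "abcdefghijklmnopqrstuvwxyz",  # 26 characters, too long
    "aaa.-aaabbb",                 # dot directly followed by dash
    "aaa.aaa-bbb",                 # separators are not adjacent
    "abc",                         # minimum length
    "ab",                          # too short
    "-abc",                        # starts with a dash
    "abc_def",                     # underscore not allowed
]

for text in samples:
    print(text, is_valid_name(text))
```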
I came up with this, which uses a negative lookahead similar to Tim's solution but applies it differently. Because it only performs the lookahead when it sees a dot or a dash, it may need less backtracking, which may make it very slightly faster.
^[a-z0-9]([a-z0-9]|([-.](?![.-]))){1,18}[a-z0-9]$
Explanation:
^ # Start of string
[a-z0-9] # Must start with a letter or number
( # Begin Group
[a-z0-9] # Match a letter or number
| # OR
([-.](?![.-])) # Match a dot or dash that is not followed by a dot or dash
){1,18} # Match group 1 to 18 times
[a-z0-9] # Must end with a letter or number
$ # End of string
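As a sanity check, the following Python sketch runs this pattern and the earlier lookahead-first pattern over the same inputs and compares the results. It only checks agreement on a handful of assumed samples and makes no attempt to measure the speed difference mentioned above. The names LOOKAHEAD_FIRST and LOOKAHEAD_LOCAL are made-up labels.

```python
import re

# Earlier answer: one lookahead at the start of the string.
LOOKAHEAD_FIRST = re.compile(r"^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$")
# This answer: lookahead only after a dot or dash has been seen.
LOOKAHEAD_LOCAL = re.compile(r"^[a-z0-9]([a-z0-9]|([-.](?![.-]))){1,18}[a-z0-9]$")

samples = [
    "abc", "a.b", "a-b", "aaa.aaa-bbb",
    "aaa.-aaabbb", "a..b", "a--b", "ab",
    "-abc", "abc-", "abc_def", "ABC",
    "a" * 20, "a" * 21,
]

for text in samples:
    first = LOOKAHEAD_FIRST.match(text) is not None
    local = LOOKAHEAD_LOCAL.match(text) is not None
    print(text, first, local)
```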