Matching words that may contain 1-2 digits - regex

I use the following regex to match words with a length of 4 that have 1 digit and 3 capital letters:
\b(?=[A-Z]*\d[A-Z]*\b)[A-Z\d]{4}\b
What I would like to know is how to modify the expression to match words with a length of 10 that contain 0-2 digits.
\b(?=[A-Z]*\d[A-Z]*\b)[A-Z\d]{10}\b
This works for 1 digit, but how do I extend it to also allow 0 and 2 digits?
Sample: http://regexr.com?32u40

Put the length check into the lookahead:
\b(?=[A-Z\d]{10}\b)(?:[A-Z]*\d){0,2}[A-Z]*\b
Explanation:
\b # Start at a word boundary
(?= # Assert that...
[A-Z\d]{10} # 10 A-Z/digits follow
\b # until the next word boundary.
) # (End of lookahead)
(?: # Match...
[A-Z]* # Any number of ASCII uppercase letters
\d # and exactly one digit
){0,2} # repeat 0, 1 or 2 times.
[A-Z]* # Match any number of letters
\b # until the next word boundary.


Match digits (which may contain spaces) except when preceded by a specific word

I have this regular expression to look for numbers in a text that do not belong to a price (in euro):
(?<!EUR )(\d\s*)+
I want it to not match:
EUR 10 000
And I want it to match the numbers in the following cases:
{3}
10 000
347835
The problem I now face is that it matches the numbers I want correctly, but it also matches the 0 000 part of EUR 10 000, which I don't want it to match.
Edit:
I want to match all numbers (including spaces in between numbers) unless they are preceded by "EUR ".
To make it clearer what I want to match, here are all the cases from above, each followed by the part I want matched:
EUR 10 000 (nothing)
{3} (3)
10 000 (10 000)
347835 (347835)
What my regular expression currently matches is:
EUR 10 000 (0 000)
{3} (3)
10 000 (10 000)
347835 (347835)
As you are already using a capture group, you can match what you don't want and capture what you want to keep.
\bEUR *\d+(?:[ \d]*\d)?\b|\b(\d+(?: +\d+)*)\b
Explanation
\bEUR * Match EUR and optional spaces
\d+(?:[ \d]*\d)?\b Match 1+ digits, then optionally spaces and digits ending in a digit, followed by a word boundary
| Or
\b A word boundary to prevent a partial word match
( Capture group 1 (The value that you are interested in)
\d+(?: +\d+)* Match 1+ digits and optionally repeat 1+ spaces and 1+ digits
) Close group 1
\b A word boundary
Note that you can also use \s instead of a space, but that can also match newlines.
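A minimal Python sketch contrasting the question's lookbehind pattern with the answer's match-and-capture approach, assuming Python's re module (which accepts the fixed-width lookbehind (?<!EUR )). The function name non_price_numbers and the sample sentence are hypothetical. Because the EUR branch contains no capture group, group 1 is None for those matches, so they are simply filtered out.

```python
import re

# Question's pattern: the lookbehind is only checked where a match starts, so
# a match can begin at the second digit of "EUR 10 000" and grab "0 000".
QUESTION = re.compile(r"(?<!EUR )(\d\s*)+")

# Answer's pattern: consume EUR prices in the first branch (no group) and
# capture the wanted numbers in group 1 of the second branch.
ANSWER = re.compile(r"\bEUR *\d+(?:[ \d]*\d)?\b|\b(\d+(?: +\d+)*)\b")


def non_price_numbers(text: str) -> list[str]:
    """Return numbers (inner spaces allowed) not preceded by 'EUR '."""
    return [m.group(1) for m in ANSWER.finditer(text) if m.group(1)]


print([m.group(0) for m in QUESTION.finditer("EUR 10 000")])  # ['0 000']
print(non_price_numbers("EUR 10 000, {3}, 10 000 and 347835"))
# ['3', '10 000', '347835']
```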

Find if a text contains between 5 and 10 words written in uppercase

I am writing a regex that detects when a text has between 5 and 10 uppercase words. At the moment, my regex correctly rejects text with fewer than 5 words in capital letters and matches text with 5 or more.
The problem is that it still matches when the text has more than 10.
How can I solve that?
(?:\b[A-Z]+\b.*){5,10}
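This pattern (?:\b[A-Z]+\b.*){5,10} matches \b[A-Z]+\b and then .*, which matches any character except a newline, further uppercase words included. Since nothing after the repetition checks that no more uppercase words follow, a text with more than 10 of them still matches.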
If the whole string should contain between 5 and 10 uppercase words with word boundaries, you might use a tempered greedy token repeated 5 - 10 times and a negative lookahead to assert that no uppercase word follows:
^(?:(?:(?!\b[A-Z]+\b).)*\b[A-Z]+\b){5,10}(?!.*\b[A-Z]+\b)
Explanation
^ Start of string
(?: Non capturing group
(?: Non capturing group
(?!\b[A-Z]+\b). Negative lookahead, assert that what follows is not \b[A-Z]+\b, then match any character except a newline using .
)* Close non capturing group and repeat 0+ times
\b[A-Z]+\b Match word boundary, 1+ times an uppercase A-Z and word boundary
){5,10} Close non capturing group and repeat 5 - 10 times
(?!.*\b[A-Z]+\b) Negative lookahead, assert that no \b[A-Z]+\b occurs anywhere further to the right
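To see the difference, here is a small Python comparison of the question's pattern and the answer's pattern, assuming Python's re module and single-line input (. does not cross newlines). The helper words(n), which joins n copies of "WORD" with lowercase "and", is hypothetical test scaffolding.

```python
import re

# Question's pattern: succeeds as soon as 5 uppercase words are found,
# with nothing forbidding additional ones.
QUESTION = re.compile(r"(?:\b[A-Z]+\b.*){5,10}")

# Answer's pattern: tempered greedy token repeated 5-10 times from the start,
# then a negative lookahead that forbids any further uppercase word.
ANSWER = re.compile(r"^(?:(?:(?!\b[A-Z]+\b).)*\b[A-Z]+\b){5,10}(?!.*\b[A-Z]+\b)")


def has_5_to_10_uppercase_words(text: str) -> bool:
    return ANSWER.search(text) is not None


def words(n: int) -> str:
    """Hypothetical helper: n uppercase words separated by a lowercase one."""
    return " and ".join(["WORD"] * n)


print(QUESTION.search(words(11)) is not None)  # True, the problem from the question
for n in (4, 5, 10, 11):
    print(n, has_5_to_10_uppercase_words(words(n)))
# 4 False
# 5 True
# 10 True
# 11 False
```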

Match exactly 12 non-contiguous letters in Regex

I am trying to write some Regex that will match lines with exactly 12 letters (case-insensitive).
For instance, I want it to match 123124ab234cdef234gh1111ijkL (12 letters), but not abcdefgh1111ijk (11 letters) or abcdefgh1111ijkLM (13 letters). My thought was to do a nested lookahead twelve times:
(?=(.*[A-Za-z])(?=(.*[A-Za-z])(?=(.*[A-Za-z])(?=(.*[A-Za-z]).....))))
But this doesn't work. Neither does a simple twelve-letter match, because the letters do not have to be contiguous:
[A-Za-z]{12}
Any help would be greatly appreciated. Thanks!
Here is a way:
^([^a-zA-Z]*[a-zA-Z]){12}[^a-zA-Z]*$
A quick break down:
^ # match the start of the input
( # start group 1
[^a-zA-Z]* # match zero or more non-letter chars
[a-zA-Z] # match one letter
){12} # end group 1 and match exactly 12 times
[^a-zA-Z]* # match zero or more non-letter chars
$ # match the end of the input
Note that [a-zA-Z] only matches ASCII letters! The char 'É' will not be matched by it, and therefore [^a-zA-Z] does match 'É'.
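A quick check of the pattern in Python, assuming the re module; the helper name has_exactly_12_letters is hypothetical. It uses fullmatch because in Python $ also matches just before a trailing newline. The last sample illustrates the note above: 'É' counts as a non-letter, so eleven ASCII letters plus 'É' plus one more ASCII letter still give exactly 12.

```python
import re

# Twelve times: any non-letters, then one ASCII letter; then trailing non-letters.
PATTERN = re.compile(r"^([^a-zA-Z]*[a-zA-Z]){12}[^a-zA-Z]*$")


def has_exactly_12_letters(line: str) -> bool:
    # fullmatch: the whole string must match, no trailing-newline leniency
    return PATTERN.fullmatch(line) is not None


for s in ["123124ab234cdef234gh1111ijkL", "abcdefgh1111ijk", "abcdefgh1111ijkLM", "abcdefghijkÉl"]:
    print(s, has_exactly_12_letters(s))
# 123124ab234cdef234gh1111ijkL True
# abcdefgh1111ijk False
# abcdefgh1111ijkLM False
# abcdefghijkÉl True
```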

How to make sure that certain digits in a number are not the same

I have a couple of number strings like the following:
0000000
0000011
0000012
I want to validate that the pattern is like this:
AAAAABC
where A, B and C are all different digits. So in the example, only 0000012 should be matched.
My regex so far is (\d)\1\1\1\1\d\d, but it doesn't make sure that the digits are different. What do I need to do?
I think you want
(\d)\1{4}(?!\1)(\d)(?!\1|\2)\d
Explanation:
(\d) # Match a digit, capture in group 1
\1{4} # Match the same digit as before four times
(?!\1) # Assert that the next character is not the same digit as before
(\d) # Match another digit, capture in group 2
(?!\1|\2) # Assert the next character is different from both previous digits
\d # Match another digit.
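The pattern itself is unanchored, so with re.search it would also accept a matching 7-digit slice inside a longer string. A minimal Python sketch (assuming the re module; is_aaaaabc is a hypothetical name) that uses fullmatch to validate whole strings:

```python
import re

# AAAAABC: five identical digits, then B != A, then C different from A and B.
PATTERN = re.compile(r"(\d)\1{4}(?!\1)(\d)(?!\1|\2)\d")


def is_aaaaabc(s: str) -> bool:
    # fullmatch rejects longer strings that merely contain such a run
    return PATTERN.fullmatch(s) is not None


for s in ["0000000", "0000011", "0000012"]:
    print(s, is_aaaaabc(s))
# 0000000 False
# 0000011 False
# 0000012 True
```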

Regular Expression to match strings

I want to match all strings satisfying the following rules:
should consist of lower-case letters and digits and dashes
should start with a letter or a number
should end with a letter or number
total string length should be at least 3 and at most 20 characters
dot . is optional, there shouldn't be two or more consecutive dots .
dash - is optional, there shouldn't be two or more consecutive dashes -
dot . and dash - shouldn't be consecutive // the string aaa.-aaabbb is invalid
underscore not allowed
I have come up with this regex:
^[a-z0-9]([a-z0-9]+\.?\-?[a-z0-9]+){1,18}[a-z0-9]$
[a-z0-9] //should start/end with a letter or a number
([a-z0-9]+\.?\-?[a-z0-9]+){1,18} //other rules
However, it fails in some scenarios, such as:
abcdefghijklmnopqrstuvwxyz //should fail: more than 20 characters
aaa.-aaabbb //should fail as dot '.' and dash '-' are consecutive
Can anyone please help me in correcting this regex?
You can achieve this with a lookahead assertion:
^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$
Explanation:
^ # Start of string
(?! # Assert that the following can't be matched:
.* # Any number of characters
[.-]{2} # followed by .. or -- or .- or -.
) # End of lookahead
[a-z0-9] # Match lowercase letter/digit
[a-z0-9.-]{1,18} # Match 1-18 of the allowed characters
[a-z0-9] # Match lowercase letter/digit
$ # End of string
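Running the lookahead pattern against the two inputs the question reports as wrongly accepted, plus a few boundary cases, in Python (assuming the re module; is_valid and the extra cases are hypothetical). fullmatch is used so a trailing newline cannot slip past $.

```python
import re

# Negative lookahead rejects "..", "--", ".-" and "-." anywhere; the rest
# enforces the allowed characters, alphanumeric ends and a length of 3-20.
PATTERN = re.compile(r"^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$")


def is_valid(s: str) -> bool:
    return PATTERN.fullmatch(s) is not None


cases = [
    "abcdefghijklmnopqrstuvwxyz",  # 26 characters, too long
    "aaa.-aaabbb",                 # dot and dash are consecutive
    "aaa.bbb-ccc",                 # single separators are fine
    "abc",                         # minimum length
    "ab",                          # too short
    "a_b",                         # underscore not allowed
]
for s in cases:
    print(s, is_valid(s))
```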
I came up with this, which uses a negative lookahead similar to Tim's solution but applies it differently. Because it only performs the lookahead when it sees a dot or a dash, it may need less backtracking, which might make it very slightly faster.
^[a-z0-9]([a-z0-9]|([-.](?![.-]))){1,18}[a-z0-9]$
Explanation:
^ # Start of string
[a-z0-9] # Must start with a letter or number
( # Begin Group
[a-z0-9] # Match a letter or number
| # OR
([-.](?![.-])) # Match a dot or dash that is not followed by a dot or dash
){1,18} # Match group 1 to 18 times
[a-z0-9] # Must end with a letter or number
$ # End of string
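To check that the two answers accept the same strings (the speed claim is not measured here), a brute-force comparison in Python can enumerate every short string over a small alphabet. The alphabet "a1.-_", the maximum length and the function names are arbitrary, hypothetical choices; an empty result only shows agreement on those inputs, not a proof of equivalence.

```python
import itertools
import re

# One global lookahead vs. a lookahead only after each dot or dash.
LOOKAHEAD_FIRST = re.compile(r"^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$")
LOOKAHEAD_ON_SEPARATOR = re.compile(r"^[a-z0-9]([a-z0-9]|([-.](?![.-]))){1,18}[a-z0-9]$")


def accepts(pattern: re.Pattern, s: str) -> bool:
    return pattern.fullmatch(s) is not None


def disagreements(alphabet: str = "a1.-_", max_len: int = 5) -> list[str]:
    """Collect every string (up to max_len) on which the two patterns differ."""
    found = []
    for n in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if accepts(LOOKAHEAD_FIRST, s) != accepts(LOOKAHEAD_ON_SEPARATOR, s):
                found.append(s)
    return found


print(disagreements())  # []
print(accepts(LOOKAHEAD_ON_SEPARATOR, "a" * 20), accepts(LOOKAHEAD_ON_SEPARATOR, "a" * 21))
# True False
```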