Regex ignore special character with greedy - regex

I used the following regex to catch 10 numbers and letters:
/[a-zA-Z0-9]{10}/g
It works fine if the 10 characters are only numbers and letters.
e.g. input: 12345xcdw034342
it catches 12345xcdw0
But in this case with special characters or space, it doesn't catch it.
123}456712234324Zz3 or 123}45 71223AB3
It should catch 10 numbers and letters regardless of any special characters in between.
Any help would be gratefully appreciated.

You can do it, but not without some extra processing.
As you have not specified which language you're using, I'll use JavaScript since it is fairly universal, but the same logic should apply in any language.
Here are the options I can think of,
given testString = "12#34{56A789BDE":
Match everything up to the tenth alphanumeric character, and then remove the special characters from the resulting string
testString.match(/(\w.*?){10}/)[0].replaceAll(/\W/g, '')
// results '123456A789'
// explanation: we take a \w followed by .*? (lazy, so it only consumes as many extra characters as are needed to reach the next \w), repeat that ten times, then clean the result by removing \W, which means any character other than a letter, digit or underscore
Match every alphanumeric character individually, then join the first ten into a result string
testString.match(/\w/g).slice(0,10).join('')
// results '123456A789'
// explanation: with the g flag, match returns an array of every single-character \w match (note the lowercase); slice takes the first ten and join glues them back into one string
Remove the special characters from your string and then take the first ten
testString.replaceAll(/\W/g,'').match(/\w{10}/)[0]
// results '123456A789'
// explanation: we replace \W which means non alpha numeric characters, with '' to delete them then we match the first ten
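One caveat with the options above: \w also matches the underscore, so \W will not strip it. A minimal sketch of the third option with an explicit [a-zA-Z0-9] class follows; the helper name firstAlphanumerics and the choice to return null for short inputs are assumptions made for illustration, not part of the answer.

```javascript
// Hypothetical helper: return the first `count` ASCII letters/digits of `input`.
function firstAlphanumerics(input, count = 10) {
  // Strip everything that is not a letter or digit; unlike \W this also strips "_".
  const cleaned = input.replace(/[^a-zA-Z0-9]/g, '');
  // Return null when there are fewer than `count` alphanumerics instead of throwing.
  return cleaned.length >= count ? cleaned.slice(0, count) : null;
}

console.log(firstAlphanumerics('12#34{56A789BDE'));      // '123456A789'
console.log(firstAlphanumerics('1_2_3_4_5_6_7_8_9_0_')); // '1234567890'
console.log(firstAlphanumerics('12_34 56'));             // null (only 6 alphanumerics)

// For comparison, the \W-based version keeps the underscores:
console.log('1_2_3_4_5_6_7_8_9_0_'.replace(/\W/g, '').match(/\w{10}/)[0]); // '1_2_3_4_5_'
```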

You can use
/[a-zA-Z0-9](?:[^a-zA-Z0-9]*[a-zA-Z0-9]){9}/g
See the regex demo. Details:
[a-zA-Z0-9] - an alphanumeric
(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){9} - nine occurrences of any zero or more chars other than an alphanumeric char and then an alphanumeric char.
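As a rough sketch of how this pattern behaves on the inputs from the question: the match itself still contains the special characters that sit between the ten alphanumerics, so a second step removes them. The helper name catchTen is an assumption for illustration, and the g flag is dropped here so that only the first hit is returned.

```javascript
// Pattern from the answer, without the g flag so only the first hit is returned.
const tenAlnum = /[a-zA-Z0-9](?:[^a-zA-Z0-9]*[a-zA-Z0-9]){9}/;

// Hypothetical helper: match the span, then drop the non-alphanumerics inside it.
function catchTen(input) {
  const m = input.match(tenAlnum);
  return m ? m[0].replace(/[^a-zA-Z0-9]/g, '') : null;
}

console.log('123}456712234324Zz3'.match(tenAlnum)[0]); // '123}4567122' (raw match)
console.log(catchTen('123}456712234324Zz3'));          // '1234567122'
console.log(catchTen('123}45 71223AB3'));              // '1234571223'
console.log(catchTen('12345xcdw034342'));              // '12345xcdw0'
console.log(catchTen('12 34'));                        // null (fewer than ten)
```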

Related

Regex matching string containing at least x characters and x numbers

I have a requirement to test if string matches following rules:
Has at least 8 [a-zA-Z!##$%^&+=] characters and has at least 1 [0-9] number OR
Has at least 8 [0-9] numbers and has at least 1 [a-zA-Z!##$%^&+=] character
So far I tried this:
"^(?=(?=.*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=])(?=.*[0-9])|(?=.*[0-9].*[0-9].*[0-9].*[0-9].*[0-9].*[0-9].*[0-9].*[0-9])(?=.*[a-zA-Z!##$%^&+=])).{8,}\$")
It mostly works ok, but one scenario is failing:
"!abcdefgh1" --> matched (OK)
"{abcdefgh1" --> matched (NOT OK because character { shouldn't be allowed)
How to disallow any other characters except [a-zA-Z!##$%^&+=]?
Is it possible to write that regex in shorter way?
Thanks
The problem is that your .s are matching any character. To keep the convenience of using . to match a generic character but also require that the string doesn't contain any characters other than what's allowed, a simple tweak would be another lookahead at the beginning of the string to ensure that all characters before the end of the string are [a-zA-Z!##$%^&+=] or [0-9], nothing else.
Also note that [0-9] simplifies to \d, which is a bit nicer to look at:
^(?=[a-zA-Z!##$%^&+=\d]{9,}$) <rest of your regex>
You can also simplify your regex by repeating the big character set in a group, when possible, rather than writing it out manually 8 times. Also, as a comment notes, when checking whether a string has enough digits, it is better to repeat (?:\D*\d) rather than using a dot, because you know that the characters in front of each digit should be non-digits.
Actually, because the initial lookahead above ensures that the string contains only digits and those certain non-digit characters, rather than repeating the long character set [a-zA-Z!##$%^&+=] again and again when matching a non-digit, you can just use \D, since the initial lookahead guarantees that non-digits will be within that character set.
For example:
^(?=[a-zA-Z!##$%^&+=\d]{9,}$)(?:(?=\D*\d)(?=(?:\d*\D){8})|(?=(?:\D*\d){8})(?=\d*\D))
Explanation:
^(?=[a-zA-Z!##$%^&+=\d]{9,}$) - ensure string contains only desired characters (fail immediately if there are not at least 9 of them), then alternate between either:
(?=\D*\d)(?=(?:\d*\D){8}) - string contains at least one digit, and 8 other characters, or
(?=(?:\D*\d){8})(?=\d*\D) - string contains at least 8 digits, and at least one of the other characters
https://regex101.com/r/18xtBw/2 (to test, input only one line at a time - otherwise, the \Ds will match newline characters, which will cause problems)
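A quick way to check the behaviour is to run the pattern against a few inputs. The sketch below is in JavaScript, which is an assumption since the question does not name a language. The character class is copied as it appears in this thread, and every test string except the two from the question is made up for illustration.

```javascript
// Whitelist lookahead first, then "1 digit + 8 others" OR "8 digits + 1 other".
const rule = /^(?=[a-zA-Z!##$%^&+=\d]{9,}$)(?:(?=\D*\d)(?=(?:\d*\D){8})|(?=(?:\D*\d){8})(?=\d*\D))/;

const samples = [
  '!abcdefgh1', // 9 allowed non-digits + 1 digit  -> true
  '{abcdefgh1', // "{" is not in the whitelist     -> false
  '12345678a',  // 8 digits + 1 allowed non-digit  -> true
  'abcdefgh',   // no digit, and only 8 characters -> false
  'abcde12345', // neither side reaches 8          -> false
];

for (const s of samples) {
  console.log(s, rule.test(s));
}
```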

Regex for a string with alpha numeric containing a '.' character

I have not been able to find a proper regex to match any string not starting and ending with some condition.
This matches
AS.E
23.5
3.45
This doesn't match
.263
321.
.ASD
The string can contain alphanumeric characters with an optional '.' character, and its length has to be within the range 2-4 (minimum 2 characters, maximum 4 characters).
I was able to create one ->
^[^\.][A-Z|0-9|\.]{2,4}$
but with this I couldn't manage to exclude a '.' character at the end of the string.
Thanks.
Maybe not the most optimized but a working one. Created step by step:
The first character should be alphanumeric
^[a-zA-Z0-9]
0, 1 or 2 characters, each alphanumeric or ., not yet reaching the end of the string
[a-zA-Z0-9\.]{0,2}
an alphanumeric character matching end of string
[a-zA-Z0-9]$
Concatenate all of this to obtain your regex
^[a-zA-Z0-9][a-zA-Z0-9\.]{0,2}[a-zA-Z0-9]$
Edit: This regex allows multiple dots (up to 2)
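A small check of the concatenated pattern against the examples from the question, sketched in JavaScript (an assumption, since the question names no language). The inputs 'A..B' and 'A' are extra cases added for illustration.

```javascript
// First char alphanumeric, 0-2 chars of alphanumeric-or-dot, last char alphanumeric.
const twoToFour = /^[a-zA-Z0-9][a-zA-Z0-9\.]{0,2}[a-zA-Z0-9]$/;

const shouldMatch = ['AS.E', '23.5', '3.45'];
const shouldFail = ['.263', '321.', '.ASD'];

for (const s of shouldMatch) console.log(s, twoToFour.test(s)); // all true
for (const s of shouldFail) console.log(s, twoToFour.test(s));  // all false

console.log(twoToFour.test('A..B')); // true: two dots in the middle are allowed
console.log(twoToFour.test('A'));    // false: at least two characters are required
```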
If I guessed correctly, you want to match all words that are
Between 2 and 4 characters long ...
... and start and end with a character from [A-Z0-9] ...
... and have characters from [A-Z0-9.] in the middle ...
... and are not preceded or followed by a '.'.
Try this regex to match all these substrings in a text:
(?<=^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9](?=$|[^.])
However, note that this will match the AA in '.AAAA.'. If you don't want this match, then please give more details on your requirements.
When you are only interested in the number of matches, but not the matched strings, then you could use
(^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]($|[^.])
If you have one string, and want to know whether that string completely matches or not, then use
^[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]$
If there may be at most one . inside the match, replace the part [A-Z0-9.]{0,2} with ([A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?).
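To illustrate the last variant, here is a sketch of the whole-string pattern with the middle part swapped for the at-most-one-dot alternation. It is written in JavaScript as an assumption, and the test strings are made up.

```javascript
// Whole-string check where the middle part may contain at most one dot.
const oneDotMax = /^[A-Z0-9]([A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?)[A-Z0-9]$/;

console.log(oneDotMax.test('AB.C')); // true
console.log(oneDotMax.test('A.BC')); // true
console.log(oneDotMax.test('ABCD')); // true
console.log(oneDotMax.test('A.B'));  // true
console.log(oneDotMax.test('A..B')); // false: two dots
console.log(oneDotMax.test('.ABC')); // false: leading dot
```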
You can use this pattern to match what you say,
^[^\.][a-zA-Z0-9\.]{2,4}[^\.]$
Check the result here..
https://regex101.com/r/8BNdDg/3

Matching any password except one containing repeating characters [duplicate]

Edit: Thanks for the advice to make my question clearer :)
The Match is looking for 3 consecutive characters:
Regex Match =AaA653219
Regex Match = AA5556219
The code is ASP.NET 4.0. Here is the whole function:
public ValidationResult ApplyValidationRules()
{
    ValidationResult result = new ValidationResult();
    // 8-20 characters, at least one digit and at least one letter
    Regex regEx = new Regex(@"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
    bool valid = regEx.IsMatch(_Password);
    if (!valid)
        result.Errors.Add("Passwords must be 8-20 characters in length, contain at least one alpha character and one numeric character");
    return result;
}
I've tried for over 3 hours to make this work, referencing the below with no luck =/
How can I find repeated characters with a regex in Java?
.net Regex for more than 2 consecutive letters
I have started with this for 8-20 characters a-Z 0-9 :
^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$
As Regex regEx = new Regex(@"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
I've tried adding variations of the below with no luck:
/(.)\1{9,}/
.*([0-9A-Za-z])\\1+.*
((\\w)\\2+)+".
Any help would be much appreciated!
http://regexr.com?34vo9
The regular expression:
^(?=.{8,20}$)(([a-z0-9])\2?(?!\2))+$
The first lookahead ((?=.{8,20}$)) checks the length of your string. The second portion does your double character and validity checking by:
(
([a-z0-9]) Matching a character and storing it in a back reference.
\2? Optionally match one more EXACT COPY of that character.
(?!\2) Make sure the upcoming character is NOT the same character.
)+ Do this ad nauseam.
$ End of string.
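The effect of the \2?(?!\2) pair is easy to see on a few inputs. The sketch below uses JavaScript rather than the .NET code in the question, which is an assumption made for brevity. The i flag is also an addition, so that the two strings from the question, which contain uppercase letters, can be tested and so that AaA counts as a triple.

```javascript
// Same pattern as above; the i flag (an addition) makes both the character
// class and the \2 backreference case-insensitive.
const noTriple = /^(?=.{8,20}$)(([a-z0-9])\2?(?!\2))+$/i;

console.log(noTriple.test('aabbcc1122')); // true: doubles are fine
console.log(noTriple.test('aaabbc1122')); // false: "aaa"
console.log(noTriple.test('AaA653219'));  // false: "AaA" is a triple when case is ignored
console.log(noTriple.test('AA5556219'));  // false: "555"
console.log(noTriple.test('abc123'));     // false: shorter than 8
```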
Okay. I see you've added some additional requirements. My basic formula still works, but we have to give you more of a step-by-step approach. So:
^...$
Your whole regular expression will be dropped into start and end characters, for obvious reasons.
(?=.{n,m}$)
Length checking. Put this at the beginning of your regular expression with n as your minimum length and m as your maximum length.
(?=(?:[^REQ]*[REQ]){n,m})
Required characters. Place this at the beginning of your regular expression, with REQ as your required character class, to require n to m of those characters. You may drop the (?: ..){n,m} wrapper to require just one such character.
(?:([VALID])\1?(?!\1))+
The rest of your expression. Replace VALID with your valid Characters. So, your Password Regex is:
^(?=.{8,20}$)(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])(?:([\w\d*?!:;])\1?(?!\1))+$
'Splained:
^
(?=.{8,20}$) 8 to 20 characters
(?=[^A-Za-z]*[A-Za-z]) At least one Alpha
(?=[^0-9]*[0-9]) At least one Numeric
(?:([\w\d*?!:;])\1?(?!\1))+ Valid Characters, not repeated thrice.
$
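As a sanity check, the assembled password pattern can be run against a handful of made-up passwords. This sketch uses JavaScript instead of the question's .NET code (an assumption), and the sample passwords are invented for illustration.

```javascript
// Length check, one letter, one digit, then valid characters with no triple repeats.
const passwordRule = /^(?=.{8,20}$)(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])(?:([\w\d*?!:;])\1?(?!\1))+$/;

console.log(passwordRule.test('Passw0rd1')); // true: "ss" is only a double
console.log(passwordRule.test('Passsw0rd')); // false: "sss"
console.log(passwordRule.test('password'));  // false: no digit
console.log(passwordRule.test('12345678'));  // false: no letter
console.log(passwordRule.test('Pa55 w0rd')); // false: space is not a valid character
console.log(passwordRule.test('Ab1'));       // false: shorter than 8
```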
http://regexr.com?34vol Here's the new one in action.
Tightened up the matching criteria, as they were too broad; for example, "not A-Za-z" matches a lot more than is intended. The previous regex was matching on the string "ThiIsNot". For the most part, passwords are only going to contain alphanumeric and punctuation characters, so I limited the scope, which made all matches more accurate. Used character classes for human readability. Added an exclusion list, and differentiated upper and lower case letters.
^(?=.{8,20}$)(?!(?:.*[01IiLlOo]))(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1})(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1})(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1})(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$
The breakdown:
^(?=.{8,20}$) - Positive lookahead that the string is between 8 and 20 chars
(?!(?:.*[01IiLlOo])) - Negative lookahead for any blacklisted chars
(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2}) - Verify that at least 2 alpha chars exist
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1}) - Verify that at least 1 lowercase alpha exists
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1}) - Verify that at least 1 uppercase alpha exists
(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1}) - Verify that at least 1 digit exists
(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1}) - Verify that at least 1 special/punctuation char exists
(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$ - Verify that no char is repeated more than twice in a row

regex: find one-digit number

I need to find all the one-digit numbers in a text.
My code:
$string = 'text 4 78 text 558 my.name@gmail.com 5 text 78998 text';
$pattern = '/ [\d]{1} /';
(result: 4 and 5)
Everything works, but I wanted to ask whether it is correct to rely on spaces.
Maybe there is some other way to pick out one-digit numbers.
Thanks
First of all, [\d]{1} is equivalent to \d.
As for your question, it would be better to use a zero width assertion like a lookbehind/lookahead or word boundary (\b). Otherwise you will not match consecutive single digits because the leading space of the second digit will be matched as the trailing space of the first digit (and overlapping matches won't be found).
Here is how I would write this:
(?<!\S)\d(?!\S)
This means "match a digit only if there is not a non-whitespace character before it, and there is not a non-whitespace character after it".
I used the double negative like (?!\S) instead of (?=\s) so that you will also match single digits that are at the beginning or end of the string.
I prefer this over \b\d\b for your example because it looks like you really only want to match when the digit is surrounded by spaces, and \b\d\b would match the 4 and the 5 in a string like 192.168.4.5
To allow punctuation at the end, you could use the following:
(?<!\S)\d(?![^\s.,?!])
Add any additional punctuation characters that you want to allow after the digit to the character class (inside of the square brackets, but make sure it is after the ^).
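The differences between these patterns show up on a few inputs. The sketch below is in JavaScript rather than the question's PHP (an assumption made for brevity) and needs an engine with lookbehind support. Apart from the question's sample text, the input strings are made up.

```javascript
const text = 'text 4 78 text 558 my.name@gmail.com 5 text 78998 text';

const spaced = / \d /g;                       // the question's approach
const bounded = /(?<!\S)\d(?!\S)/g;           // whitespace or string edge on both sides
const withPunct = /(?<!\S)\d(?![^\s.,?!])/g;  // also allows . , ? ! after the digit

console.log(text.match(bounded));                // ['4', '5']
console.log(' 1 2 3 '.match(spaced));            // [' 1 ', ' 3 ']: the 2 is lost to overlap
console.log('1 2 3'.match(bounded));             // ['1', '2', '3']
console.log('192.168.4.5'.match(/\b\d\b/g));     // ['4', '5']
console.log('192.168.4.5'.match(bounded));       // null
console.log('Is it 7?'.match(withPunct));        // ['7']
```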
Use word boundaries. Note that the quantifier {1} is redundant (a single \d already matches exactly one digit), and so is the character class [], because it contains only one element.
\b\d\b
Search around word boundaries:
\b\d\b
As explained by the others, this extracts every single digit delimited by word boundaries, so digits separated by special characters are also picked up, such as those around the "." in an IP address. To address that, see F.J and Mike Brant's answer(s).
It really depends on where the numbers can appear and whether you care if they are adjacent to other characters (like . at the end of a sentence). At the very least, I would use word boundaries so that you can get numbers at the beginning and end of the input string:
$pattern = '/\b\d\b/';
But you might consider punctuation at the end like:
$pattern = '/\b\d(\b|\.|\?|\!)/';
If one-digit numbers can be preceded or followed by characters other than digits (e.g., "a1 cat" or "Call agent 7, pronto!") use
(?<!\d)\d(?!\d)
Demo
The regular expression reads: match a digit (\d) that is neither preceded nor followed by a digit, (?<!\d) being a negative lookbehind and (?!\d) being a negative lookahead.
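A short sketch of this pattern on the two example strings, in JavaScript (an assumption, since the question uses PHP; lookbehind support is required). The last two inputs are added for illustration.

```javascript
// A digit with no digit directly before or after it.
const loneDigit = /(?<!\d)\d(?!\d)/g;

console.log('a1 cat'.match(loneDigit));                // ['1']
console.log('Call agent 7, pronto!'.match(loneDigit)); // ['7']
console.log('text 4 78 text 558 5'.match(loneDigit));  // ['4', '5']
console.log('192.168.4.5'.match(loneDigit));           // ['4', '5']
```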