Regex: Non-Word Not Recognised at the Beginning of an Expression - regex

I'm using a standard regex expression as a password check, which as far as I can see should accept non-word characters (\W) at the beginning of the expression, but doesn't. The regex is designed to require a minimum of 8 characters, and a combination of at least 1 lower, 1 upper, 1 number and 1 nonword character.
Does anyone know what I'm doing wrong?
\b(?=.{8,})(?=.*[a-z])(?=.*[A-Z])(?=.*[\d])(?=.*[\W])\b.*
E.g.
T3st1ng!
is identified
!T3sting
is not.

You need to use
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*\W).{8,}$
See the regex demo
Instead of \b you need the ^ and $ anchors. Moreover, the length check does not have to be a lookahead; it can be moved to the consuming . part at the end as .{8,}. Also, there is no need to wrap a single shorthand class in a character class: replace [\d] with \d (and [\W] with \W) for a cleaner expression.
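To see the difference between the two patterns, here is a minimal Python sketch. The names ORIGINAL, FIXED and is_valid_password are made up for illustration; the two patterns are copied from the question and from this answer.

```python
import re

# Pattern from the question: \b ties the match to a word boundary.
ORIGINAL = re.compile(r"\b(?=.{8,})(?=.*[a-z])(?=.*[A-Z])(?=.*[\d])(?=.*[\W])\b.*")
# Pattern from the answer: ^ and $ tie the match to the whole string.
FIXED = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*\W).{8,}$")


def is_valid_password(candidate):
    """Return True if candidate satisfies the anchored pattern."""
    return FIXED.search(candidate) is not None


for sample in ("T3st1ng!", "!T3sting"):
    # Prints the sample, whether the original pattern finds a match,
    # and whether the anchored pattern accepts it.
    print(sample, bool(ORIGINAL.search(sample)), is_valid_password(sample))
```

With these two samples the original pattern only finds a match in T3st1ng!, while the anchored one accepts both.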

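A word boundary \b is a zero-width position between a word character and a non-word character (or between a word character and the start or end of the string).
Word characters are letters, digits and the underscore. ! is a non-word character, so in !T3sting there is no word boundary at the very start of the string. The first boundary comes after the !, and only 7 characters remain from there, so the length check fails.
You don't need \b at all if you are only validating a password.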
Correct one is:
(?=.{8,})(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*\W).*
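The word boundary explanation above can be checked with a small Python sketch that lists every position where \b matches. The helper name boundary_positions is hypothetical.

```python
import re


def boundary_positions(text):
    """Return the string offsets at which \\b matches in text."""
    return [m.start() for m in re.finditer(r"\b", text)]


print(boundary_positions("T3st1ng!"))  # boundaries at offsets 0 and 7
print(boundary_positions("!T3sting"))  # boundaries at offsets 1 and 8
```

In !T3sting offset 0 is not a boundary, which is why a pattern starting with \b cannot begin its match there.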

Related

How to overcome multiple matches within same sentence (regex) [duplicate]

I am trying to implement a regex which includes all the strings which have any number of words but cannot be followed by a : and ignore the match if it does. I decided to use a negative look ahead for it.
/([a-zA-Z]+)(?!:)/gm
string: lame:joker
Since I am using a character range, it gives up one character at a time and only leaves out the last character before the :.
How do I ignore the entire match in this case?
Link to regex101: https://regex101.com/r/DlEmC9/1
The issue is related to backtracking: once [a-zA-Z]+ has consumed the letters up to a :, the lookahead fails, so the engine steps back one letter and re-checks the lookahead. That succeeds whenever there are at least two letters before the colon, returning the part whose last letter is not immediately followed by :. See your regex demo: c in c:real is not matched as there is no position to backtrack to, and rea in real:c is matched because a is not immediately followed by :.
Adding implicit requirement to the negative lookahead
Since you only need to match a sequence of letters not followed with a colon, you can explicitly add one more condition that is implied: and not followed with another letter:
[A-Za-z]+(?![A-Za-z]|:)
[A-Za-z]+(?![A-Za-z:])
See the regex demo. Since both [A-Za-z] and : match a single character, it makes sense to put them into a single character class, so, [A-Za-z]+(?![A-Za-z:]) is better.
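A short Python sketch of the backtracking effect and of the fix, using the sample string from the question. The variable names are illustrative only.

```python
import re

text = "lame:joker"

# Original pattern: backtracking gives up the final "e" and still matches "lam".
backtracking = re.findall(r"[a-zA-Z]+(?!:)", text)

# Fixed pattern: the lookahead also rejects a following letter,
# so a partial word can never be returned.
fixed = re.findall(r"[A-Za-z]+(?![A-Za-z:])", text)

print(backtracking)  # ['lam', 'joker']
print(fixed)         # ['joker']
```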
Preventing backtracking into a word-like pattern by using a word boundary
As @scnerd suggests, word boundaries can also help in these situations, but there is always a catch: word boundary meaning is context dependent (see a number of ifs in the word boundary explanation).
[A-Za-z]+\b(?!:)
is a valid solution here, because the input implies the words end with non-word chars (i.e. end of string, or chars other than letter, digits and underscore). See the regex demo.
When does a word boundary fail?
\b will not be the right choice when the main consuming pattern is supposed to match even if glued to other word chars. The most common example is matching numbers:
\d+\b(?!:) matches 12 in "12," but finds no match in "12:", "12c" or "12_"
\d+(?![\d:]) matches 12 in "12,", "12c" and "12_", and fails only in "12:"
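The comparison above can be reproduced with a small Python sketch. The names WITH_BOUNDARY and WITH_LOOKAHEAD are made up for this example.

```python
import re

WITH_BOUNDARY = re.compile(r"\d+\b(?!:)")
WITH_LOOKAHEAD = re.compile(r"\d+(?![\d:])")

for sample in ("12,", "12:", "12c", "12_"):
    # The boundary version needs a non-word character (or the end of the
    # string) after the digits; the lookahead version only rejects a
    # following digit or colon.
    print(repr(sample), WITH_BOUNDARY.findall(sample), WITH_LOOKAHEAD.findall(sample))

# For letters followed by a colon, the boundary version works as intended.
print(re.findall(r"[A-Za-z]+\b(?!:)", "lame:joker"))  # ['joker']
```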
Do a word boundary check \b after the + to require it to get to the end of the word.
([a-zA-Z]+\b)(?!:)
Here's an example run.

regex word boundary excluding the hyphen

I need a regex that matches an expression ending with a word boundary, but which does not consider the hyphen as a boundary.
I.e. get all expressions matched by
type ([a-z])\b
but do not match e.g.
type a-1
To rephrase: I want an equivalent of the word boundary operator \b which, instead of using the word character class [A-Za-z0-9_], uses the extended class [A-Za-z0-9_-].
You can use a lookahead for this, the shortest would be to use a negative lookahead:
type ([a-z])(?![\w-])
(?![\w-]) would mean "fail the match if the next character is in \w or is a -".
Here is an option that uses a normal lookahead:
type ([a-z])(?=[^\w-]|$)
You can read (?=[^\w-]|$) as "only match if the next character is not in the character class [\w-], or this is the end of the string".
See it working: http://www.rubular.com/r/NHYhv72znm
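For comparison, here is a minimal Python sketch of the plain \b version next to the two lookahead versions. The names PLAIN, NEGATIVE and POSITIVE are hypothetical, and the sample strings other than type a-1 are invented for the demonstration.

```python
import re

PLAIN = re.compile(r"type ([a-z])\b")
NEGATIVE = re.compile(r"type ([a-z])(?![\w-])")
POSITIVE = re.compile(r"type ([a-z])(?=[^\w-]|$)")

for sample in ("type a", "type a, b", "type a-1", "type ab"):
    # Each column shows whether the corresponding pattern finds a match.
    print(
        repr(sample),
        bool(PLAIN.search(sample)),
        bool(NEGATIVE.search(sample)),
        bool(POSITIVE.search(sample)),
    )
```

Only the plain \b pattern matches type a-1; the two lookahead patterns treat the hyphen as part of the word and reject it.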
I had a pretty similar problem except I didn't want to consider the '*' as a boundary character. Here's what I did:
\b(?<!\*)([^\s\*]+)\b(?!\*)
Basically, if you're at a word boundary, look back one character and don't match if the previous character was an '*'. If you're in the middle, don't match on a space or asterisk. If you're at the end, make sure the end isn't an asterisk. In your case, I think you could use \w instead of \s. For me, this worked in these situations:
*word
wo*rd
word*
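A rough Python check of this pattern, written with the asterisk in the final lookahead escaped as \*. It assumes the intent is that none of the three samples above should produce a match; the helper name unstarred_words and the sample "plain word" are made up for illustration.

```python
import re

# Assumption: the final lookahead is (?!\*), with the asterisk escaped.
NOT_STARRED = re.compile(r"\b(?<!\*)([^\s\*]+)\b(?!\*)")


def unstarred_words(text):
    """Return the words in text that do not touch an asterisk."""
    return NOT_STARRED.findall(text)


for sample in ("*word", "wo*rd", "word*", "plain word"):
    print(repr(sample), unstarred_words(sample))
```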

regex to filter sentences based on word length

I'm trying to figure out a regex to match strings where the length of each word is less than some value.
E.g., if the value is 6, the regex should match: "this is a test string" and not "this is another test string", because the length of "another" is greater than 6.
How about:
^(?:\b\S{1,5}\b\s*)+$
explanation:
^ : start of string
(?: : start of non capture group
\b : word boundary
\S{1,5} : one to five non space char
\b : word boundary
\s* : 0 or more spaces
)+ : end of group one or more times
$ : end of string
^\w{1,5}(\s+\w{1,5})*$
this should match strings of one or more words of length up to 5
at least in languages in which the {n,m} syntax is allowed, like Java or Perl
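Both anchored patterns above hard-code an upper bound of 5. Note that "string" in the question's example has six letters, so accepting "this is a test string" needs an upper bound of 6. A minimal Python sketch with the bound as a parameter (the function name word_length_patterns is hypothetical):

```python
import re


def word_length_patterns(max_len):
    """Build both anchored patterns with max_len as the upper bound."""
    n = int(max_len)
    boundary = re.compile(r"^(?:\b\S{1,%d}\b\s*)+$" % n)
    words = re.compile(r"^\w{1,%d}(\s+\w{1,%d})*$" % (n, n))
    return boundary, words


boundary6, words6 = word_length_patterns(6)
for sample in ("this is a test string", "this is another test string"):
    # Prints whether each pattern accepts the sample with a bound of 6.
    print(repr(sample), bool(boundary6.search(sample)), bool(words6.search(sample)))
```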
The exact syntax of the regular expression you're looking for depends on the language you're using, however this is very possible. The following example is in Python:
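import re

def matchStringLength(value, string):
    # Words of 1 to `value` ASCII letters, separated by single spaces.
    pattern = re.compile(r'[A-Za-z]{1,%d}( [A-Za-z]{1,%d})*' % (value, value))
    # fullmatch checks the whole string instead of just a prefix.
    return pattern.fullmatch(string) is not None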
This should be enough to let you develop a method that fully meets your requirements. As written, it will fail for strings containing numbers, special characters, etc.
One possibility is to use a negative lookahead
^(?!.*\b\w{7,}\b).+$
See and test it here on Regexr
Here the approach is a different one, basically I accept everything with the ^.+$ part (at least one character because of the +, change it to * if you would like to accept the empty string also).
Then I add an assertion to the expression (?!.*\b\w{7,}\b). This does not match a character but it checks if the assertion is true. This means here, in the whole string there is no part with 7 or more word characters in a row.
(?!...) negative lookahead assertion
\w a word character, depends on your language, at least a-zA-Z0-9 and _ . In some languages also all Unicode characters that are a letter or a digit are included in \w. See here for character classes on regular-expression.info
\b is a word boundary, i.e. the change from a word character to a non word character or the other way round.
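As a quick check of the negative lookahead approach, here is a small Python sketch. The function name no_long_words and its max_len parameter are assumptions for the example; with max_len=6 it builds exactly the pattern shown above.

```python
import re


def no_long_words(text, max_len=6):
    """Return True if text is non-empty and has no run of more than
    max_len word characters."""
    pattern = re.compile(r"^(?!.*\b\w{%d,}\b).+$" % (max_len + 1))
    return pattern.search(text) is not None


print(no_long_words("this is a test string"))        # True
print(no_long_words("this is another test string"))  # False
```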
[^\s]{5,} should do the trick! It will count any other char than spaces, though, so commas etc will be included unless you add them to the square brackets.

regular expression generation

I need a regular expression to check that a string contains only letters and spaces. No characters other than the letters [A-Z] and space are allowed.
Please help.
The complete regex looks like this
^[A-Z ]+$
You can simply create a character class and put the characters in that you want to allow:
[A-Z ]
if you want to allow also lower case letters then use
[A-Za-z ]
or use the i (IgnoreCase) option
So your character class matches 1 character. You want to repeat it to match more than one character.
+ would be at least one character, where
* would additionally match 0 characters
As last step you need to ensure that the complete string is matched, you can do this using anchors.
^ matches the beginning of the string
$ matches the end of the string (or just before a newline if you use the m (multiline) option)
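Putting the steps above together, here is a minimal Python sketch. The names UPPER_ONLY and ANY_CASE are made up; re.IGNORECASE is Python's spelling of the i option.

```python
import re

UPPER_ONLY = re.compile(r"^[A-Z ]+$")
ANY_CASE = re.compile(r"^[A-Z ]+$", re.IGNORECASE)

for sample in ("HELLO WORLD", "Hello World", "HELLO 1"):
    # Prints whether each pattern accepts the sample.
    print(repr(sample), bool(UPPER_ONLY.search(sample)), bool(ANY_CASE.search(sample)))
```

One Python-specific caveat: $ also matches just before a trailing newline, so "HELLO\n" passes the search above. If that matters, UPPER_ONLY.fullmatch(text) rejects it.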
A character class should be sufficient
[A-Z ]+
i.e. one or more of letters between A-Z and space
Check that the string matches the following:
^[a-zA-Z ]*$
Regex character classes can be negated by putting a ^ symbol at the beginning of them.
Your example could be negated like this: [^A-Z]. Add a space to allow the full range of characters you want to check for and you have [^A-Z ].
Now you have a validator that meets your criteria: If that regex returns true then your validation fails.
Since you didn't specify the programming language you're working in, I can't help you much further than that.
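As one possible illustration of the negated-class approach, here is a short Python sketch. The names INVALID_CHAR and is_valid are hypothetical.

```python
import re

# Matches any single character that is NOT an uppercase letter or a space.
INVALID_CHAR = re.compile(r"[^A-Z ]")


def is_valid(text):
    """Validation passes only if no disallowed character is found."""
    return INVALID_CHAR.search(text) is None


print(is_valid("HELLO WORLD"))  # True
print(is_valid("HELLO, WORLD"))  # False
```

Note that with this approach an empty string also passes, since it contains no disallowed character.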
This will match what you need:
^[A-Z\s]+$
try matching with this regex
^[A-Za-z\s]+$
this should do the trick

Regular Expression related: first character alphabet second onwards alphanumeric+some special characters

I have one question related with regular expression. In my case, I have to make sure that
first letter is alphabet, second onwards it can be any alphanumeric + some special characters.
Regards,
Anto
Try something like this:
^[a-zA-Z][a-zA-Z0-9.,$;]+$
Explanation:
^ Start of line/string.
[a-zA-Z] Character is in a-z or A-Z.
[a-zA-Z0-9.,$;] Alphanumeric or `.` or `,` or `$` or `;`.
+ One or more of the previous token (change to * for zero or more).
$ End of line/string.
The special characters I have chosen are just an example. Add your own special characters as appropriate for your needs. Note that a few characters need escaping inside a character class otherwise they have a special meaning in the regular expression.
I am assuming that by "alphabet" you mean A-Z. Note that in some other countries there are also other characters that are considered letters.
More information
Character Classes
Repetition
Anchors
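A minimal Python sketch of the pattern above, using the same example set of special characters (. , $ ;). The names PATTERN and is_valid_identifier, and the sample strings, are made up for illustration.

```python
import re

# First character: a letter. Remaining characters (at least one):
# letters, digits, or one of . , $ ;
PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9.,$;]+$")


def is_valid_identifier(text):
    return PATTERN.search(text) is not None


for sample in ("a1.,$;", "1abc", "ab!", "a"):
    print(repr(sample), is_valid_identifier(sample))
```

The single-letter sample is rejected because + requires at least one character after the first letter; change + to * to accept it.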
Try this :
/^[a-zA-Z]/
where
^ -> Starts with
[a-zA-Z] -> characters to match
I think the simplest answer is to pick and match only the first character with regex.
String str = "s12353467457458";
if (("" + str.charAt(0)).matches("^[a-zA-Z]")) {
    System.out.println("Valid");
}