What is the difference between search patterns like [a-zA-Z][a-zA-Z]* and [a-zA-Z]*?
The first matches one [a-zA-Z] followed by zero or more [a-zA-Z].
The second matches zero or more [a-zA-Z].
The first can also be written as [a-zA-Z]+.
The regex [a-zA-Z][a-zA-Z]* requires one alphabetic character, optionally followed by any number of further letters. [a-zA-Z]*, on the other hand, drops that requirement entirely: it needs no letter at all.
For example, when the whole string has to match, your first regex accepts azxxx and abccdef but rejects 2abcd, 22 and the empty string. The second regex matches the empty string too.
For the first regex, you may just want to say: [a-zA-Z]+ instead.
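A quick way to check the difference is to test both patterns against whole strings. The sketch below assumes Python's re module (the question names no language) and uses fullmatch so that the pattern has to cover the entire string; the helper name is made up for illustration.

```python
import re

# [a-zA-Z][a-zA-Z]* is equivalent to [a-zA-Z]+
ONE_OR_MORE = re.compile(r"[a-zA-Z][a-zA-Z]*")
ZERO_OR_MORE = re.compile(r"[a-zA-Z]*")


def whole_string_matches(pattern, text):
    """Return True if the pattern matches the entire text (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


for sample in ["azxxx", "abccdef", "2abcd", "22", ""]:
    print(repr(sample),
          whole_string_matches(ONE_OR_MORE, sample),
          whole_string_matches(ZERO_OR_MORE, sample))
# Only the empty string separates the two patterns here:
# ONE_OR_MORE rejects it, ZERO_OR_MORE accepts it.
```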
I am trying to write a regex that matches every word in a string, except words that are followed by a :, which should be skipped entirely. I decided to use a negative lookahead for it.
/([a-zA-Z]+)(?!:)/gm
string: lame:joker
Since I am using a character range, it matches one character at a time and only drops the last character before the :.
How do I ignore the entire match in this case?
Link to regex101: https://regex101.com/r/DlEmC9/1
The issue is related to backtracking: once your [a-zA-Z]+ reaches a :, the lookahead fails, so the engine steps back one character and re-checks the lookahead there. It therefore finds a match whenever there are at least two letters before a colon, returning the part that is not immediately followed by :. See your regex demo: c in c:real is not matched because there is no position to backtrack to, and rea in real:c is matched because a is not immediately followed by :.
Adding implicit requirement to the negative lookahead
Since you only need to match a sequence of letters not followed with a colon, you can explicitly add one more condition that is implied: and not followed with another letter:
[A-Za-z]+(?![A-Za-z]|:)
[A-Za-z]+(?![A-Za-z:])
See the regex demo. Since both [A-Za-z] and : match a single character, it makes sense to put them into a single character class, so, [A-Za-z]+(?![A-Za-z:]) is better.
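The backtracking effect and the fix can be reproduced with a short script. This is a minimal sketch assuming Python's re module; the sample text simply combines the strings mentioned in the question and in this answer.

```python
import re

text = "lame:joker c:real real:c"

# Original pattern: the lookahead only checks for ":", so the engine can
# give back one letter and still succeed ("lam" from "lame", "rea" from "real").
original = re.findall(r"[A-Za-z]+(?!:)", text)

# Fixed pattern: the match may not be followed by another letter either,
# so giving back letters never helps and the whole word is skipped.
fixed = re.findall(r"[A-Za-z]+(?![A-Za-z:])", text)

print(original)  # ['lam', 'joker', 'real', 'rea', 'c']
print(fixed)     # ['joker', 'real', 'c']
```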
Preventing backtracking into a word-like pattern by using a word boundary
As @scnerd suggests, word boundaries can also help in these situations, but there is always a catch: the meaning of a word boundary is context dependent (see the number of ifs in the word boundary explanation).
[A-Za-z]+\b(?!:)
is a valid solution here, because the input implies that words end with non-word chars (i.e. end of string, or chars other than letters, digits and underscores). See the regex demo.
When does a word boundary fail?
\b will not be the right choice when the main consuming pattern is supposed to match even if glued to other word chars. The most common example is matching numbers:
\d+\b(?!:) matches 12 in "12," but not in "12:", "12c" or "12_".
\d+(?![\d:]) matches 12 in "12,", "12c" and "12_"; it fails only in "12:".
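The two number patterns can be compared side by side. The following is a small sketch assuming Python's re module; first_match is a hypothetical helper that returns the first match or None.

```python
import re

WITH_BOUNDARY = re.compile(r"\d+\b(?!:)")
WITH_LOOKAHEAD = re.compile(r"\d+(?![\d:])")


def first_match(pattern, text):
    """Return the text of the first match, or None (hypothetical helper)."""
    m = pattern.search(text)
    return m.group(0) if m else None


for sample in ["12,", "12:", "12c", "12_"]:
    print(repr(sample),
          first_match(WITH_BOUNDARY, sample),
          first_match(WITH_LOOKAHEAD, sample))
# '12,' 12 12
# '12:' None None
# '12c' None 12
# '12_' None 12
```

The word boundary version rejects 12c and 12_ because there is no boundary between a digit and a letter or underscore.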
Do a word boundary check \b after the + to require it to get to the end of the word.
([a-zA-Z]+\b)(?!:)
Here's an example run.
I have this regex pattern
/^[^-\s][^0-9][a-zA-Z\s-]+$/
I am a bit confused on why when I test it on https://www.regextester.com/
My pattern allows one single number to be added before the string. Meaning that if I type in '2Mantas' it will still accept it whereas '22Mantas' will fail the test. I do not want any numbers or whitespace to be allowed. Any ideas anyone?
You have two negated character classes, so the regex says that the first character cannot be a hyphen or whitespace and the second character cannot be a digit. If you put the whitespace and digit in the first brackets, it will work as desired.
^[^-\s\d][a-zA-Z\s-]+$
The first two rules in your current regular expression break down to the following:
^[^\s-] - the first character in the string should not be a whitespace or a hyphen. This explains why 2steve is accepted - 2 is not a whitespace or a hyphen character.
[^0-9] - the second character in the string should not be a digit. This explains why 22steve is not accepted - the 2 in the second position is a digit, which violates this rule.
Assuming you don't want anything but capital and lowercase letters in your first name input, and the name shouldn't start with a whitespace or hyphen character, you can simplify to a subset of your current regular expression:
/^[A-Za-z][A-Za-z-\s]+$/
Regex101
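To see why 2Mantas slips through the original pattern, both versions can be run against a few names. This is a sketch assuming Python's re module rather than the JavaScript-style literals above; in the simplified pattern the hyphen is moved to the end of the class ([A-Za-z\s-]) because Python rejects a hyphen placed between z and \s as a bad range. The sample names are made up.

```python
import re

# Pattern from the question: first char not "-" or whitespace,
# second char not a digit, then letters, whitespace or hyphens.
ORIGINAL = re.compile(r"^[^-\s][^0-9][a-zA-Z\s-]+$")

# Simplified pattern from the answer, with the hyphen moved to the end
# of the class so that Python's re accepts it as a literal.
SIMPLIFIED = re.compile(r"^[A-Za-z][A-Za-z\s-]+$")

for name in ["Mantas", "2Mantas", "22Mantas", "-Mantas", "Anne-Marie"]:
    print(name, bool(ORIGINAL.search(name)), bool(SIMPLIFIED.search(name)))
# Mantas True True
# 2Mantas True False
# 22Mantas False False
# -Mantas False False
# Anne-Marie True True
```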
This should work.
It matches a string that starts with exactly one digit, followed by one or more letters (greedy), which are captured in group 1:
^\d{1}([a-zA-Z]+)
https://regex101.com/r/wtBwd7/1
I am looking to clean up a regular expression which matches 2 or more characters at a time in a sequence. I have made one which works, but I was looking for something shorter, if possible.
Currently, it looks like this for every character that I want to search for:
([A]{2,}|[B]{2,}|[C]{2,}|[D]{2,}|[E]{2,}|...)*
Example input:
AABBBBBBCCCCAAAAAADD
See this question, which I think was asking the same thing you are asking. You want to write a regex that will match 2 or more of the same character. Let's say the characters you are looking for are just capital letters, [A-Z]. You can do this by matching one character in that set and grouping it by putting it in parentheses, then matching that group using the reference \1 and saying you want two or more of that "group" (which is really just the one character that it matched).
([A-Z])\1{1,}
The reason it's {1,} and not {2,} is that the first character was already matched by the set [A-Z].
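Applied to the example input, the backreference pattern splits the string into its runs. This is a minimal sketch assuming Python's re module; finditer is used instead of findall because findall would return only the captured single letter.

```python
import re

RUNS = re.compile(r"([A-Z])\1{1,}")

text = "AABBBBBBCCCCAAAAAADD"

# group(0) is the whole run; group(1) is just the repeated letter.
runs = [m.group(0) for m in RUNS.finditer(text)]
print(runs)  # ['AA', 'BBBBBB', 'CCCC', 'AAAAAA', 'DD']

# Single letters are skipped: only "CC" qualifies here.
print([m.group(0) for m in RUNS.finditer("ABCC")])  # ['CC']
```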
Not sure I understand your needs but, how about:
[A-E]{2,}
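This is shorter than yours, but note that it is not strictly equivalent: it also matches mixed sequences such as AB, whereas your pattern requires each letter to be repeated.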
But if you want multiple occurrences of each letter:
(?:([A-Z])\1+)+
where ([A-Z]) matches one capital letter and stores it in group 1
\1 is a backreference that repeats group 1
+ means one or more repetitions
Finally it matches strings like the one you've given: AABBBBBBCCCCAAAAAADD
To be sure there're no other characters in the string, you have to anchor the regex:
^(?:([A-Z])\1+)+$
And, if you want to match case-insensitively:
^(?i)(?:([A-Z])\1+)+$
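The anchored version can be wrapped in a small validator. This sketch assumes Python's re module and passes re.IGNORECASE as a flag instead of writing (?i) after the ^; the function name is hypothetical.

```python
import re

# Whole string must consist of runs of two or more identical letters.
ONLY_RUNS = re.compile(r"^(?:([A-Z])\1+)+$", re.IGNORECASE)


def only_repeated_letters(text):
    """Return True if text is made up solely of repeated-letter runs."""
    return ONLY_RUNS.match(text) is not None


print(only_repeated_letters("AABBBBBBCCCCAAAAAADD"))  # True
print(only_repeated_letters("aabb"))                  # True
print(only_repeated_letters("AABC"))                  # False
print(only_repeated_letters("A"))                     # False
```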
I am trying to create a regex that matches lowercase and uppercase A-Z, digits and the ##$_ symbols, with a length limit of 4 to 16 for the whole string.
My useless regex:
/^([a-zA-Z])|(\d)|(##\$_){4,16}$/
I tested online regex generators like http://www.jslab.dk/tools.regex.php but did not get a good result.
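Your regex /^([a-zA-Z])|(\d)|(##\$_){4,16}$/ matches a single letter at the start of the string OR a single digit anywhere OR 4 to 16 repetitions of the sequence "##$_" at the end of the string. The anchors and the quantifier each apply to one alternative only.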
The groups around the alternatives are useless.
One solution would be to make a group around the whole alternation
/^([a-zA-Z]|\d|##\$_){4,16}$/
but the better solution would be to add everything to one character class
/^[a-zA-Z##$_\d]{4,16}$/
See it here on Regexr
You can maybe simplify it further, since [a-zA-Z\d_] is the same as \w when \w is not Unicode based!
/^[\w##$]{4,16}$/
\w includes lowercase and UPPERCASE letters, digits and the _ character
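The difference between the alternation and the single character class shows up as soon as a string contains a disallowed character. This is a sketch assuming Python's re module, with the patterns copied from above without the JavaScript-style slashes; the sample strings are made up.

```python
import re

# Pattern from the question: three separate alternatives.
BROKEN = re.compile(r"^([a-zA-Z])|(\d)|(##\$_){4,16}$")

# Single character class with the length limit on the whole string.
FIXED = re.compile(r"^[a-zA-Z##$_\d]{4,16}$")


def is_valid(pattern, text):
    """Return True if the pattern finds a match in text (hypothetical helper)."""
    return pattern.search(text) is not None


print(is_valid(BROKEN, "a!"))       # True: a letter at the start is enough
print(is_valid(FIXED, "a!"))        # False
print(is_valid(FIXED, "user_$1"))   # True
print(is_valid(FIXED, "abc"))       # False: shorter than 4
print(is_valid(FIXED, "a" * 17))    # False: longer than 16
```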
RegEx Pattern: ^[\w#\#\$]{4,16}$
Explained demo here: http://regex101.com/r/rK1yH2
The expression that you need is this one:
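(([a-zA-Z])|(\d)|(##\$_)){4,16}
The problem with yours is that the final {4,16} applies only to the last group, not to the whole expression. Also keep the leading ^ and trailing $: outside a character class, ^ anchors the match to the start of the string (it means "not" only as the first character inside [...]), and $ anchors it to the end, so together they force the whole string to match. Note that (##\$_) matches that four-character sequence as a unit; to allow the symbols individually, use a character class such as [##$_].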
You can also see it graphically here: http://www.debuggex.com/
I have three different things
xxx
xxx>xxx
xxx>xxx>xxx
Where xxx can be any combination of letters and numbers
I need a regex that can match the first two but NOT the third.
To match ASCII letters and digits try the following:
^[a-zA-Z0-9]{3}(>[a-zA-Z0-9]{3})?$
If letters and digits outside of the ASCII character set are required then the following should suffice:
^[^\W_]{3}(>[^\W_]{3})?$
^\w+(?:>\w+)?$
matches an entire string.
\w+(?:>\w+)?\b(?!>)
matches strings like this inside a larger string.
If you want to exclude the underscore from matching, you can use [\p{L}\p{N}] instead (if your regex engine knows Unicode), or [^\W_] if it doesn't, as a substitute for \w.
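The anchored patterns can be checked against the three shapes from the question. This is a sketch assuming Python's re module; whether xxx means exactly three characters or a part of any length is not stated in the question, so both readings are shown.

```python
import re

# Reading 1: each part is exactly three ASCII letters or digits.
THREE_CHARS = re.compile(r"^[a-zA-Z0-9]{3}(>[a-zA-Z0-9]{3})?$")

# Reading 2: each part is one or more word characters.
ANY_LENGTH = re.compile(r"^\w+(?:>\w+)?$")

for sample in ["abc", "abc>123", "abc>123>xyz"]:
    print(sample,
          bool(THREE_CHARS.search(sample)),
          bool(ANY_LENGTH.search(sample)))
# abc True True
# abc>123 True True
# abc>123>xyz False False

# The two readings differ once a part is not three characters long.
print(bool(THREE_CHARS.search("ab>12345")))  # False
print(bool(ANY_LENGTH.search("ab>12345")))   # True
```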