What is the difference between these two regular expressions [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
Sorry for asking this dumb question, but I just want to make sure I am doing right.
What is the difference between
^Sentence.*$
and
^Sentence.*
I usually use the first one, but I want to make sure which is the more appropriate.

It depends on the context (i.e. the string).
By default, $ means the end of the string.
Quantifiers such as * are greedy by default.
If the string doesn't contain a newline character, the two patterns are exactly the same (in the sense that they will match exactly the same strings).
But if your string contains a newline character, the .* will stop before it, because the dot, by default, doesn't match the newline character. So the first pattern will fail (unless the newline is the very last character and your regex flavor lets $ match just before a final line break), and the second pattern will match only the first line (if it begins with "Sentence", obviously).
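A minimal sketch of the behaviour described above, using Python's re module as an assumed test bed (other flavors may treat $ and a trailing newline differently). The helper matched_text and the sample strings are made up for illustration.

```python
import re

anchored = re.compile(r"^Sentence.*$")
unanchored = re.compile(r"^Sentence.*")

def matched_text(pattern, text):
    """Return the matched substring, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None

single_line = "Sentence one"
multi_line = "Sentence one\nline two"
trailing_newline = "Sentence one\n"

# No newline: both patterns match the same text.
print(matched_text(anchored, single_line))       # Sentence one
print(matched_text(unanchored, single_line))     # Sentence one

# Newline in the middle: .* stops before it, so $ cannot be satisfied.
print(matched_text(anchored, multi_line))        # None
print(matched_text(unanchored, multi_line))      # Sentence one

# Newline at the very end: Python's $ also matches just before it.
print(matched_text(anchored, trailing_newline))  # Sentence one
```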

From RegexBuddy:
^ "Assert position at the beginning of a line (at beginning of the string or before a line break character)"
. "Match any single character that is not a line break character"
* "Between zero and unlimited times, as many times as possible, giving back as needed (greedy)"
$ "Assert position at the end of a line (at the end of the string or before a line break character)"
HTH.
http://www.regular-expressions.info/
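The quoted descriptions of ^ and $ are line-oriented. In Python's re module (an assumed test bed here, not something the answer used), that behaviour is what the re.MULTILINE flag switches on. The sample text is made up.

```python
import re

text = "Sentence one\nother line\nSentence two"

# Without the flag, ^ and $ only anchor to the whole string, so nothing matches.
default_hits = re.findall(r"^Sentence.*$", text)

# With re.MULTILINE, ^ and $ also match at line breaks inside the string.
multiline_hits = re.findall(r"^Sentence.*$", text, re.MULTILINE)

print(default_hits)    # []
print(multiline_hits)  # ['Sentence one', 'Sentence two']
```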

Related

RegEx to find count of special characters in String [duplicate]

I need to form a regex that matches only if two or more special characters occur in the given string.
1) abcd##qwer - Match
2) abcd#dsfsdg#fffj - Match
3) abcd#qwetg - No Match
4) acwexyz - No Match
5) abcd#ds#$%fsdg#fffj - Match
Can anyone help me with this?
Note: I need to use this regular expression in an existing tool, not in a programming language.
UPDATE after OP edit
The edited OP introduces a small amount of additional complexity that necessitates a different pattern entirely. The keys here are that (a) there is now a significantly limited set of "special characters" and (b) that these characters must appear at least twice (c) in any position in the string.
To implement this, you would use something like:
(?:.*?[##$%].*?){2,}
This opens a non-capturing group,
which contains any number of characters (matched lazily), followed by
any character in the set ##$%,
followed by any number of characters (again matched lazily).
The {2,} quantifier ensures this pattern occurs at least twice in a given string.
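As a rough check, the pattern can be run against the five sample strings from the question. This sketch assumes Python's re module; the asker's tool may use a different flavor. The character class is copied verbatim from the answer, and the expected values come from the question's own Match / No Match labels.

```python
import re

# Pattern copied from the answer; the repeated '#' in the class is harmless.
two_specials = re.compile(r"(?:.*?[##$%].*?){2,}")

# Sample strings and expected outcomes taken from the question.
expected = {
    "abcd##qwer": True,
    "abcd#dsfsdg#fffj": True,
    "abcd#qwetg": False,
    "acwexyz": False,
    "abcd#ds#$%fsdg#fffj": True,
}

results = {s: two_specials.search(s) is not None for s in expected}

for sample, hit in results.items():
    print(f"{sample}: {'Match' if hit else 'No Match'}")
```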
Original answer
By "special characters", I assume you mean anything outside standard alphanumeric characters. You can use the pattern below in most flavors of Regex:
([^A-Za-z0-9])\1
This (a) creates a set of all characters not including alphanumeric characters and matches a character against it, then (b) checks to see if the same character appears adjacent.
Regex101
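A small sketch of what the original pattern does and does not catch, assuming Python's re module. It only fires when the same non-alphanumeric character appears twice in a row, which is why the edited question needed a different pattern. The last two sample strings are made up.

```python
import re

# Capture one non-alphanumeric character, then require the same one again.
doubled_special = re.compile(r"([^A-Za-z0-9])\1")

adjacent_same = doubled_special.search("abcd##qwer") is not None
separated = doubled_special.search("abcd#dsfsdg#fffj") is not None
adjacent_different = doubled_special.search("ab!?cd") is not None

print(adjacent_same)       # True
print(separated)           # False
print(adjacent_different)  # False
```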

Confusion in JavaScript RegExp ?= Quantifier [duplicate]

What is the difference between
(?=.\d)(?=.[a-z])(?=.[A-Z])
and
(.\d)(.[a-z])(.[A-Z])
When I test the string a2A only the first RegExp returns true. Can anyone explain this for me?
The difference lies in the lookahead operator applied to each term. A lookahead matches the sub-regex it guards as usual but consumes no characters, so the next part of the regex starts matching at the same position.
This means that the first regex should not match (contrary to your test; which engine did you use?). Given any initial matching position, the second character would have to be a digit, a lowercase letter, and an uppercase letter, all at the same time.
Observe that this will not happen if the . ('any char') is quantified:
(?=.*\d)(?=.*[a-z])(?=.*[A-Z])
Each lookahead may skip an arbitrary amount of material before matching the character class, and this amount may differ between the subexpressions.
The second alternative (with or without quantification) will never match a2A: its groups consume what they match, so it requires a digit, then a lowercase letter, then an uppercase letter, in that order, which the test string does not provide.
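The three variants can be compared on the test string from the question. This is a sketch assuming Python's re module rather than the JavaScript engine the asker used; the extra string "2aA" is made up to show a case the consuming version does accept.

```python
import re

TEST = "a2A"

# Unquantified lookaheads: each inspects the character right after the
# current position, so all three can never hold at once.
strict_lookaheads = re.compile(r"(?=.\d)(?=.[a-z])(?=.[A-Z])")

# Lookaheads with .*: each may skip ahead independently from the same position.
loose_lookaheads = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z])")

# Plain groups consume input, forcing the order digit, lowercase, uppercase.
consuming_groups = re.compile(r"(.*\d)(.*[a-z])(.*[A-Z])")

strict_hit = strict_lookaheads.search(TEST) is not None
loose_hit = loose_lookaheads.search(TEST) is not None
consuming_hit = consuming_groups.search(TEST) is not None
ordered_hit = consuming_groups.search("2aA") is not None

print(strict_hit, loose_hit, consuming_hit, ordered_hit)  # False True False True
```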

Is there a better way to validate multiple regex conditionals than gigantic "or" statements?

I am practicing regular expressions and to test myself, I was trying to make some sort of simplified password validation expression. Basically, it would accept [0-9A-Za-z], but...
1) it needed to have a symbol (for simplicity's sake, I only used [#&#!&*%$])
2) it needed to have a capital letter
In my mind, the best way to do this was with positive lookahead statements. The only problem was the validation before and after the symbol. If I had the capital letter lookahead at the beginning, it would only validate if the capital letter came before the symbol, and the same for the end. The only way I could counter this was to make a massive OR statement with the entire thing copied, but one having the lookahead at the beginning, and one having it at the end. This is the monstrosity that I came up with:
/^[0-9A-Za-z#&#!&*%$]*(?=[A-Z]+)[0-9A-Za-z#&#!&*%$]*(?=[#&#!&*%$])[#&#!&*%$][A-Za-z|#&#!&*%$]*|[0-9A-Za-z#&#!&*%$]*(?=[#&#!&*%$])[#&#!&*%$](?=[A-Z]+)[0-9A-Za-z#&#!&*%$]*$/
I'll try to break it down into parts that make sense to me (and hopefully to you guys as well).
First part of the OR statement
The beginning can be [0-9A-Za-z#&#!&*%$]*, so that's what I start with
Then comes the first positive lookahead, ensuring that there is a capital [A-Z]
Then comes the second lookahead ensuring that one of the symbols in [#&#!&*%$] is present.
Then, it allows any of those necessary symbols to come next
The first part ends with another allowance of [A-Za-z|#&#!&*%$]*
Second part of the OR statement
The second part is much like the first. Well, almost an entire copy and paste. I put an | OR symbol in place, but then instead of having the (?=[A-Z]+) lookahead before the symbol, I check for it after.
All in all, I put in a good amount of effort into something that works (for the most part). I did some extensive Googling, but nothing really seemed to answer my question. Is there an easier way to go about what I am looking to do?
You need to anchor the lookaheads at the start of the string (to just run them once) and add a .* or .*? before the required subpatterns in the lookaheads to allow the search anywhere on the line (note that . usually does not match line breaks, but your main pattern does not match them, so . is enough).
So, that said, you may use
^(?=.*[A-Z])(?=.*[#&#!&*%$])[0-9A-Za-z#&#!&*%$]*$
Details:
^ - start of string
(?=.*[A-Z]) - there must be an uppercase ASCII letter somewhere after any 0+ chars other than line breaks
(?=.*[#&#!&*%$]) - there must be a special char from the character class somewhere after any 0+ chars other than line breaks
[0-9A-Za-z#&#!&*%$]* - 0+ chars from the defined ranges or chars
$ - end of string.
See the regex demo. To make it more efficient, use the principle of contrast:
^(?=[^A-Z]*[A-Z])(?=[^#&#!&*%$]*[#&#!&*%$])[0-9A-Za-z#&#!&*%$]*$
Here the negated character classes [^A-Z]* and [^#&#!&*%$]* replace the .* in each lookahead.
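A short sketch, assuming Python's re module, that runs both versions of the pattern on a few made-up passwords. The character classes are copied verbatim from the answer. It shows that the order of the capital letter and the symbol no longer matters.

```python
import re

# Both patterns are copied from the answer above.
dot_star = re.compile(r"^(?=.*[A-Z])(?=.*[#&#!&*%$])[0-9A-Za-z#&#!&*%$]*$")
contrast = re.compile(
    r"^(?=[^A-Z]*[A-Z])(?=[^#&#!&*%$]*[#&#!&*%$])[0-9A-Za-z#&#!&*%$]*$"
)

# Hypothetical sample passwords with the outcome the rules imply.
expected = {
    "abC#1": True,   # capital before symbol
    "#abC": True,    # symbol before capital
    "abc#1": False,  # no capital letter
    "ABC1": False,   # no symbol
    "Ab c#": False,  # space is not in the allowed set
}

dot_star_results = {s: dot_star.search(s) is not None for s in expected}
contrast_results = {s: contrast.search(s) is not None for s in expected}

print(dot_star_results == contrast_results)  # True
```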

Regex, Difference between ^[a-zA-Z]+$ vs [a-zA-Z]* [duplicate]

I'm very new to programming and I've been told to avoid regex for now, but I find it extremely helpful.
When writing a program to check if a string only contains letters, I found on Stack Overflow that both ^[a-zA-Z]+$ and [a-zA-Z]* yield the same results. I understand how [a-zA-Z] works and I understand how [A-z] is different from both of those as well, but I do not understand +$ vs ^[]* or why they yield the same result, and I'm having trouble finding anything to explain it.
Here's the example I used it in:
String student = input.next();
while (!student.matches("[a-zA-Z]*")) {
    System.out.print("Invalid input. Enter name: ");
    student = input.next();
}
This is my first question here so sorry if this kind of question is frowned upon.
As you know,
[a-zA-Z]
Matches a single upper or lower-case letter.
[a-zA-Z]*
matches zero or more upper- or lower-case letters in a row.
^[a-zA-Z]+$
matches a string that consists entirely of one or more upper- or lower-case letters: the ^ and $ anchors tie the match to the start and end of the string, so nothing else can appear in it.
^ and $ play more of a role when you're dealing with streams of data, using regular expressions to sift out stuff you want while ignoring the stuff you don't. That last pattern could be used to find a stream consisting of only upper and lower-case letters.
* is zero or more, + is one or more.
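However, the larger difference is the ^ and $. The first pattern says the string MUST contain only [a-zA-Z], so the string 123abc123 is not valid.
With the second pattern, where ^ and $ are omitted, 123abc123 is valid when the regex is used to search within the string. Java's String.matches() always tests the entire string, though, so in the code above both patterns reject 123abc123 and differ only in that * also accepts the empty string.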
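The difference between searching inside a string and matching the whole string can be sketched as follows. This uses Python's re module as an assumed stand-in: re.search plays the role of a "find anywhere" operation, and re.fullmatch is used here as an analogue of a whole-string test such as Java's String.matches().

```python
import re

anchored = r"^[a-zA-Z]+$"
bare = r"[a-zA-Z]*"
mixed = "123abc123"

# Searching: the anchored pattern rejects the string, while the bare pattern
# "succeeds" with an empty match at position 0.
search_anchored = re.search(anchored, mixed) is not None
search_bare = re.search(bare, mixed) is not None

# Whole-string matching: anchors are implied, so the bare pattern rejects too.
full_bare = re.fullmatch(bare, mixed) is not None

# The remaining difference is the empty string: * accepts it, + does not.
empty_star = re.fullmatch(r"[a-zA-Z]*", "") is not None
empty_plus = re.fullmatch(r"[a-zA-Z]+", "") is not None

print(search_anchored, search_bare, full_bare)  # False True False
print(empty_star, empty_plus)                   # True False
```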

what's wrong with this regex for password rules

I'm trying for at least 2 letters, at least 2 non letters, and at least 6 characters in length:
^.*(?=.{6,})(?=[a-zA-Z]*){2,}(?=[0-9##$%^&+=]*){2,}.*$
but that misses the mark on many levels, yet I'm not sure why. Any suggestions?
While this type of test can be done with a regex, it may be easier and more maintainable to do a non-regex check. The regex to achieve this is fairly complex and a bit unreadable, but the code to run this test is fairly straightforward. For example, take the following method as an implementation of your requirements (in C#):
using System;
using System.Linq;

public bool IsValid(string password) {
    // argument null check omitted
    return password.Length >= 6 &&
        password.Where(Char.IsLetter).Count() >= 2 &&         // at least 2 letters
        password.Where(x => !Char.IsLetter(x)).Count() >= 2;  // at least 2 non-letters
}
To answer the question in the title, here's what's wrong with your regex:
First, the .* (dot-star) at the beginning consumes the whole string. Then the first lookahead, (?=.{6,}) is applied and fails because the match position is at the end of the string. So the regex engine starts backtracking, "taking back" characters by moving the match position backward one character at a time and reapplying the lookahead. When it's taken back six characters, the first lookahead succeeds and the next one is applied.
The second lookahead is (?=[a-zA-Z]*), which means "at the current match position, try to match zero or more ASCII letters." The match position is still six characters back from the end of the string, but it doesn't matter; the lookahead will always succeed no matter where you apply it, because it can legally match zero characters. Also, the letters can be anywhere in the string, so the lookahead has to accommodate whatever intervening non-letters there might be.
Then you have {2,}. It's not part of the lookahead subexpression because it's outside the parentheses. In that position, it means the lookahead has to succeed two or more times, which makes no sense. If it succeeded once, it will succeed any number of times, because it's being applied at the same position every time. Some regex flavors treat it as an error when you apply a quantifier to a lookahead (or to any other zero-width assertion, e.g., lookbehind, word boundary, line anchors). Most flavors seem to ignore the quantifier.
Then you have another lookahead that will always succeed, and another useless quantifier. Finally, the dot-star at the end re-consumes the six characters the first dot-star had to relinquish.
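Two of the points above can be observed directly. This is a sketch assuming Python's re module, with made-up sample strings; the quantified lookaheads from the question are deliberately left out because flavors disagree on how to treat them.

```python
import re

text = "abcdefgh"  # 8 characters

# The greedy .* first swallows everything, then backtracks until the
# lookahead can see six characters ahead of the match position.
m = re.match(r".*(?=.{6,})", text)
consumed = m.group(0)
position = m.end()

# A lookahead that may match zero characters succeeds everywhere,
# even on a string without a single letter.
always_succeeds = re.search(r"(?=[a-zA-Z]*)", "12345") is not None

print(repr(consumed), position)  # 'ab' 2
print(always_succeeds)           # True
```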
I think this is what you were trying for:
^
(?=.{6})
(?=(?:[^A-Za-z]*[A-Za-z]){2})
(?=(?:[^0-9##$%^&+=]*[0-9##$%^&+=]){2})
.*$
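A quick sketch of the pattern above written on one line and tried on a few made-up passwords, assuming Python's re module. The character classes are copied verbatim from the answer, so "non-letters" here means only the characters listed in that class.

```python
import re

password_rule = re.compile(
    r"^"
    r"(?=.{6})"                                   # at least six characters
    r"(?=(?:[^A-Za-z]*[A-Za-z]){2})"              # at least two letters
    r"(?=(?:[^0-9##$%^&+=]*[0-9##$%^&+=]){2})"    # at least two from the set
    r".*$"
)

# Hypothetical samples with the outcome the three rules imply.
expected = {
    "ab12cd": True,
    "a#b$cd": True,
    "abcdef": False,  # no digits or listed symbols
    "a1b2": False,    # shorter than six characters
    "123456": False,  # no letters
    "abcde1": False,  # only one character from the set
}

results = {p: password_rule.search(p) is not None for p in expected}

for password, ok in results.items():
    print(f"{password}: {'valid' if ok else 'invalid'}")
```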
If you really want to use regular expressions, try this one:
(?=.{6})(?=[^a-zA-Z]*[a-zA-Z][^a-zA-Z]*[a-zA-Z])(?=[^0-9##$%^&+=]*[0-9##$%^&+=][^0-9##$%^&+=]*[0-9##$%^&+=])^.+$
This matches anything that is at least six characters long ((?=.{6})) and contains at least two alphabetic characters ((?=[^a-zA-Z]*[a-zA-Z][^a-zA-Z]*[a-zA-Z])) and at least two characters from the set [0-9##$%^&+=] ((?=[^0-9##$%^&+=]*[0-9##$%^&+=][^0-9##$%^&+=]*[0-9##$%^&+=])).