This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
I'm very new to programming and I've been told to avoid regex for now, but I find it extremely helpful.
When writing a program to check if a string only contains letters, I found on Stack Overflow that both ^[a-zA-Z]+$ and [a-zA-Z]* yield the same results. I understand how [a-zA-Z] works, and I understand how [A-z] is different from both of those as well, but I do not understand ^[...]+$ vs [...]* or why they yield the same result, and I'm having trouble finding anything that explains it.
Here's the example I used it in:
String student = input.next();
while (!student.matches("[a-zA-Z]*")) {
    System.out.print("Invalid input. Enter name: ");
    student = input.next();
}
This is my first question here so sorry if this kind of question is frowned upon.
As you know,
[a-zA-Z]
matches a single upper- or lower-case letter.
[a-zA-Z]*
matches zero or more upper- or lower-case letters in a row.
^[a-zA-Z]+$
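matches a string that STARTS with one or more upper- or lower-case letters and ENDS right after them. In other words, the only thing in your string is upper- or lower-case letters, and there is at least one of them.
^ and $ play more of a role when you're dealing with streams of data, using regular expressions to sift out the stuff you want while ignoring the stuff you don't. That last pattern could be used to find a stream consisting of only upper- and lower-case letters. In your code the anchors make no difference, because String.matches() only succeeds when the pattern matches the entire string, so both patterns are effectively anchored. The one remaining difference is the quantifier: [a-zA-Z]* also accepts an empty string, while ^[a-zA-Z]+$ requires at least one letter.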
* is zero or more, + is one or more.
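However, there is a larger difference, which is the ^ and $. The first example says that the string MUST contain only [a-zA-Z], so the string 123abc123 is not valid.
In the 2nd example, where ^ and $ are omitted, 123abc123 is valid when the pattern is searched for anywhere in the string (for example with Matcher.find()), because the pattern only has to match a part of it.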
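To see both effects side by side, here is a minimal sketch. The class and helper names (AnchorDemo, wholeString, anywhere) are made up for illustration; only String.matches() and Matcher.find() are standard Java. It compares the whole-string test used in the question with a search anywhere in the input:

```java
import java.util.regex.Pattern;

final class AnchorDemo {
    // Whole-string test, like student.matches(regex) in the question.
    static boolean wholeString(String regex, String input) {
        return input.matches(regex);
    }

    // Substring search: succeeds if the pattern matches anywhere in the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"Alice", "123abc123", ""};
        for (String s : inputs) {
            System.out.printf("%-12s matches(*)=%b matches(^+$)=%b find(*)=%b find(^+$)=%b%n",
                    "\"" + s + "\"",
                    wholeString("[a-zA-Z]*", s),
                    wholeString("^[a-zA-Z]+$", s),
                    anywhere("[a-zA-Z]*", s),
                    anywhere("^[a-zA-Z]+$", s));
        }
    }
}
```

With matches(), the two patterns only disagree on the empty string. With find(), the unanchored pattern succeeds on 123abc123 (it can even match zero letters at the very start), while the anchored one does not.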
This question already has answers here:
How to get the count of only special character in a string using Regex?
(6 answers)
Closed 2 years ago.
I need to form a regex that matches only if two or more special characters occur in the given string.
1) abcd##qwer - Match
2) abcd#dsfsdg#fffj - Match
3) abcd#qwetg - No Match
4) acwexyz - No Match
5) abcd#ds#$%fsdg#fffj - Match
Can anyone help me with this?
Note: I need to use this regular expression in an existing tool, not in a programming language.
UPDATE after OP edit
The edited OP introduces a small amount of additional complexity that necessitates a different pattern entirely. The keys here are that (a) there is now a significantly limited set of "special characters", (b) these characters must appear at least twice, and (c) they may appear in any position in the string.
To implement this, you would use something like:
(?:.*?[##$%].*?){2,}
Opens a non-capturing group,
which lazily matches any number of characters, followed by
any one character in the set ##$%,
followed (lazily) by any number of characters.
The {2,} requires this group to match at least twice in a given string.
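Although the question targets an existing tool rather than a programming language, a quick way to sanity-check the pattern against the five sample strings is a small harness like the one below. This is only a sketch: the class and method names (SpecialCharDemo, hasTwoSpecials) are invented, the pattern is copied verbatim from above, and it assumes the tool searches for the pattern anywhere in the input, the way Matcher.find() does.

```java
import java.util.regex.Pattern;

final class SpecialCharDemo {
    // Pattern copied verbatim from the answer above.
    static final Pattern AT_LEAST_TWO = Pattern.compile("(?:.*?[##$%].*?){2,}");

    // True if the pattern can be found somewhere in the input.
    static boolean hasTwoSpecials(String input) {
        return AT_LEAST_TWO.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "abcd##qwer", "abcd#dsfsdg#fffj", "abcd#qwetg", "acwexyz", "abcd#ds#$%fsdg#fffj"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (hasTwoSpecials(s) ? "Match" : "No Match"));
        }
    }
}
```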
Original answer
By "special characters", I assume you mean anything outside standard alphanumeric characters. You can use the pattern below in most flavors of Regex:
([^A-Za-z0-9])\1
This (a) defines a negated set that matches any single character that is not alphanumeric, and captures the character it matched, then (b) uses the backreference \1 to check that the same character appears again immediately after it.
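To illustrate what the backreference does and does not accept, here is a small sketch (the class and method names are hypothetical; the pattern is the one above, with the backslash doubled for a Java string literal):

```java
import java.util.regex.Pattern;

final class RepeatedSpecialDemo {
    // ([^A-Za-z0-9])\1 : one non-alphanumeric character, then the same character again.
    static final Pattern DOUBLED = Pattern.compile("([^A-Za-z0-9])\\1");

    static boolean hasDoubledSpecial(String input) {
        return DOUBLED.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {"abcd##qwer", "abcd#$qwer", "abcd#dsfsdg#fffj"};
        for (String s : samples) {
            System.out.println(s + " -> " + hasDoubledSpecial(s));
        }
    }
}
```

Note that this only finds the same special character twice in a row. Two different special characters side by side (#$), or the same one in two separate places, are not matched, which is why the updated question needed a different pattern.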
What is the difference between
(?=.\d)(?=.[a-z])(?=.[A-Z])
and
(.\d)(.[a-z])(.[A-Z])
When I test the string a2A only the first RegExp returns true. Can anyone explain this for me?
The difference is the lookahead operator (?=...) wrapped around each of the terms in the regex. A lookahead (LA) matches the sub-regex it guards as usual, but it consumes no characters, so the next portion of the regex starts matching from the same position again.
This means that the first regex should not match (contrary to your tests; which engine did you use?). Given any initial matching position, the second character would have to be a digit, a lowercase letter, and an uppercase letter, all at the same time.
Observe that this will not happen if the . ('any char') is quantified:
(?=.*\d)(?=.*[a-z])(?=.*[A-Z])
Each LA term may skip an arbitrary amount of material before matching the character class, and this amount may differ between the subexpressions.
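The second alternative does not match a2A, with or without quantification, because its groups consume characters one after another. It requires a digit, then a lowercase letter, then an uppercase letter, in that order (each preceded by exactly one character without quantification, or by any number of characters with it), and the test string a2A does not provide that.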
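If you want to check this behaviour yourself, a minimal sketch in Java could look like the following. The class and helper names are made up for illustration, and find() is used so that each pattern may start at any position:

```java
import java.util.regex.Pattern;

final class LookaheadDemo {
    // True if the regex can be found at some position in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] patterns = {
            "(?=.\\d)(?=.[a-z])(?=.[A-Z])",    // lookaheads, dot not quantified
            "(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])", // lookaheads, dot quantified
            "(.\\d)(.[a-z])(.[A-Z])",          // consuming groups, dot not quantified
            "(.*\\d)(.*[a-z])(.*[A-Z])"        // consuming groups, dot quantified
        };
        for (String p : patterns) {
            System.out.println(p + " on a2A -> " + found(p, "a2A"));
        }
        // The consuming version cares about order: digit, lowercase, uppercase.
        System.out.println("(.*\\d)(.*[a-z])(.*[A-Z]) on 2aA -> "
                + found("(.*\\d)(.*[a-z])(.*[A-Z])", "2aA"));
    }
}
```

Only the quantified lookahead version accepts a2A. The consuming versions need the three character types in a fixed order, which 2aA provides and a2A does not.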
I wrote this regex to tokenize a text: "\b\w+\b"
but someone suggested that I change it to \b[^\W\d_]+\b
Can anyone explain to me why this second way (using negation) is better?
thanks
The first one matches all letters, numbers and the underscore. Depending on the regex engine, this may include Unicode letters and numbers. (The word boundaries are superfluous in this case, btw.)
The second regex matches only letters (the negated class excludes non-word characters, digits and the underscore). Due to the word boundaries, it will only match them if they are surrounded by non-word characters or the start/end of the string.
If your regex engine supports this, you might want to use [[:alpha:]] or \p{L} (or [A-Za-z] if you don't need Unicode) instead to make your intent clearer.
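As a rough illustration of how the variants tokenize the same text, here is a sketch in Java (the class name, the tokens helper and the sample text are all made up; note that in Java \w is ASCII-only unless a Unicode flag is set):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizerDemo {
    // Collects every match of the regex in the text, in order.
    static List<String> tokens(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "foo bar_1 42 baz";
        System.out.println(tokens("\\b\\w+\\b", text));         // letters, digits, underscore
        System.out.println(tokens("\\w+", text));               // same tokens without \b
        System.out.println(tokens("\\b[^\\W\\d_]+\\b", text));  // whole words made of letters only
        System.out.println(tokens("\\p{L}+", text));            // any run of letters
    }
}
```

The negated-class version skips bar_1 entirely, because there is no word boundary between r and _. \p{L}+ without boundaries would still pick out bar from it, so which one is "better" depends on what you want to count as a token.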
I came across this regex used for password validation:
(?=.*[a-z])(?=.*[A-Z])(?=.*[\d])(?=.*[^a-zA-Z\d])(?=\S+$).{8,}
There are only two things that are unclear to me about this regex:
what is .* used for, and why doesn't this regex work without it?
what is the difference/benefit of using [\d] instead of \d? The regex works just fine in both cases.
.* matches any sequence of characters: . matches any character (other than newline, which is not relevant here) and * matches zero or more of the preceding pattern. This is used in the lookaheads to search for matches anywhere in the password. If you didn't have it, every lookahead would test the same single character, which would have to be a lowercase letter, an uppercase letter, a digit and a non-alphanumeric character all at once, so no password could ever match. With .*, it means the password must contain at least one of each of them, but they can be anywhere in the password.
There's no difference between \d and [\d]. Whoever wrote this might just use the brackets out of habit, or perhaps to make it easier to modify it to put other characters into the character class.
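Both points can be checked with a short sketch like this one. The class, constant and method names are invented for the example, and the sample passwords are arbitrary; the first regex is the one from the question, the second is the same regex with the .* removed from the first four lookaheads, and the third swaps [\d] for \d.

```java
final class PasswordDemo {
    // Regex from the question (backslashes doubled for a Java string literal).
    static final String FULL =
            "(?=.*[a-z])(?=.*[A-Z])(?=.*[\\d])(?=.*[^a-zA-Z\\d])(?=\\S+$).{8,}";

    // Same regex without .* in the first four lookaheads.
    static final String NO_DOT_STAR =
            "(?=[a-z])(?=[A-Z])(?=[\\d])(?=[^a-zA-Z\\d])(?=\\S+$).{8,}";

    // Same as FULL, but with \d instead of [\d].
    static final String PLAIN_DIGIT =
            "(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[^a-zA-Z\\d])(?=\\S+$).{8,}";

    static boolean isValid(String regex, String password) {
        return password.matches(regex);
    }

    public static void main(String[] args) {
        String[] samples = {"Passw0rd!", "password", "Pass w0rd!", "Pw0!"};
        for (String pw : samples) {
            System.out.println(pw + " full=" + isValid(FULL, pw)
                    + " noDotStar=" + isValid(NO_DOT_STAR, pw)
                    + " plainDigit=" + isValid(PLAIN_DIGIT, pw));
        }
    }
}
```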
Sorry for asking this dumb question, but I just want to make sure I am doing right.
What is the difference between
^Sentence.*$
and
^Sentence.*
I usually use the first one, but I want to make sure which is the more appropriate.
It depends on the context (i.e. the string).
By default, $ means the end of the string (in many flavors it also matches just before a newline that is the very last character of the string).
And the quantifiers, like *, are greedy by default.
If the string doesn't contain a newline character, the two patterns are exactly the same (in the sense that they will match exactly the same strings).
But if your string contains a newline character, the .* will stop before it, because the dot, by default, doesn't match the newline character. So the first pattern will fail whenever anything follows that newline, and the second pattern will only match the first line (if it begins with "Sentence", obviously).
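Here is a small sketch of those cases in Java, whose default behaviour for . and $ is the one described above. The class name, the firstMatch helper and the sample strings are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineEndDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String oneLine = "Sentence one";
        String twoLines = "Sentence one\nline two";
        String trailingNewline = "Sentence one\n";

        for (String regex : new String[] {"^Sentence.*$", "^Sentence.*", "(?m)^Sentence.*$"}) {
            System.out.println(regex);
            System.out.println("  one line:         " + firstMatch(regex, oneLine));
            System.out.println("  two lines:        " + firstMatch(regex, twoLines));
            System.out.println("  trailing newline: " + firstMatch(regex, trailingNewline));
        }
    }
}
```

The (?m) variant switches on multiline mode, where $ also matches at the end of every line, so ^Sentence.*$ then picks out the first line just like the pattern without $.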
From RegexBuddy:
^ "Assert position at the beginning of a line (at beginning of the string or before a line break character)"
. "Match any single character that is not a line break character"
* "Between zero and unlimited times, as many times as possible, giving back as needed (greedy)"
$ "Assert position at the end of a line (at the end of the string or before a line break character)"
HTH.
http://www.regular-expressions.info/