This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I found [.] in a regular expression on the notmuch man page:
notmuch search 'from:"/bob#.*[.]example[.]com/"'
At first it seemed pointless, because brackets denote a list of characters and this one contains only a single character, but I eventually learned that it matches a literal dot.
So why use it rather than \.? Does this form have any advantages?
At first I thought this was to avoid double escaping, but on further consideration I think it is because a dot inside a character set ([]) is treated differently from a dot elsewhere. It makes sense that a dot in a character set matches only a literal dot: the whole point of a set is to match specific characters, so a wildcard in the set would make no sense.
So [.,;:] may be used to match punctuation marks.
Once you take that into account it's obvious that [.] just matches a literal dot.
Whether to use \. or [.] is left as an aesthetic decision.
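As a rough check, here is a small sketch using Python's re module as a stand-in (notmuch uses its own regex engine, so treat this as an illustration only). The sample strings are made up. It suggests that \. and [.] accept the same input, while a bare dot is a wildcard:

```python
import re

# Three spellings of the host-name pattern; the sample strings are made up.
escaped = re.compile(r".*\.example\.com")
bracketed = re.compile(r".*[.]example[.]com")
unescaped = re.compile(r".*.example.com")  # bare dots act as wildcards

samples = ["bob@mail.example.com", "bob@mailXexampleYcom"]

for text in samples:
    print(text,
          bool(escaped.fullmatch(text)),
          bool(bracketed.fullmatch(text)),
          bool(unescaped.fullmatch(text)))

# Inside a character set, each listed character stands only for itself.
punctuation = re.findall(r"[.,;:]", "a.b,c;d:e*f")
print(punctuation)
```

Only the unescaped version accepts the second sample, because its dots match X and Y.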
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
I wrote this regex to tokenize a text: "\b\w+\b"
but someone suggested that I change it to \b[^\W\d_]+\b
Can anyone explain why the second version (using negation) is better?
Thanks
The first one matches letters, digits and the underscore. Depending on the regex engine, this may include Unicode letters and digits. (The word boundaries are superfluous in this case, by the way.)
The second regex matches only letters (the negated class excludes non-word characters, digits and the underscore). Because of the word boundaries, a run of letters matches only if it is surrounded by non-word characters or the start/end of the string.
If your regex engine supports them, you might want to use [[:alpha:]] or \p{L} (or [A-Za-z] for ASCII-only text) instead to make your intent clearer.
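To see the difference on a concrete input, here is a minimal sketch in Python (an assumption on my part, the question doesn't name an engine). The sample text is invented:

```python
import re

text = "foo_bar baz42 x9 hello"

# \w+ keeps digits and underscores inside the tokens.
word_chars = re.findall(r"\b\w+\b", text)

# [^\W\d_]+ is letters only; with \b on both sides, a token that touches
# a digit or underscore is rejected entirely rather than trimmed.
letters_only = re.findall(r"\b[^\W\d_]+\b", text)

# Without the boundaries, the letter runs inside mixed tokens are returned.
letter_runs = re.findall(r"[^\W\d_]+", text)

print(word_chars)    # ['foo_bar', 'baz42', 'x9', 'hello']
print(letters_only)  # ['hello']
print(letter_runs)   # ['foo', 'bar', 'baz', 'x', 'hello']
```

So whether the second form is "better" depends on whether tokens like baz42 should count as words at all.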
This question already has answers here:
RegEx for allowing alphanumeric at the starting and hyphen thereafter
(4 answers)
Closed 5 years ago.
I want to build a regular expression which only matches [A-Za-z0-9\-] with an additional rule that hyphens (-) are not allowed to appear at the start and at the end.
For example:
my-site is matched.
m is matched.
mysite- is not matched.
-mysite is not matched.
Currently, I've come up with ^[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9]+$.
But this doesn't match m.
How can I change my regular expression so that it fits my needs?
Use lookarounds:
^(?!-)[A-Za-z0-9-]*(?<!-)$
This works because lookarounds don't consume input, so the lookahead and the lookbehind can both assert on the same character.
Note that you don't need to escape the dash within the character class if it's the first or last character.
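Here is a quick sketch that runs the four examples from the question through that pattern, using Python's re module (an assumption, the question doesn't say which engine is in use). Note that with * the empty string is accepted too. The non_empty variant with + is my own hypothetical tweak in case that is unwanted:

```python
import re

# Pattern from the answer above: the lookahead rejects a leading hyphen,
# the lookbehind rejects a trailing one.
pattern = re.compile(r"^(?!-)[A-Za-z0-9-]*(?<!-)$")

# Hypothetical variant: + instead of * so that an empty string is rejected.
non_empty = re.compile(r"^(?!-)[A-Za-z0-9-]+(?<!-)$")

samples = ["my-site", "m", "mysite-", "-mysite", ""]
results = {s: bool(pattern.search(s)) for s in samples}

for sample, matched in results.items():
    print(repr(sample), matched)

print(bool(non_empty.search("")), bool(non_empty.search("m")))
```

For the single character m, the lookahead and the lookbehind both inspect that same character, which is why it passes.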
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
I came across this regex used for password validation:
(?=.*[a-z])(?=.*[A-Z])(?=.*[\d])(?=.*[^a-zA-Z\d])(?=\S+$).{8,}
There are only two things that are unclear to me about this regex:
What is .* used for, and why doesn't this regex work without it?
What is the difference or benefit of using [\d] instead of \d? The regex works just fine in both cases.
.* matches any sequence of characters: . matches any character (other than newline, which is not relevant here) and * matches zero or more of the preceding pattern. It is used in the lookaheads to search for a match anywhere in the password. Lookaheads don't consume input, so all of them are tested at the same position. Without .*, each one would inspect only the very next character, which would have to be a lowercase letter, an uppercase letter and a digit all at once, so the regex could never match. With .*, the password must contain at least one of each, but they can be anywhere in the password.
There's no difference between \d and [\d]. Whoever wrote this might have used the brackets out of habit, or perhaps to make it easier to add other characters to the character class later.
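A small sketch in Python (my choice of engine, the question doesn't name one) makes both points testable. The passwords are invented. One thing it shows: with .* removed, every lookahead looks at the same single character, so no input can satisfy them all:

```python
import re

with_dotstar = re.compile(
    r"(?=.*[a-z])(?=.*[A-Z])(?=.*[\d])(?=.*[^a-zA-Z\d])(?=\S+$).{8,}"
)
# Same regex with every .* removed from the lookaheads.
without_dotstar = re.compile(
    r"(?=[a-z])(?=[A-Z])(?=[\d])(?=[^a-zA-Z\d])(?=\S+$).{8,}"
)

candidates = ["Passw0rd!", "password1!", "Pw0!", "Pass w0rd!", "aA1!aA1!"]

for pw in candidates:
    print(repr(pw),
          bool(with_dotstar.fullmatch(pw)),
          bool(without_dotstar.fullmatch(pw)))

# \d and [\d] find exactly the same characters.
plain = re.findall(r"\d", "a1b22")
bracketed = re.findall(r"[\d]", "a1b22")
print(plain == bracketed)
```

The second column stays False for every candidate, including aA1!aA1!, which has the character types in the "right" order.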
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
I have the following regex:
.*(?:(?:(?<!a)cc|string).*number).*
I am trying to understand what the ? at the beginning of the string between the parentheses means. I know that a? means the preceding character 'a' can occur zero or one time. But what does it mean when it appears at the beginning of a group?
The answer requires a little history lesson. When Larry Wall wanted to add new features to regexes in Perl, he couldn't just change the meaning of existing metacharacters, or assign special meanings to characters that didn't have them. That would have broken a lot of regexes that had been working. Instead, he had to look for character sequences that would never appear in a regex.
There was only one kind of group originally: what we now call capturing groups. The opening parenthesis was a metacharacter, so it would make no sense to follow it with a quantifier. You could match a literal open-paren zero or one time with \(?, or you could match (and capture) a literal question mark with (\?), but if you tried to use (? in a regex it would throw an exception.
Larry changed the rule so (? could appear in a regex, but it must form the beginning of a special-group construct, which requires at least one more character. So, to answer your question, the string doesn't start with ?. The sequence (?: forms a single token, representing the beginning of a non-capturing group. We also have (?= and (?! for positive and negative lookaheads, (?<= and (?<! for lookbehinds, and so on.
(?:) is a non-capturing group. It only groups the pattern for matching; it doesn't capture anything.
(?<!) is a negative lookbehind.
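To make the capturing versus non-capturing distinction concrete, here is a minimal sketch in Python (assumed engine; the syntax is shared with Perl-style flavours). The capturing pattern is a hypothetical variant of the regex from the question with the ?: removed, and the input strings are made up:

```python
import re

# Regex from the question: both groups are non-capturing.
non_capturing = re.compile(r".*(?:(?:(?<!a)cc|string).*number).*")

# Hypothetical variant with plain (capturing) parentheses.
capturing = re.compile(r".*(((?<!a)cc|string).*number).*")

print(non_capturing.groups, capturing.groups)  # number of capture groups

text = "my string has a number"
print(non_capturing.match(text).groups())
print(capturing.match(text).groups())

# The negative lookbehind (?<!a) rejects "cc" when an "a" comes right before it.
print(bool(non_capturing.match("acc number")))
print(bool(non_capturing.match("bcc number")))
```

Both patterns match the same inputs; the only difference is whether the matched pieces are stored in groups.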
I was making a RegEx using the regex101 tool and read in the explanation field
[.] - the literal character .
[\.] - matches the character . literally
I get lost between "literal character" and "character literally".
What is the difference between these two?
There is no difference. Sorry, I take that back. The only difference is the wording that Firas Dib, the author of regex101, chose to explain the various tokens.
A literal character or matching something literally refers to specifying an actual character in the text: for instance, a to match a, as opposed to a character class such as \w that could also match a.
You can match a literal period in any of these three ways:
\.
[.]
[\.]
Which Option is Better?
Some people like option 2 because it makes it clear you are matching a period, not the catch-all dot. It stands out. For myself, I use \.. Some people will say that using a character class is less optimal, but on modern processors it makes no difference. You pick.
Option 3 is over the top and is typically used when someone doesn't know that periods don't need to be escaped inside a character class. In my view it's confusing. What did the author mean? Were they trying to create a character class to match either a backslash or a period, and made a typo? (That would be [\\.])
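If you want to check this yourself, here is a short sketch in Python (my assumption; regex101 supports several flavours). The test string is made up and contains one period and one backslash:

```python
import re

text = r"a.b\c"  # five characters: a . b \ c

# The three equivalent spellings of "a literal period".
results = {p: re.findall(p, text) for p in (r"\.", r"[.]", r"[\.]")}
for pattern, found in results.items():
    print(pattern, found)

# A class that really means "backslash or period" needs the doubled backslash.
backslash_or_period = re.findall(r"[\\.]", text)
print(backslash_or_period)

# For contrast, an unescaped dot outside a class matches every character.
wildcard = re.findall(r".", "a.b")
print(wildcard)
```

All three spellings find only the period; only [\\.] also picks up the backslash.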