This question already has answers here:
RegEx to match full string
(4 answers)
What do ^ and $ mean in a regular expression?
(2 answers)
Closed 4 years ago.
r'^a$' is used as a complete match.
The above pattern says a string should start with the letter a and end with the letter a.
What stops this pattern (r'^a$') from matching the string 'anna'?
a string should start with letter a and end with letter a
That's not the only thing the regex says: it also requires the string to have no other characters in between the initial and final letter, meaning that the only string matched by this expression is a single-character string a.
In order to fix this, add .*? to match "the middle" of the string:
^a.*?a$
Note that this expression no longer matches the single-character string a; it requires at least two a's.
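A minimal check of the two patterns, assuming Python's re module (the r'' literals in the question suggest Python). re.search is enough here because both patterns carry their own ^ and $ anchors.

```python
import re

# ^a$: start of string, exactly one "a", end of string; nothing may sit in between
single = re.compile(r'^a$')
# ^a.*?a$: an "a", then any middle (matched lazily), then a final "a"
outer = re.compile(r'^a.*?a$')

for text in ['a', 'aa', 'anna', 'banana']:
    # Prints the input, then whether each pattern matches it
    print(text, bool(single.search(text)), bool(outer.search(text)))
```

'a' is matched only by the first pattern, 'aa' and 'anna' only by the second, and 'banana' by neither.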
You're not interpreting it correctly.
A regular expression is processed left-to-right, matching parts of the input as it goes along.
^a$
means that the match starts at the beginning of the string, then has to match a right after, then has to match the end of the string immediately after that.
It's no different from
abc
meaning that b has to follow a immediately, and c has to follow b immediately.
You're interpreting the meaning of the regular expression wrong.
r'^a$' describes a string that starts with the letter "a" and ends with that same letter "a". The "a" in the expression must be both the first and the last character of the string.
To extract strings that start and end with DIFFERENT a's, you can use r'^a.*a$'. But this requires that the two a's be different characters. To get any string that starts with "a" and ends with "a", you can OR these two together:
r'^a$|^a.*a$'
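A short sketch of the combined pattern, again assuming Python's re module; the sample strings are illustrative only.

```python
import re

# Either the single character "a", or an "a" ... "a" pair with anything in between
either = re.compile(r'^a$|^a.*a$')

matches = [s for s in ['a', 'aa', 'anna', 'ab', 'ba'] if either.search(s)]
print(matches)  # ['a', 'aa', 'anna']
```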
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I'm currently using this website to create some regular expressions for a programming language I want to build. At the moment, I'm just setting up an expression for identifiers.
In my language, identifiers are expressed like in most languages:
They cannot begin with a digit, or special character other than an underscore
After the first character they can contain alphanumeric and underscore characters
Given those rules I've come up with the following expression by myself:
^\D\w+$
Obviously, it doesn't account for special characters; however, the following expression (which I didn't write myself) does:
^(?!\d)\w+$
Why does the second expression account for special characters? Shouldn't they produce the same results?
I will explain why the second regex works.
The second regex uses a lookahead. After matching the start of the string, the engine checks whether the next character is a digit, but it does not consume it. This is important: if the next character is not a digit, the engine then tries to match that same character with \w, which fails if the character is a symbol. If the character is a digit, the negative lookahead fails and nothing is matched.
\D, on the other hand, consumes the first character as long as it is not a digit, and \w+ matches whatever comes after it. That means a symbol is accepted in the first position.
This ^(?!\d)\w+$ means a string consisting of word characters [a-zA-Z0-9_] that doesn't start with a digit.
This ^\D\w+$ means a non-digit character followed by at least one character from [a-zA-Z0-9_] set.
So #ab01 is matched by the first regex, while the second regex rejects it.
(?!\d)\w+ means "match a run of word characters that does not start with a digit". Wrapped in ^ and $, it is the same as ^\w+$ except that the first character may not be a digit, which is clearly not the same as ^\D\w+$. ^(?!\d).\w+$ (note the single "." in the middle) would behave like ^\D\w+$, apart from "." not matching a newline by default.
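To make the difference concrete, here is a sketch that runs both identifier patterns over a few sample inputs with Python's re module (an assumption; the question does not name a regex flavor, and the variable names are hypothetical).

```python
import re

# Any non-digit (symbols included) in the first position, then one or more word characters
non_digit_then_word = re.compile(r'^\D\w+$')
# Only word characters; the lookahead just forbids a leading digit without consuming it
word_not_digit_first = re.compile(r'^(?!\d)\w+$')

for text in ['abc', '_id', '1abc', '#ab01', 'a']:
    print(text, bool(non_digit_then_word.search(text)), bool(word_not_digit_first.search(text)))
```

Both reject 1abc and accept abc and _id, but only the first accepts #ab01. The single-character identifier a shows one more difference: ^\D\w+$ needs at least two characters, while ^(?!\d)\w+$ accepts one.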
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
Very quick and simple question.
Consider the vector of character strings ("AvAv", "AvAvAv")
Why does the pattern (Av)\1([^A]|$) match both strings?
The pattern says: have an instance of "Av", have another, then either have a character that is not an "A" or else come to an end. The first string clearly matches; the latter I do not see how it does. It has two copies of "Av", but then it fails to end (missing the second disjunct) and fails to be followed by a character other than "A" (missing the first disjunct), so how does the pattern successfully match it?
Thank you so much for your time and assistance. It is greatly appreciated.
Here is an explanation:
AvAv - matches (Av)\1$
In this case, we can match Av, followed by that captured quantity, followed by $ from the alternation. In the case of AvAvAv we also have a match:
AvAvAv - again matches (Av)\1$
  ^^^^ last four letters match
It is the same logic here, except that in order to match, we have to skip the first Av.
If the pattern were ^(Av)\1([^A]|$) then only AvAv would be a match.
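The question's string vector looks like R, but as a sketch the same behavior can be observed with Python's re module (an assumption that it handles this backreference and alternation the same way). Printing the match span shows which part of each string was matched.

```python
import re

unanchored = re.compile(r'(Av)\1([^A]|$)')
anchored = re.compile(r'^(Av)\1([^A]|$)')

for text in ['AvAv', 'AvAvAv']:
    m = unanchored.search(text)
    # span() gives (start, end) of the matched substring
    print(text, m.span() if m else None, bool(anchored.search(text)))
```

For AvAvAv the unanchored search fails at position 0, skips position 1, and succeeds at position 2 with span (2, 6), the last four letters; the anchored version only matches AvAv.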
A RegEx only needs to match a part of the string to be considered "a match".
In other words, for the second example your RegEx matches only the last four characters (AvAv) of AvAvAv.
If you don't want it to match the second one, anchor the pattern with a caret (^):
^(Av)\1([^A]|$)
In this way the second one won't be matched.
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
What is the meaning of this regular expression?
['`?!\"-/]
Why does it match parentheses?
I am using Java for development.
In your regex
['`?!\"-/]
The sequence "-/ is being interpreted as a range of values, just as A-Z means every letter between A and Z. Looking at the basic ASCII table, " is 0x22 and / is 0x2F, and the parentheses (0x28 and 0x29) lie within that range, so your pattern includes them.
One trick you can use here with dash is to place it at the end:
['`?!\"/-]
        ^ this trailing dash will not be interpreted as a range
Because you didn't escape the dash (-). Inside a character class [], the dash denotes a range of characters, in this case from " to /, and the parentheses fall between those two in ASCII.
Inside a character class, a dash that is neither the first nor the last character must be escaped as \- when you want it to match a literal dash.
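The question uses Java, but to keep one language for the examples here, this sketch uses Python's re module as a stand-in (an assumption: both treat an unescaped dash between two characters in a class as a range). It compares the original class with the two fixes.

```python
import re

# Unescaped dash between " and / forms the range 0x22-0x2F, which includes ( and )
with_range = re.compile(r"""['`?!"-/]""")
# Dash moved to the end of the class: treated as a literal
dash_last = re.compile(r"""['`?!"/-]""")
# Dash escaped: also treated as a literal
dash_escaped = re.compile(r"""['`?!"\-/]""")

for ch in ['(', ')', '#', '-', '/']:
    print(ch, bool(with_range.search(ch)), bool(dash_last.search(ch)), bool(dash_escaped.search(ch)))
```

Only the first class matches (, ) and #; all three still match the dash and the slash.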
You need to escape the -; otherwise the parentheses are matched too.
"-/ forms a range that includes the parentheses, just as [A-C] matches the ASCII characters from A to C. Use the following instead:
[\'`?!\"\-/]
It will match the following characters in a string:
'`?!"-/
You can check this on regex101.
This question already has an answer here:
Why does my Regex.Replace string contain the replacement value twice?
(1 answer)
Closed 5 years ago.
Why does the following generate two matches and therefore "xx" as the output:
"Hello" -Replace '.*','x'
Whereas this just generates one match and therefore just "x" in the output:
"Hello" -Replace '^.*','x'
I'm trying to understand what nuance of regex causes two matches in the first case.
You can put the same into https://regex101.com and it also reports two matches with the first match being "Hello" and the second match being ""
That's because the * quantifier matches zero or more characters. In that case, it matches the entire word, Hello, then an empty string after it.
Use .+, and it will match at least one character instead.
When you use ^.*, which is anchored to the beginning of the string, there is only one match: after .* consumes Hello, the empty string left at the end is not at the start of the string, so ^ cannot match there.
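PowerShell's -replace is backed by .NET's regex engine; as a sketch, the same behavior can be reproduced with Python's re module (this assumes Python 3.7 or later, where re.sub also replaces an empty match that directly follows a non-empty one).

```python
import re

# .* matches "Hello", then the empty string at the end: two matches
print(re.findall(r'.*', 'Hello'))   # ['Hello', '']
print(re.sub(r'.*', 'x', 'Hello'))  # xx
# ^ only matches at position 0, so the trailing empty string is not a match
print(re.sub(r'^.*', 'x', 'Hello'))  # x
# .+ needs at least one character, so the empty string is not a match either
print(re.sub(r'.+', 'x', 'Hello'))  # x
```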
This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 6 years ago.
I was wondering how to match a line that does not contain a specific word using Python-style regex (using only the regex, without involving Python functions).
Example:
PART ONE OVERVIEW 1
Chapter 1 Introduction 3
I want to match lines that do not contain the word "PART".
This should work:
/^((?!PART).)*$/
Edit (by request): How this works
The (?!...) syntax is a negative lookahead, which I've always found tough to explain. Basically, it means "whatever follows this point must not match the regular expression /PART/." The site I've linked explains this far better than I can, but I'll try to break this down:
^ #Start matching from the beginning of the string.
(?!PART) #This position must not be followed by the string "PART".
. #Matches any character except line breaks (it will include those in single-line mode).
$ #Match all the way until the end of the string.
The ((?!xxx).)* idiom is probably hardest to understand. As we saw, (?!PART) looks at the string ahead and says that whatever comes next can't match the subpattern /PART/. So what we're doing with ((?!xxx).)* is going through the string letter by letter and applying the rule to all of them. Each character can be anything, but if you take that character and the next few characters after it, you'd better not get the word PART.
The ^ and $ anchors are there to demand that the rule be applied to the entire string, from beginning to end. Without those anchors, any piece of the string that didn't begin with PART would be a match. Even PART itself would have matches in it, because (for example) the letter A isn't followed by the exact string PART.
Since we do have ^ and $, if PART were anywhere in the string, the (?!PART) check would fail at the position where PART begins, and the overall match would fail. Hope that's clear enough to be helpful.
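Although the asker wanted a pure-regex answer, a small Python harness (an assumption, used only to exercise the pattern) shows the lookahead idiom filtering the two example lines.

```python
import re

# Each character is consumed only if "PART" does not start at that position
no_part = re.compile(r'^((?!PART).)*$')

lines = [
    'PART ONE OVERVIEW 1',
    'Chapter 1 Introduction 3',
]
kept = [line for line in lines if no_part.search(line)]
print(kept)  # ['Chapter 1 Introduction 3']
```

To run the pattern over a multi-line block instead of a list, compile it with re.MULTILINE so ^ and $ match at line boundaries, and read m.group(0) from re.finditer: re.findall would return only the last character captured by the group.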