Why are there two matches with .* regex? [duplicate] - regex

This question already has an answer here:
Why does my Regex.Replace string contain the replacement value twice?
(1 answer)
Closed 5 years ago.
Why does the following generate two matches and therefore "xx" as the output:
"Hello" -Replace '.*','x'
Whereas this just generates one match and therefore just "x" in the output:
"Hello" -Replace '^.*','x'
I'm trying to understand what nuance of regex causes two matches in the first case.
You can put the same into https://regex101.com and it also reports two matches, with the first match being "Hello" and the second match being "".

That's because the * quantifier matches zero or more characters. Here, .* first matches the entire word, Hello. The engine then tries again at the end of the string, where .* matches the empty string, which gives the second match.
Use .+ instead: it requires at least one character, so it cannot match the empty string.
With ^.*, there is only one match. The ^ anchors the match to the beginning of the string, and once .* has consumed Hello the next attempt starts at the end of the string, where ^ cannot match.
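The same behaviour can be reproduced outside PowerShell. Here is a minimal Python sketch, assuming Python 3.7 or later (where re.sub also replaces an empty match that directly follows a non-empty one); the variable names are only illustrative.

```python
import re

text = "Hello"

# .* matches "Hello" first, then the empty string at the end of the input.
all_matches = re.findall(r".*", text)

# Both matches are replaced, so the replacement text appears twice.
replaced_star = re.sub(r".*", "x", text)

# ^ can only match at position 0, so the empty match at the end is rejected.
replaced_anchored = re.sub(r"^.*", "x", text)

# .+ needs at least one character, so it cannot match the empty string.
replaced_plus = re.sub(r".+", "x", text)

print(all_matches, replaced_star, replaced_anchored, replaced_plus)
```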

Related

Regex if character matches then, else [duplicate]

This question already has answers here:
In regex, match either the end of the string or a specific character
(2 answers)
Closed 7 months ago.
I have two regular expressions that work fine to extract text between characters:
(?<=\$)(.*)(?=\*)
(?<=\$)(.*)(?=)
For my example text $66* the first expression extracts 66. When the asterisk is not present in the text (i.e. $66), the second expression extracts 66.
How can I combine the two to use the first one if an asterisk is present and the second one if no asterisk is present?
I tried with what I thought would be an if|then|else like below but am doing something wrong: (?(?=\*)(?<=\$)(.*)(?=\*)|(?<=\$)(.*)(?=))
You can use a negated character set to exclude asterisks in your match instead:
(?<=\$)[^*]+
Demo: https://regex101.com/r/vuGBiJ/2
Since you are already using a capture group, you could also match the $ literally and capture one or more characters other than an asterisk:
\$([^*]+)
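To check both suggestions against the two inputs from the question, here is a small Python sketch. The helper name extract_amount is made up for this example and is not part of either answer.

```python
import re

# Lookbehind version: the match itself is the text after "$".
LOOKBEHIND = re.compile(r"(?<=\$)[^*]+")

def extract_amount(text):
    """Return the text after '$' up to (not including) the first '*', or None."""
    match = re.search(r"\$([^*]+)", text)
    return match.group(1) if match else None

for sample in ("$66*", "$66"):
    print(sample, LOOKBEHIND.search(sample).group(), extract_amount(sample))
```

Because [^*]+ stops at the first asterisk and otherwise runs to the end of the input, one pattern covers both cases and no conditional is needed.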

Replace special characters with "just one" underscore if in sequence [duplicate]

This question already has answers here:
Replace multiple characters by one character with regex
(3 answers)
Regex to match one or more characters
(2 answers)
Closed 2 years ago.
I have a string that can contain spaces and special characters. How do I replace each run of consecutive spaces and special characters with a single underscore?
I have tried gsub(/[\W]/, '_') but this replaces each special character with underscore.
Example string: "This is a sample string & example"
Current output: "This_is_a_sample_string___example"
Expected output: "This_is_a_sample_string_example"
Any help on how to fix this would be really great. Thanks.
Use /\W+/ to Match Sequential Non-Word Characters
Use the \W metacharacter with the + quantifier to match one or more sequential non-word characters. The String#gsub replacement text will only be used once for each whole match, not for each character in the match. For example:
'This is a sample string & example'.gsub /\W+/, '_'
#=> "This_is_a_sample_string_example"
There are certainly other ways to do this, but this solution fits your posted use case.
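For comparison, the difference between \W and \W+ looks the same in other regex flavours. A short Python sketch, using re.sub in place of Ruby's gsub and the example string from the question:

```python
import re

sample = "This is a sample string & example"

# \W alone: every non-word character is its own match, so " & " yields "___".
per_character = re.sub(r"\W", "_", sample)

# \W+: a whole run of non-word characters is one match, so it yields one "_".
per_run = re.sub(r"\W+", "_", sample)

print(per_character)
print(per_run)
```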

Regular expression - Complete match [duplicate]

This question already has answers here:
RegEx to match full string
(4 answers)
What do ^ and $ mean in a regular expression?
(2 answers)
Closed 4 years ago.
r'^a$' is used as a complete match.
The above pattern says... a string should start with letter a and end with letter a.
What stops this pattern (r'^a$') from matching the string 'anna'?
"a string should start with letter a and end with letter a"
That's not the only thing the regex says: it also requires the string to have no other characters in between the initial and final letter, meaning that the only string matched by this expression is a single-character string a.
In order to fix this, add .*? to match "the middle" of the string:
^a.*?a$
Note that this expression no longer matches the single-character string a, because it requires at least two a's to be there.
You're not interpreting it correctly.
A regular expression is processed left-to-right, matching parts of the input as it goes along.
^a$
means that the match starts at the beginning of the string, then has to match a right after, then has to match the end of the string immediately after that.
It's no different from
abc
meaning that b has to follow a immediately, and c has to follow b immediately.
You're interpreting the meaning of the regular expression wrong.
r'^a$' describes a string that starts with the letter "a" and ends with that same letter "a": the single "a" in the expression must be both the first and the last character of the string.
To extract strings that start and end with DIFFERENT a's, you can use r'^a.*a$'. But this requires the two a's to be different characters in the string, so it no longer matches the one-character string "a". To get any string that starts with "a" and ends with "a", you can OR these two together:
r'^a$|^a.*a$'
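A quick way to see what each of these patterns accepts is to run them against a few strings. This is a minimal Python sketch; the variable names are only for illustration.

```python
import re

exact = re.compile(r"^a$")           # start, one "a", end
two_as = re.compile(r"^a.*a$")       # first and last character are distinct a's
either = re.compile(r"^a$|^a.*a$")   # the two alternatives combined

for candidate in ("a", "aa", "anna", "annb"):
    print(
        candidate,
        bool(exact.search(candidate)),
        bool(two_as.search(candidate)),
        bool(either.search(candidate)),
    )
```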

Regex negated character disjunction [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
Very quick and simple question.
Consider the vector of character strings ("AvAv", "AvAvAv")
Why does the pattern (Av)\1([^A]|$) match both strings?
The pattern says: have an instance of "Av", have another, then either have a character that is not an "A" or else come to an end. The first string clearly matches, but I do not see how the latter does. It has two copies of "Av", but then it fails to end (missing the second disjunct) and fails to be followed by a character other than "A" (missing the first disjunct), so how does the pattern successfully match it?
Thank you so much for your time and assistance. It is greatly appreciated.
Here is an explanation:
AvAv - matches (Av)\1$
In this case, we can match Av, followed by that captured quantity, followed by $ from the alternation. In the case of AvAvAv we also have a match:
AvAvAv - again matches (Av)\1$
  ^^^^ last four letters match
It is the same logic here, except that in order to match, we have to skip the first Av.
If the pattern were ^(Av)\1([^A]|$) then only AvAv would be a match.
A RegEx only needs to match a part of the string to be considered "a match".
In other words, for the second example your RegEx matches only the last four characters of AvAvAv, that is, the trailing AvAv.
If you don't want it to match the second one, anchor the pattern with a caret ^:
^(Av)\1([^A]|$)
This way the second string won't be matched.
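To see where the match actually lands, it helps to print its span. The question appears to use R, so the Python sketch below is only an illustration of the same pattern under Python's re module.

```python
import re

unanchored = re.compile(r"(Av)\1([^A]|$)")
anchored = re.compile(r"^(Av)\1([^A]|$)")

# In "AvAvAv" the attempt at index 0 fails (an "A" follows the second "Av"),
# so the engine moves on and succeeds starting at index 2.
span_short = unanchored.search("AvAv").span()
span_long = unanchored.search("AvAvAv").span()

# With ^ the engine may not move on, so the longer string is rejected.
anchored_short = anchored.search("AvAv")
anchored_long = anchored.search("AvAvAv")

print(span_short, span_long, anchored_short is not None, anchored_long is not None)
```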

Regex for string containing one string, but not another [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
We have a regex in our project that matches any URL that contains the string "/pdf/":
(.+)/pdf/.+
We need to modify it so that it won't match URLs that also contain "help".
Example:
Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"
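If lookarounds are supported, this is very easy to achieve. Anchor the pattern at the start (^); otherwise the engine can retry from a position just past the h of help and still report a match:
^(?=.*/pdf/)(?!.*help)(.+)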
See a demo on regex101.com.
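Here is a minimal Python sketch of the lookahead approach, using the two URLs from the question. It deliberately uses the pattern without a leading ^ and relies on re.match, which only attempts a match at the start of the string. The re.search call shows what happens when the engine is free to start later. The helper name is_pdf_url_without_help is invented for this example.

```python
import re

# The lookahead pattern without a leading ^, to show why anchoring matters.
PATTERN = re.compile(r"(?=.*/pdf/)(?!.*help)(.+)")

def is_pdf_url_without_help(url):
    # re.match only tries position 0, so both lookaheads see the whole URL.
    return PATTERN.match(url) is not None

good = "/dealer/us/en/pdf/simple.pdf"
bad = "/dealer/help/us/en/pdf/simple.pdf"

# re.search may start later in the string, past the "h" of "help",
# where the negative lookahead no longer sees the word "help".
partial = PATTERN.search(bad)

print(is_pdf_url_without_help(good), is_pdf_url_without_help(bad))
print(partial.group(1))
```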
(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)
First, match either a whitespace character or the start of the line:
(?:^|\s)
Then we match anything that is not a space or an h, OR any h that is not followed by elp, one or more times (+), until we find /pdf/, then match non-space characters (\S) any number of times (*).
((?:[^h ]|h(?!elp))+\/pdf\/\S*)
If we also want to reject help after the /pdf/, we can repeat the same sub-pattern after it:
((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
Finally, we match a whitespace character or the end of the line/string ($):
(?:$|\s)
The full match will include leading/trailing spaces, and should be stripped. If you use capture group 1, you don't need to strip the ends.
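The two variants can be compared side by side. This is a rough Python sketch; the names BEFORE_ONLY, ANYWHERE and first_url are made up for the example, and the patterns are the ones from this answer wrapped in the same leading and trailing groups.

```python
import re

# "help" is rejected only in the part before /pdf/.
BEFORE_ONLY = re.compile(r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)")

# Variant that also rejects "help" after /pdf/.
ANYWHERE = re.compile(
    r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)(?:$|\s)"
)

def first_url(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None

print(first_url(BEFORE_ONLY, "/dealer/us/en/pdf/simple.pdf"))
print(first_url(BEFORE_ONLY, "/dealer/help/us/en/pdf/simple.pdf"))
print(first_url(BEFORE_ONLY, "/dealer/us/en/pdf/help.pdf"))
print(first_url(ANYWHERE, "/dealer/us/en/pdf/help.pdf"))
```

Reading group 1 rather than the full match avoids having to strip the surrounding whitespace.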