This question already has an answer here:
Finding the indexes of multiple/overlapping matching substrings
(1 answer)
Closed 7 years ago.
I'm trying to find all matches of a particular pattern "8ab|ab8" in the string "8ab8". So I tried the R command gregexpr("8ab|ab8","8ab8") hoping to get a return vector with the starting positions as c(1,2).
Unfortunately, it seems that what happens is that once the first pattern is matched, that portion of the string is "removed" and the second pattern won't be matched.
For example, once "8ab" is matched, "8ab8" becomes "8" and when R tries matching "ab8" in "8", the pattern won't be found. I know this because gregexpr("8ab|ab8","8ab ab8") works fine and returns starting positions of pattern matches as c(1,5).
The question is, how do I match the same pattern multiple times in the first case?
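Use Perl-compatible regular expressions with perl=TRUE (see ?regex for details) and wrap each alternative in a lookahead. A lookahead matches at a position without consuming any characters, so the second alternative can still be found in text that overlaps the first.
gregexpr("(?=8ab)|(?=ab8)","8ab8",perl=TRUE)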
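The same behaviour can be reproduced outside R. Below is a minimal sketch in Python, used here only for illustration; the helper name match_starts is hypothetical. Python reports 0-based positions, whereas gregexpr reports 1-based ones.

```python
import re

def match_starts(pattern, text):
    """Return the 0-based start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# Plain alternation consumes "8ab", so the overlapping "ab8" is never found.
consuming = match_starts(r"8ab|ab8", "8ab8")

# Lookaheads are zero-width: nothing is consumed, so both positions are found.
overlapping = match_starts(r"(?=8ab)|(?=ab8)", "8ab8")

print(consuming)    # [0]
print(overlapping)  # [0, 1]
```

Adding 1 to each index gives the c(1,2) the question was hoping for.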
This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 2 years ago.
I would like to come up with a regular expression that negates the matches of the regex .google.*search. Is it possible to derive such a regex from the one I am trying to negate?
Test data
[1] https://www.google.com/search?newwindow=1&sxsrf=ALeKk02MzEfbUp3jO4Np
[2] https://github.com/redis/redis-rb
[3] https://web.whatsapp.com/
Expected result
Rows 2 and 3 match the negated pattern and are part of the results.
The following regex does the trick:
^(?!.+google.*search)
It anchors at the beginning of the line and then wraps your regex in a negative lookahead, (?!...), so a line matches only if your regex does not match anywhere in it.
You may use a negative lookahead here:
https?:\/\/(?!.*\.google\..*search).*
The "secret sauce" here is (?!.*\.google\..*search), which asserts that .google. followed by search does not occur anywhere within the URL to the right of the https:// portion.
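As a rough check, both suggested patterns can be run against the test data from the question. This is a sketch in Python rather than in the asker's (unstated) tool; the variable names are made up for illustration.

```python
import re

urls = [
    "https://www.google.com/search?newwindow=1&sxsrf=ALeKk02MzEfbUp3jO4Np",
    "https://github.com/redis/redis-rb",
    "https://web.whatsapp.com/",
]

# Pattern from the first answer: anchored negative lookahead.
anchored = re.compile(r"^(?!.+google.*search)")
# Pattern from the second answer: lookahead placed after the scheme.
after_scheme = re.compile(r"https?:\/\/(?!.*\.google\..*search).*")

# Keep only the URLs for which the negated pattern finds a match.
kept_anchored = [u for u in urls if anchored.search(u)]
kept_after_scheme = [u for u in urls if after_scheme.search(u)]

print(kept_anchored)      # the GitHub and WhatsApp URLs
print(kept_after_scheme)  # the same two URLs
```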
This question already has answers here:
Regex Last occurrence?
(7 answers)
Closed 3 years ago.
I have the following RegEx syntax that will match the first date found.
([0-9]+)/([0-9]+)/([0-9]+)
However, I would like to start from the end of the content and search backwards. In other words, in the below example, my syntax will always match the first date, but I want it to match the last instead.
Some Text here
01/02/15
Some additional
text here.
10/04/14
Ending text
here
I believe this is possible by using a negative lookahead, but all my attempts failed at this because I don't understand RegEx enough. Help would be appreciated.
Note: my application uses PCRE regular expressions.
You could make the dot match a newline, for example with the inline modifier (?s), and let .* match until the end of the string.
The greedy .* then backtracks until the date-like pattern matches, which lands on its last occurrence. Preceding the first digit with a word boundary \b keeps the backtracking from stopping in the middle of a number.
Use \K to forget what was matched so far, so that only the date-like pattern is returned as the match.
^(?s).*\b\K[0-9]+/[0-9]+/[0-9]+
Note that the pattern is a very broad match and does not validate a date itself.
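For engines without \K, a capture group can stand in for it. The sketch below uses Python's built-in re module, which does not support \K, so this is a swapped-in variant of the answer's pattern rather than the pattern itself. re.DOTALL plays the role of (?s).

```python
import re

text = (
    "Some Text here\n01/02/15\nSome additional\ntext here.\n"
    "10/04/14\nEnding text\nhere"
)

date = r"[0-9]+/[0-9]+/[0-9]+"

# Without the greedy prefix, the search stops at the first date.
first_date = re.search(date, text).group()

# Greedy .* consumes everything, then backtracks to the last date;
# the capture group replaces \K by isolating the date itself.
last_date = re.search(r".*\b(" + date + r")", text, re.DOTALL).group(1)

print(first_date)  # 01/02/15
print(last_date)   # 10/04/14
```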
This question already has answers here:
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 3 years ago.
I'm new to Perl and am working with regular expressions. I can't work out how Perl resolves the ambiguity when multiple matches are possible for a given string. For example:
('hellohellohello' =~ m/h.*o/)
This could match 'hello', 'hellohello' or 'hellohellohello'. Which one will it choose: the shortest or the longest match? And what if we want the opposite behavior (for example, if the default is the shortest match, how do we find the longest one)?
If the answer to the first question is the longest match, consider:
('hello
hellohello' =~ m/h.*o/)
Here, it could match on the first line (before the newline character) or on the second line (after the newline character): the first match versus the longest match. Which one will it use?
What is the complete set of rules that decides which substring of a string matches a given regex (including cases other than the ones in these examples where multiple matches could be found)?
* is greedy, so it tries to match the longest possible string, so long as the rest of the pattern can still be matched. So it will match hellohellohello.
If you use *? instead, that makes it non-greedy, and it will match the shortest possible string, again as long as the rest of the pattern matches. So m/h.*?o/ will match hello.
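The two quantifiers can be compared side by side. This is a sketch in Python rather than Perl; for these particular patterns the two engines are expected to agree, but treat that as an assumption. The last case is the two-line string from the question: the dot does not match a newline by default, and the search returns the leftmost match, so the first line wins.

```python
import re

# Greedy: .* grabs as much as it can while still ending on an "o".
greedy = re.search(r"h.*o", "hellohellohello").group()

# Lazy: .*? stops at the first "o" that lets the pattern succeed.
lazy = re.search(r"h.*?o", "hellohellohello").group()

# The dot does not cross the newline, and the leftmost match is returned.
two_lines = re.search(r"h.*o", "hello\nhellohello").group()

print(greedy)     # hellohellohello
print(lazy)       # hello
print(two_lines)  # hello
```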
This question already has an answer here:
Why does my Regex.Replace string contain the replacement value twice?
(1 answer)
Closed 5 years ago.
Why does the following generate two matches and therefore "xx" as the output:
"Hello" -Replace '.*','x'
Whereas this just generates one match and therefore just "x" in the output:
"Hello" -Replace '^.*','x'
I'm trying to understand what nuance of regex causes two matches in the first case.
You can put the same into https://regex101.com and it also reports two matches with the first match being "Hello" and the second match being ""
That's because the * quantifier matches zero or more characters. In that case, it matches the entire word, Hello, then an empty string after it.
Use .+, and it will match at least one character instead.
With ^.*, there is only one match, because ^ anchors the pattern to the beginning of the string. Once Hello has been consumed, the empty match at the end is no longer possible, since that position is not the start of the string.
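The extra empty match is easy to see by listing the matches. This is a sketch in Python rather than PowerShell; the re.sub result shown assumes Python 3.7 or newer, where an empty match directly after a non-empty one is also replaced.

```python
import re

# .* matches "Hello", then matches the empty string at the end.
star_matches = re.findall(r".*", "Hello")

# ^ can only succeed at position 0, so the trailing empty match is ruled out.
anchored_matches = re.findall(r"^.*", "Hello")

# .+ needs at least one character, so it cannot match the empty string.
plus_matches = re.findall(r".+", "Hello")

# Assumes Python 3.7+: both matches of .* are replaced.
replaced = re.sub(r".*", "x", "Hello")

print(star_matches)      # ['Hello', '']
print(anchored_matches)  # ['Hello']
print(plus_matches)      # ['Hello']
print(replaced)          # xx
```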
This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 8 years ago.
I am new to Perl (a beginner, learning it in my spare time for the past week). This is my first programming language.
I want to know how this regex []+ works in perl. I have 3 questions.
What will this do: if /[\d\s\.,:\/]+/?
I learned if /.../ matches pattern.
So will it match the following?
And which parts of the following will not match?
335.31, 312.52
Dave1.532
Path: "./1243/453 /48.1"
543, 546
Edit:
This is not a duplicate of the linked question as I am specifically asking how []+ works. The answer in the linked post does not cover this.
I know what each character in the regex above represents and how each character works. What I want to know is how []+ influences the regular expression, and specifically how the + influences the [].
I suggest you use regex101.com to try the regular expression. Below is the breakdown for the expression you provided:
`[]` match a single character present in the list
`+` matches one or more of the above
`\d` match a digit [0-9]
`\s` match any white space character `[\r\n\t\f ]`
`\.` matches the character . literally
`,:` a single character in the list ,: literally
`\/` matches the character / literally
You'll get the following matches (if you run this with the g, or global, option):
`335.31, 312.52 `
`1.532 `
`: `
`./1243/453 /48.1`
` 543, 546 `
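The breakdown can be checked by listing every match. This is a sketch in Python rather than Perl; re.findall stands in for the g option, and the four sample lines are assumed to be joined by newlines with no trailing newline.

```python
import re

text = (
    "335.31, 312.52\n"
    "Dave1.532\n"
    'Path: "./1243/453 /48.1"\n'
    "543, 546"
)

# [...] picks one character from the set; + repeats that choice one or
# more times, so each match is a maximal run of digits, whitespace,
# dots, commas, colons and slashes.
runs = re.findall(r"[\d\s\.,:\/]+", text)

for run in runs:
    print(repr(run))
```

Letters and double quotes are not in the set, so they end each run. That is why Dave, Path and the quotes never appear in a match, and why the newline after 312.52 belongs to the first run.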