In someXstring it's easy to find everything after and including 'X'.
What I need is to find everything after, but EXCLUDING 'X'.
... just to match string in it.
Try using a lookbehind assertion.
(?<=X)\w+
If your regex engine doesn't support lookbehind assertions, you can work around that using a capturing group.
X(\w+)
In the above regex, string is captured by the group and can be accessed by referencing \1.
NOTE: this uses \w to capture word characters. If you literally mean that you want to capture everything then use the dot, ., metacharacter instead...
(?<=X).+$
You can use a lookbehind if your engine supports it:
(?<=X).*$
If not, you can use a capturing group and grab group 1:
X(.*$)
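As a minimal sketch of the two approaches above, here is how they could look in Python's re module (the choice of Python and the variable names are assumptions for illustration; the patterns are the ones from the answers):

```python
import re

text = "someXstring"

# Lookbehind: X must precede the match but is not part of it.
lookbehind = re.search(r"(?<=X)\w+", text)

# Capturing group: X is part of the overall match, group 1 holds the rest.
grouped = re.search(r"X(\w+)", text)

print(lookbehind.group(0))  # string
print(grouped.group(0))     # Xstring
print(grouped.group(1))     # string
```

With the lookbehind the whole match is already the wanted text; with the group you have to read group 1 rather than the whole match.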
I created a regular expression to match any string surrounded by quotation marks in a log, such as "example", but exclude the word heartbeat if found.
[^0-9A-Za-z_&-]("(?!heartbeat)[A-Za-z0-9_.&?=%~#{}()#+-:]*")[^A-Za-z0-9_-]
I verified the expression on regex101, but it does not work in MobaXterm. My assumption is that MobaXterm does not handle negative lookaheads.
Keep in mind the following does work:
[^0-9A-Za-z_&-]("[A-Za-z0-9_.&?=%~#{}()#+-:]*")[^A-Za-z0-9_-]
Is there an alternative to what I am trying to achieve?
You can write a hard-to-read regex such as:
"(?:[^h].*|h[^e].*|he[^a].*|hea[^r].*|etc...)"
(replace .* with the second character class), but another option would be to write "heartbeat"|("[A-Za-z0-9_.&?=%~#{}()#+-:]*"). When the string is "heartbeat", the first alternative matches and the capture group stays empty, so this only works if your program specifically reads the capture group.
You can also use \w and \d in your character classes to make them simpler (\w already covers digits and the underscore, so \d is optional): [^\d\w&-](?:"heartbeat"|("[\w\d.&?=%~#{}()#+-:]*"))[^\d\w-]
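The alternation trick can be sketched in Python as follows. Python does support lookaheads, so this only illustrates how the capture group behaves; the function name quoted_values and the sample log line are made up for the example, and the surrounding boundary classes are left out to keep it short.

```python
import re

# Left branch swallows "heartbeat" without capturing anything;
# only the right branch fills group 1.
PATTERN = re.compile(r'"heartbeat"|("[A-Za-z0-9_.&?=%~#{}()#+-:]*")')

def quoted_values(log_line: str) -> list[str]:
    """Return every quoted string except "heartbeat"."""
    return [m.group(1) for m in PATTERN.finditer(log_line)
            if m.group(1) is not None]

print(quoted_values('a "example" b "heartbeat" c "foo.bar" d'))
```

The filter on group 1 is the part that "your program" has to do: a tool that only reports whole matches would still show "heartbeat".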
I'm trying to find-and-replace instances where consecutive commas appear throughout a string; replacing them w/ something like ",N/A,". I was using a very simple /,,/g pattern, and that works on things like ",,abc" and ",,,,abc" (with even numbers of commas). However, it doesn't catch things like ",,,abc". That's because the first two commas are considered a match, and then the third comma is just considered part of a new ",abc" string. Is there a way to handle this w/ a RegEx pattern or options? Otherwise, I'm going to need to perform multiple searches.
FWIW - I'm working in JavaScript, but I'm guessing this is just a general RegEx question/answer.
The reason /,,/g only matches once with three commas is that a global match resumes right after the last consumed character. You need a way to match the pattern ,, without consuming the second comma.
If your language supports it, use a positive lookahead. A positive lookahead lets a regex check that some additional characters follow without consuming them.
/,(?=,)/g
In English, this means:
, # match a comma, then
(?= #start a group that must exist, and if so, isn't consumed by the pattern,
, # a comma
)
See more about this here: https://www.regular-expressions.info/lookaround.html
JavaScript supports positive lookahead. :)
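The same idea can be sketched in Python, whose re module handles this lookahead the same way. The helper name fill_empty_fields and the choice to replace each matched comma with ",N/A" are assumptions for illustration:

```python
import re

def fill_empty_fields(line: str, filler: str = "N/A") -> str:
    # Match a comma only when another comma follows. The lookahead does not
    # consume that second comma, so it can start the next match.
    return re.sub(r",(?=,)", "," + filler, line)

# The naive pattern consumes both commas and skips the odd one out.
naive = re.sub(r",,", ",N/A,", ",,,abc")

print(naive)                        # ,N/A,,abc
print(fill_empty_fields(",,,abc"))  # ,N/A,N/A,abc
```

Because each match now covers a single comma, the replacement text contains only one comma as well.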
How can I create a short regular expression that only matches words in which no character is immediately followed by the same character?
Only the following syntax elements are allowed:
. * + ? | ()
The alphabet is {a, b}.
Example:
Is matching: abababab
Not matching: abbab
Thank you :)
Well, your exercise is not very clear (which regex engine are you using? etc),
but I managed to do something:
(?<=^|\P{L})(?:(\p{L})(?!\1))+(?=\P{L}|$)
https://regex101.com/r/R2t2ik/1
Explanation
We are looking for a letter from any language, not just [a-z] or the
\w word-character class, because letters such as àéêï would typically
not match. So instead, use \p{L}, which matches any character in the
Unicode "letter" category.
More details here:
https://www.regular-expressions.info/unicode.html#category
We will capture this char with a capturing group: (\p{L})
This will create group number 1. Group 0 is the match of the entire
regular expression. Each capturing group found from left to right
creates a new numbered group. In our case we will then be able to refer
to our captured group with the \1 reference.
To check that two consecutive characters are not identical, we will use a
negative lookahead, meaning that the match fails at this position if the
lookahead pattern succeeds.
The regex becomes: (\p{L})(?!\1)
This means: "Find a letter of any language that is not followed by itself."
Now, a word is made of one or more characters, so it could be matched with
\w+ but as explained before, this would only work in English. So in any
language, it would become (\p{L})+. It seems that \p{L}+ doesn't work
properly, so adding a group around it will help the + to know what should
appear once or more.
Okay, that's good, but it's not exactly what we want. We only want to find
characters that are not followed by themselves. So we have to use the
negative-lookahead pattern built above, (\p{L})(?!\1).
This becomes: (?:(\p{L})(?!\1))+
You may ask why we have this (?: and ) around all of it.
We could simply use ( and )+, but that would create a new capturing
group, which we don't need. To create a non-capturing group, add ?:
right after the opening parenthesis.
Capturing group = (abc) vs non-capturing group = (?:abc)
To finish, we want to capture word beginnings and ends with the help of
a positive lookbehind and a positive lookahead. I started with the usual
\b for word boundary but it did not work. Don't ask me why. I expect
that it's related to the use of the Unicode classes or perhaps the way the
selector is written. Someone may find an explanation, I'm not a specialist.
Well, I had to solve that by matching either the beginning of the string
with the ^ anchor or, with the \P{L} Unicode class, a char that is not a
letter. I did the same for the end by using the $ anchor.
So at the beginning, I added a positive lookbehind meaning "start with or
has a non-letter char before" done with this (?<=^|\P{L}) rule.
And at the end, I added a positive lookahead meaning "finish with or has
a non-letter char after" done with this (?=\P{L}|$) rule.
Putting everything together:
(?<=^|\P{L}) + (?:(\p{L})(?!\1))+ + (?=\P{L}|$)
results in:
(?<=^|\P{L})(?:(\p{L})(?!\1))+(?=\P{L}|$)
I hope it's what you were looking for and that it's not too complicated to
understand.
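For readers who want to try the idea in Python, here is a hedged adaptation rather than the exact regex above. Python's built-in re module has no \p{L} and requires fixed-width lookbehinds, so this sketch swaps in [^\W\d_] as a stand-in for "any letter" and replaces the two positive lookarounds with the negative forms "not preceded by a letter" and "not followed by a letter". The names LETTER, WORD and words_without_doubles are invented for the example.

```python
import re

# Assumption: [^\W\d_] (a word character that is neither a digit nor "_")
# stands in for \p{L}, which Python's re module does not provide.
LETTER = r"[^\W\d_]"

WORD = re.compile(
    rf"(?<!{LETTER})"            # not preceded by a letter (word start)
    rf"(?:({LETTER})(?!\1))+"    # letters, each not followed by itself
    rf"(?!{LETTER})"             # not followed by a letter (word end)
)

def words_without_doubles(text: str) -> list[str]:
    return [m.group(0) for m in WORD.finditer(text)]

print(words_without_doubles("abababab abbab \u00e9t\u00e9 aab"))
```

A word containing a doubled letter is rejected as a whole: the repetition stops before the double, and the final lookahead then fails because a letter still follows.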
I have currently a pattern match in a query like this
if(upper(email_omni_code_mini) like '%TRAVEL%' and upper(email_omni_code_mini) NOT like '%TRAVEL%ENS%',...,...)
I want to change this to a single pattern match, but
TRAVEL(?!ENS) won't work because ENS does not immediately follow TRAVEL.
Is there an easy way to solve this?
Any help is appreciated.
If there are other chars in between, insert .* before ENS:
TRAVEL(?!.*ENS)
It will now match TRAVEL only when no ENS substring follows it later on the same line: inside the lookahead, .* matches any zero or more characters (other than line breaks), as many as possible, before ENS.
See the regex demo.
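A small Python sketch of the lookahead, using made-up sample codes. The second pattern, strict, is not from the answer: it is an assumption about how one might mirror the two LIKE tests exactly, since TRAVEL(?!.*ENS) still matches a string in which a later TRAVEL has no ENS after it.

```python
import re

# Pattern from the answer: a TRAVEL with no ENS anywhere after it.
simple = re.compile(r"TRAVEL(?!.*ENS)")

# Hypothetical stricter variant: reject the string if ANY TRAVEL is
# followed by ENS, then require a TRAVEL somewhere.
strict = re.compile(r"^(?!.*TRAVEL.*ENS).*TRAVEL")

def keep(code: str, pattern: re.Pattern) -> bool:
    # upper() mirrors the upper(...) call in the original query.
    return pattern.search(code.upper()) is not None

for code in ["travel_promo", "TRAVEL_X_ENS", "TRAVELENS", "TRAVEL_ENS_TRAVEL"]:
    print(code, keep(code, simple), keep(code, strict))
```

The two patterns agree on the first three samples and differ only on the last one, where TRAVEL appears both before and after ENS.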
For one of my classes, I have to describe the following regular expression:
\b4[0-9]{12}(?:[0-9]{3})\b
I understand that it selects a number that: begins with 4, is followed by 12 digits (each between 0-9), and is followed by another 3 digits.
What I don't understand is the question mark with the colon (?:....). I've tried looking online to find out what this means, but the links I've found were somewhat confusing; I was hoping someone could give me a quick basic idea of what the question mark does in this example.
This is going to be a short answer.
When you use (?:) it means that the group is matched but is not captured for back-referencing, i.e. it is a non-capturing group. It's not stored in memory to be referenced later on.
For example:
(34)5\1
This regex means that you are looking for 34 followed by 5 and then 34 again. You could of course write it as 34534, but sometimes the captured group is a complex pattern whose match you cannot predict beforehand.
So whatever is matched by capturing group should be appearing again.
Regex101 demo for back-referencing
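A brief Python sketch of the back-reference. The second pattern, with \d\d in the group, is an added illustration of a group whose text is not known in advance:

```python
import re

fixed = re.compile(r"(34)5\1")       # 34, then 5, then 34 again
general = re.compile(r"(\d\d)5\1")   # any two digits, 5, the same two digits

print(bool(fixed.fullmatch("34534")))    # True
print(bool(fixed.fullmatch("34535")))    # False
print(bool(general.fullmatch("12512")))  # True
print(bool(general.fullmatch("12521")))  # False: \1 must repeat "12"
```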
Back-references are also used in replacements.
For Example:
([A-Z]+)[0-9]+
This regex looks for one or more uppercase letters followed by one or more digits. Suppose I wish to replace the whole match with just the uppercase letters that were found.
Then I would use \1 in the replacement, which refers back to the first captured group.
Regex101 demo for replacement
If you change to (?:[A-Z]+)[0-9]+ this will no longer capture it and hence cannot be referenced back.
Regex101 demo for non-capturing group
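The replacement case can be sketched in Python like this (the sample text is made up). With the non-capturing variant, Python's re.sub rejects the \1 reference because the pattern has no group 1; other engines may behave differently.

```python
import re

text = "ABC123 and XY9"

# Capturing group: \1 in the replacement is the run of uppercase letters.
kept_letters = re.sub(r"([A-Z]+)[0-9]+", r"\1", text)
print(kept_letters)  # ABC and XY

# Non-capturing group: there is no group 1 to refer back to.
try:
    re.sub(r"(?:[A-Z]+)[0-9]+", r"\1", text)
    backreference_failed = False
except re.error:
    backreference_failed = True
print(backreference_failed)  # True
```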
A live answer.
It's called a 'non-capturing group', which means the regex does not create a group from the match inside the parentheses as it otherwise would (normally, a pair of parentheses creates a group).
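Applied to the regex from the question, a short Python check shows the difference. The 16-digit sample number and the capturing variant are illustrative assumptions:

```python
import re

non_capturing = re.compile(r"\b4[0-9]{12}(?:[0-9]{3})\b")
capturing = re.compile(r"\b4[0-9]{12}([0-9]{3})\b")

sample = "card 4111111111111111 ok"

m1 = non_capturing.search(sample)
m2 = capturing.search(sample)

print(m1.group(0), m1.groups())  # same overall match, no groups
print(m2.group(0), m2.groups())  # same overall match, last 3 digits captured
```

Both patterns match the same text; the only difference is whether the last three digits are stored as group 1.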