I have a string like this:
\24s904dS\24sr4d2\24x\\y\\12z:234F\\3dRl\24o980\24
I want to match only this part:
x\\y\\12z:234F\\3dRl
I can handle the non-greedy matching on the right side with this regex:
\\24(.*:.*?)\\24
But I still can't figure out how to make the match non-greedy on the left side.
Modify your pattern as follows:
.*\\24(.*:.*?)\\24
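For what it's worth, here is a minimal sketch of how this can be checked, assuming Python's re module (the question does not name a regex flavor, so that choice is mine). The leading greedy .* consumes as much as it can, so the \24 that opens the group ends up being the last one that still leaves a colon and a closing \24 to its right.

```python
import re

# Sample string from the question; a raw string keeps the backslashes literal.
text = r"\24s904dS\24sr4d2\24x\\y\\12z:234F\\3dRl\24o980\24"

# The greedy prefix pushes the opening \24 as far to the right as possible.
pattern = re.compile(r".*\\24(.*:.*?)\\24")

match = pattern.search(text)
result = match.group(1) if match else None
print(result)
```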
You can use this negative lookahead based regex:
\\24((?:.(?!\\24))*:.*?)\\24
The important part is the lookahead-based pattern (?:.(?!\\24))*, which matches a character only if it is not immediately followed by \24. That ensures the match starts at the \24 closest to the left of the colon.
Output Match:
x\\y\\12z:234F\\3dRl
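As a rough check of the lookahead version, here is a small sketch, again assuming Python's re module purely for illustration. An earlier \24 cannot start the match, because the repeated group refuses to step onto a character that sits directly in front of another \24, so it never reaches the colon from there.

```python
import re

# Sample string from the question; a raw string keeps the backslashes literal.
text = r"\24s904dS\24sr4d2\24x\\y\\12z:234F\\3dRl\24o980\24"

# (?:.(?!\\24))* consumes a character only when \24 does not follow it.
pattern = re.compile(r"\\24((?:.(?!\\24))*:.*?)\\24")

match = pattern.search(text)
result = match.group(1) if match else None
print(result)
```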
Rather than modifying the greediness, it's better to just write a more-precise regex:
\\24([a-zA-Z0-9]+:[a-zA-Z0-9]+)\\24
(It's relatively rare that non-greedy modifiers are really the best approach to a problem.)
I am trying to create a regex to find the following string:
AGK-XL.
Sometimes before and after this string there are other characters that are usually harmless, except if there is the following pattern before the string:
NOT-
I need to delete/ignore those cases.
This is what I have tried:
^[^N][^O][^T][^\-]AGK-XL\.(\s|\W|$)
But it only seems to match when there are exactly 4 letters in front of the string. How can I express that any other pattern besides NOT- before AGK-XL. is harmless?
Thanks for any hints.
edit: I am using regex in VBA atm.
If you cannot use fancy look-behinds, you can rely on the capturing mechanism: match the thing you do not want, and match and capture the thing you do want. See The Best Regex Trick Ever at rexegg.com.
However, in this case, you can match and capture NOT-AGK-XL. (so that you can restore it later with $1 backreference), and only match all other occurrences of AGK-XL. that you will remove. Use alternation operator | to match both alternatives:
(NOT-AGK-XL\.(?!\w))|AGK-XL\.(?!\w)
Note that I replaced (\s|\W|$) with (?!\w), which is, IMHO, a better word boundary check.
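To illustrate the capture-and-restore idea, here is a minimal sketch in Python rather than VBA (my assumption, chosen only for the example; the function name strip_unprotected and the sample text are made up). The replacement puts group 1 back when the NOT- branch matched and substitutes nothing otherwise.

```python
import re

# Group 1 captures the protected form; the second branch matches what to delete.
pattern = re.compile(r"(NOT-AGK-XL\.(?!\w))|AGK-XL\.(?!\w)")

def strip_unprotected(text: str) -> str:
    # Restore group 1 when it matched; otherwise replace the match with nothing.
    return pattern.sub(lambda m: m.group(1) or "", text)

sample = "foo AGK-XL. bar NOT-AGK-XL. baz"
cleaned = strip_unprotected(sample)
print(cleaned)
```

In VBA the same effect should be achievable by using $1 as the replacement string, as described above.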
I'm sorry if this has already been asked and answered, but I can't find it.
I know about regex lookarounds and negative lookahead.
The thing is that a negative lookahead examines what comes right after the current position in a string.
What I need is to discard a match if the string contains words like "career(s)" or "specials" anywhere in it.
What would be an efficient way of doing that?
At the moment I'm using the PCRE flavor, but the more general the regex is, the better.
Thank you.
You can use this regex:
^(?!.*(?:career\(s\)|specials)).*
Or, if the trailing s is optional, use:
^(?!.*(?:career|special)s?).*
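Here is a small sketch of how the second pattern could be used as a filter, assuming Python's re module instead of PCRE (the helper name is_allowed and the sample lines are invented for the example). Because the lookahead is anchored at the start and begins with .*, it scans the whole line for the unwanted words before anything is matched.

```python
import re

# Reject any line that contains "career(s)" or "special(s)" anywhere.
pattern = re.compile(r"^(?!.*(?:career|special)s?).*")

def is_allowed(line: str) -> bool:
    # match() anchors at the start, where the lookahead does its scan.
    return pattern.match(line) is not None

lines = [
    "About our company",
    "See our careers page",
    "Weekly specials inside",
    "career advice",
]
kept = [line for line in lines if is_allowed(line)]
print(kept)
```

Note that this is case-sensitive as written, and that . does not cross newlines by default, so it works line by line.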
I'm trying to detect occurrences of words italicized with *asterisks* around them. However, I want to ensure a word is not within a link. So it should find "text" in here is some *text* but not within http://google.com/hereissome*text*intheurl.
My first instinct was to use look aheads, but it doesn't seem to work if I use a URL regex such as John Gruber's:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
And put it in a look ahead at the beginning of the pattern, followed by the rest of the pattern.
(?=URLPATTERN)\*[a-zA-Z\s]\*
So how would I do this?
You can use this alternation technique: on the left-hand side, first match everything that you want to discard; then, on the right-hand side, use a capture group to match the desired text.
https?:\/\/\S*|(\*\S+\*)
You can then use capture group #1 for your emphasized text.
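A minimal sketch of reading group #1, assuming Python's re module (the function name find_emphasis is made up for the example). Matches from the URL branch leave group 1 empty, so they are simply skipped.

```python
import re

# Left branch swallows URLs; right branch captures *emphasized* text.
pattern = re.compile(r"https?:\/\/\S*|(\*\S+\*)")

def find_emphasis(text: str) -> list:
    # Keep only the matches where the capture group took part.
    return [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]

sample = "here is some *text* but not http://google.com/hereissome*text*intheurl"
found = find_emphasis(sample)
print(found)
```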
The following regexp:
^(?!http://google.com/hereissome.*text.*intheurl).*
matches every line except one that starts with http://google.com/hereissome*text*intheurl. This is called a negative lookahead. Some regexp libraries may not support it; Python's does.
Here is a link to Mastering Lookahead and Lookbehind.
I am trying to write a regular expression matching a set without some chars.
For example, it matches [ a-zA-Z]* but excludes i,o,q,I,O,Q.
So: "A fat cat" matches, "Boy" doesn't.
Looks like it can be [ a-hj-npr-zA-HJ-NPR-Z]*.
Is there a simpler version for this?
Btw, I'm using it in PostgreSQL, but I think it should be a standard expression.
You can use a negative lookahead for this, since PostgreSQL supports lookaheads:
(?![ioqIOQ])[A-Za-z ]
To make it match complete line use:
^(?:(?![ioqIOQ])[A-Za-z ])+$
Based on @Anubhava's answer, but extending to an entire string rather than just one character:
^(?=[^ioqIOQ]*$)[ A-Za-z]*$
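The (?=...) is a positive lookahead, the opposite of the negative lookahead in Anubhava's answer. It requires the whole string, up to the end, to consist only of characters outside [ioqIOQ]; the rest of the pattern then restricts those characters to letters and spaces.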
You could also implement the repetition over the entire string with
^((?![ioqIOQ])[ A-Za-z])*$
but it seems a lot less efficient. (I have not performed any timings, though.)
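Here is a quick sketch that runs both variants side by side, assuming Python's re module rather than PostgreSQL (so it only shows that the two patterns agree on a few inputs; it says nothing about speed). The names lookahead_once and lookahead_each and the extra sample strings are mine.

```python
import re

# One lookahead over the whole string, then the character class.
lookahead_once = re.compile(r"^(?=[^ioqIOQ]*$)[ A-Za-z]*$")
# A lookahead repeated in front of every single character.
lookahead_each = re.compile(r"^((?![ioqIOQ])[ A-Za-z])*$")

samples = ["A fat cat", "Boy", "", "Hello there", "abc1"]
results = {
    s: (bool(lookahead_once.match(s)), bool(lookahead_each.match(s)))
    for s in samples
}
for sample, outcome in results.items():
    print(repr(sample), outcome)
```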
You don't need fancy lookaheads or lookbehinds; just use more, but smaller, character ranges.
You'll want something like ^[a-hj-npr-zA-HJ-NPR-Z ]*$.
I added a space so that it matches sentences.
You can test this online at Debuggex.
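To double-check that the ranges leave out exactly the six unwanted letters, here is a small sketch assuming Python's re module (not PostgreSQL). It feeds every ASCII letter through the pattern one at a time and collects the ones that are rejected.

```python
import re
import string

# Explicit ranges that skip i, o, q in both cases, plus the space.
explicit = re.compile(r"^[a-hj-npr-zA-HJ-NPR-Z ]*$")

# Test each ASCII letter on its own and record the ones the pattern refuses.
rejected = [c for c in string.ascii_letters if not explicit.match(c)]
print(rejected)
```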
I want to match against strings such as AhKs and AdKs (i.e. two cards, where Ah = Ace of Hearts). I want to match two off-suit cards with a regex. What I currently have is "^([AKQJT2-9][hscd]){2}$", but this also matches hands such as AhKh (suited) and AhAh. Is there a way to use backreferences to say that the second [hscd] cannot be the same as the first (and similarly for [AKQJT2-9])?
Not perfectly elegant, but works:
^[AKQJT2-9]([hscd])[AKQJT2-9](?!\1)[hscd]$
Try this regular expression:
^[AKQJT2-9]([hscd])[AKQJT2-9](?!\1)[hscd]$
Here a negative look-ahead assertion (?!…) is used to prevent the fourth character from being the same as the second (the match of the first group).
But if the regular expression implementation does not support look-around assertions, you will probably need to expand it to this:
^[AKQJT2-9](h[AKQJT2-9][scd]|s[AKQJT2-9][hcd]|c[AKQJT2-9][hsd]|d[AKQJT2-9][hsc])$
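As a sanity check that the expanded form really is equivalent, here is a sketch assuming Python's re module (the variable names are mine). It builds every possible two-card string and compares the two patterns on each one.

```python
import re
from itertools import product

ranks = "AKQJT23456789"
suits = "hscd"

with_lookahead = re.compile(r"^[AKQJT2-9]([hscd])[AKQJT2-9](?!\1)[hscd]$")
expanded = re.compile(
    r"^[AKQJT2-9](h[AKQJT2-9][scd]|s[AKQJT2-9][hcd]"
    r"|c[AKQJT2-9][hsd]|d[AKQJT2-9][hsc])$"
)

# Every rank/suit/rank/suit combination, e.g. "AhKs".
hands = [r1 + s1 + r2 + s2 for r1, s1, r2, s2 in product(ranks, suits, ranks, suits)]

# Hands on which the two patterns give different answers.
disagreements = [
    h for h in hands
    if bool(with_lookahead.match(h)) != bool(expanded.match(h))
]
offsuit_count = sum(1 for h in hands if with_lookahead.match(h))
print(len(hands), offsuit_count, disagreements)
```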
A negative lookahead comes to the rescue:
/^[AKQJT2-9]([hscd])[AKQJT2-9](?!\1)[hscd]$/
:( too late.
Yes. Use back-reference together with a negative look-ahead.
^([AKQJT2-9])([hscd])(?!\1)(?!.\2)[AKQJT2-9][hscd]$
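A short sketch of this last pattern in action, assuming Python's re module (the sample hands beyond those in the question are mine). The first lookahead rejects a repeated rank, and the second skips one character with . and then rejects a repeated suit.

```python
import re

# (?!\1) forbids the same rank; (?!.\2) forbids the same suit.
pattern = re.compile(r"^([AKQJT2-9])([hscd])(?!\1)(?!.\2)[AKQJT2-9][hscd]$")

hands = ["AhKs", "AdKs", "AhKh", "AhAs", "AhAh"]
checks = {hand: bool(pattern.match(hand)) for hand in hands}
for hand, ok in checks.items():
    print(hand, ok)
```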