I'm doing a non-greedy match like this
'(?<C2>.+?)'
to find a group inside quotes. This works well, until I want to do something like this
'(?<C2>.+?)' as
to match something in quotes followed by a space, followed by the word as.
But now, the following will not match as desired
'hello'123'hello2' as
I want this to not match at all...but it ends up matching the whole chunk
'hello'123'hello2'
as C2
What's the best way to force the non-greedy .+? to include up to the first occurrence of a ', not the first occurrence of ' as?
This seems to work
(?<C2>'[^']+')(?= as)
Explanation
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(?<C2>                   group and capture to C2:
  '                        a literal '
  [^']+                    any character except: ' (1 or more times
                           (matching the most amount possible))
  '                        a literal '
)                        end of C2
(?=                      look ahead to see if there is:
   as                      ' as'
)                        end of look-ahead
Even without the lookahead (?= as), (?<C2>'[^']+') will match each quoted string on its own, as expected, because [^']+ cannot run across a quote.
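To illustrate the difference, here is a minimal Python sketch. It assumes Python's re module, which spells named groups (?P<C2>...) rather than (?<C2>...); the sample string is the one from the question, and the variable names are made up for the example.

```python
import re

text = "'hello'123'hello2' as"

# Lazy dot: .+? can still run across quotes until it reaches the first "' as".
lazy = re.search(r"'(?P<C2>.+?)' as", text)

# Negated class: [^']+ cannot cross a quote, so only the last quoted chunk fits.
strict = re.search(r"(?P<C2>'[^']+')(?= as)", text)

print(lazy.group("C2"))    # hello'123'hello2
print(strict.group("C2"))  # 'hello2'
```

Note that the second pattern keeps the quotes inside C2, because they sit inside the group.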
You can try:
'(?<C2>[^']+?)' as
I think I understood your question differently than those who have replied so far. By
What's the best way to force the non-greedy .+? to include up to the first occurrence of a ', not the first occurrence of ' as
did you mean to say you wanted to match the word between the first two ', i.e. hello, not hello2? In that case, this is my suggestion:
'(?<C2>.+?)'(?! as)
The negative lookahead will ensure that you will not match the word which comes before as.
In case I misunderstood your request: sorry.
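A small Python sketch of this reading of the question, assuming the re module and its (?P<C2>...) spelling for named groups; the sample string is the one from the question.

```python
import re

text = "'hello'123'hello2' as"

# (?! as) rejects any quoted chunk that is immediately followed by " as".
pattern = re.compile(r"'(?P<C2>.+?)'(?! as)")

matches = [m.group("C2") for m in pattern.finditer(text)]
print(matches)  # ['hello']
```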
I am new to regex, so any help is really appreciated.
I have an expression to identify a URL :
(http[^'\"]+)
Unfortunately, on some URLs I get additional square brackets at the end.
For instance "http://example.com]]"
As a result, I want to receive "http://example.com"
How do I get rid of those brackets with the help of the regex I wrote above?
What you actually have is called a negated character class, so just add characters that should not be matched. In addition, there's not really a need for a capturing group. That said, you could use
http[^'"\]\[]+
#       ^^^^
Note that this will stop at square brackets anywhere in your URL, not just at the end. See a demo on regex101.com.
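A quick Python check of this pattern, as a sketch only. The second input is a made-up URL added to show the caveat about brackets inside the URL.

```python
import re

# Same negated character class as above, written as a raw triple-quoted string
# so that both quote characters can appear inside it.
pattern = re.compile(r"""http[^'"\]\[]+""")

trimmed = pattern.search('"http://example.com]]"').group()
print(trimmed)  # http://example.com

# Hypothetical URL with brackets in the middle: the match stops at the first one.
cut_short = pattern.search('"http://example.com/a[1]"').group()
print(cut_short)  # http://example.com/a
```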
Stop the match between a word and nonword character:
(http[^'"]+)\b
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(                        group and capture to \1:
  http                     'http'
  [^'"]+                   any character except: ''', '"' (1 or more times
                           (matching the most amount possible))
)                        end of \1
\b                       the boundary between a word char (\w) and
                         something that is not a word char
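As a sketch of how the word boundary behaves, here is the same pattern in Python. The second input is a made-up URL; it shows that \b trims every trailing non-word character, including a final slash.

```python
import re

pattern = re.compile(r"""(http[^'"]+)\b""")

# The greedy class first swallows "]]", then backtracks to the last word boundary.
first = pattern.search('"http://example.com]]"').group(1)
print(first)  # http://example.com

# Hypothetical input: the trailing "/" is dropped as well as the brackets.
second = pattern.search('"http://example.com/path/]]"').group(1)
print(second)  # http://example.com/path
```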
I am trying to write a regex expression in PCRE which captures the first part of a word and excludes the second portion. The first portion needs to accommodate different values depending upon where the transaction is initiated from. Here is an example:
Raw Text:
.controller.CustomerDemographicsController
Regex Pattern Attempted:
\.controller\.(?P<Controller>\w+)
Result I'm trying to achieve (the CustomerDemographics part is the only content I want to save in the named capture group):
.controller.CustomerDemographicsController
NOTE: I've attempted to exclude using ^, lookbehind, and lookahead.
Any help is greatly appreciated.
You can match word chars in the Controller group up to the last uppercase letter:
\.controller\.(?P<Controller>\w+)(?=\p{Lu})
See the regex demo. Details:
\.controller\. - a .controller. string
(?P<Controller>\w+) - Named capturing group "Controller": one or more word chars as many as possible
(?=\p{Lu}) - the next char must be an uppercase letter.
Note that (?=\p{Lu}) makes the \w+ stop before the last uppercase letter because the \w+ pattern is greedy due to the + quantifier.
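A minimal Python sketch of the same idea. Python's built-in re module does not support \p{Lu}, so the lookahead uses [A-Z] instead; that is an ASCII-only substitution made for this example, not the PCRE pattern above.

```python
import re

text = ".controller.CustomerDemographicsController"

# \w+ is greedy and backtracks until the next character is an uppercase letter.
# [A-Z] stands in for \p{Lu} here (assumption: ASCII class names only).
pattern = re.compile(r"\.controller\.(?P<Controller>\w+)(?=[A-Z])")

name = pattern.search(text).group("Controller")
print(name)  # CustomerDemographics
```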
Alternatively, use
\.controller\.(?P<Controller>[A-Za-z]+)[A-Z]
See proof.
EXPLANATION:
--------------------------------------------------------------------------------
\.                       '.'
controller               'controller'
\.                       '.'
(?P<Controller>          group and capture to Controller:
  [A-Za-z]+                any character of: 'A' to 'Z', 'a' to 'z' (1 or more
                           times (matching the most amount possible))
)                        end of Controller group
[A-Z]                    any character of: 'A' to 'Z'
I want to get the name between "for" and ";", which is NISHER HOSE. Can you help me find the correct regular expression? There is more than one "for" and ";" in the string:
Data Owner Approval Needed for Access Request #: 2137352 for NISHER HOSE; CONTRACTOR; Manager: MUILLER, TIM (TWM0069)
Using the regular expression (?<=for).*(?=;) I get the wrong match Access Request #: 2137352 for NISHER HOSE; CONTRACTOR - see screenshot on https://www.regextester.com/
Thanks
If you only want to assert for on the left, you should make sure not to match for again, and you should exclude matching a ; while asserting that there is one on the right.
(?<=\bfor )(?:(?!\bfor\b)[^;])+(?=;)
Explanation
(?<=\bfor ) Assert for at the left
(?:(?!\bfor\b)[^;])+ Match 1+ times any char except ;, as long as the current position is not directly followed by for surrounded by word boundaries
(?=;) Assert ; directly at the right
Regex demo
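The pattern can be tried in Python as follows. This is only a sketch: the string is the one from the question, and the variable names are made up.

```python
import re

text = ("Data Owner Approval Needed for Access Request #: 2137352 "
        "for NISHER HOSE; CONTRACTOR; Manager: MUILLER, TIM (TWM0069)")

# Start right after "for ", consume characters that are neither ";" nor the
# start of another "for", and require a ";" immediately after the match.
pattern = re.compile(r"(?<=\bfor )(?:(?!\bfor\b)[^;])+(?=;)")

names = pattern.findall(text)
print(names)  # ['NISHER HOSE']
```

The attempt that starts after the first "for" fails because it runs into the second "for" before it reaches a ";".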
Use
(?<=\bfor )(?![^;]*\bfor\b)[^;]+
See proof.
Explanation
--------------------------------------------------------------------------------
(?<=                     look behind to see if there is:
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
  for                      'for '
)                        end of look-behind
(?!                      look ahead to see if there is not:
  [^;]*                    any character except: ';' (0 or more times
                           (matching the most amount possible))
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
  for                      'for'
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
)                        end of look-ahead
[^;]+                    any character except: ';' (1 or more times
                         (matching the most amount possible))
The main issue here is that there are two "for". If you want to catch the name then use the ":" as a delimiter to catch the second "for":
Regex: /:.*for(.+?);/gm
Demo: https://regex101.com/r/p3QY0o/1
The name will be captured in group 1. If you decide to use a lookahead/lookbehind just bear in mind that these may or may not be supported depending on the regex engine.
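A short Python sketch of the capture-group approach, using the string from the question. Note that group 1 starts with the space that follows "for", so the example strips it.

```python
import re

text = ("Data Owner Approval Needed for Access Request #: 2137352 "
        "for NISHER HOSE; CONTRACTOR; Manager: MUILLER, TIM (TWM0069)")

# ":" anchors the match after the first "for"; (.+?) then stops at the next ";".
m = re.search(r":.*for(.+?);", text)

raw = m.group(1)
print(repr(raw))    # ' NISHER HOSE'
print(raw.strip())  # NISHER HOSE
```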
I want to match for any string of characters between two words ("Hello" and "Goodbye" in the following examples) using a regex.
The bolded areas in the following list should match:
Hello, I like you. Goodbye.
Hello there, do you enjoy golf?
I like you. Goodbye. See you later.
Examples of strings that should not match at all include (basically I want to treat the words "Hello" and "Goodbye" as a kind of barrier):
HelloGoodbye
Goodbye, how are you?
How are you? Hello
I tried using (?<=Hello).*(?=Goodbye), which works in some cases (see here). The issue with this regex is that if for example "Goodbye" isn't present, none of the text after "Hello" matches (and vice versa).
I'm not exactly sure that the regex I have tried is a good way to go about it. Possibly, I just need to match any part of the string that follows "Hello" and/or precedes "Goodbye" (but neither needs to be present for a match).
I believe I need to have some kind of conditional, and I guess matching the first two is easy but I am unable to find a way to do it.
Any help would be appreciated as I am still new to using regular expressions.
Use
(?<=Hello|^)(?:(?!Hello|Goodbye).)+(?=Goodbye|$)
See proof
EXPLANATION
--------------------------------------------------------------------------------
(?<=                     look behind to see if there is:
  Hello                    'Hello'
 |                        OR
  ^                        the beginning of the string
)                        end of look-behind
(?:                      group, but do not capture (1 or more times
                         (matching the most amount possible)):
  (?!                      look ahead to see if there is not:
    Hello                    'Hello'
   |                        OR
    Goodbye                  'Goodbye'
  )                        end of look-ahead
  .                        any character except \n
)+                       end of grouping
(?=                      look ahead to see if there is:
  Goodbye                  'Goodbye'
 |                        OR
  $                        before an optional \n, and the end of the string
)                        end of look-ahead
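Here is a rough Python sketch that runs the pattern against the examples from the question. One adaptation is needed: as far as I know, Python's built-in re module rejects a look-behind whose alternatives differ in width, such as (?<=Hello|^), so the alternation is moved outside the look-behind. The helper name between is made up for this example.

```python
import re

# (?:(?<=Hello)|^) is an adaptation of (?<=Hello|^) for Python's re module.
pattern = re.compile(r"(?:(?<=Hello)|^)(?:(?!Hello|Goodbye).)+(?=Goodbye|$)")

def between(s):
    """Return the text between the boundary words, or None if nothing matches."""
    m = pattern.search(s)
    return m.group() if m else None

samples = [
    "Hello, I like you. Goodbye.",
    "Hello there, do you enjoy golf?",
    "I like you. Goodbye. See you later.",
    "HelloGoodbye",
    "Goodbye, how are you?",
    "How are you? Hello",
]
for s in samples:
    print(repr(s), "->", repr(between(s)))
```

The first three samples yield the text next to the boundary words; the last three yield None.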
As an alternative, not quite as sophisticated as the accepted answer and matching differently in case of repetitive boundary words "Hello" and "Goodbye", but maybe a bit easier to understand because it just uses a lazy/reluctant quantifier *? for the match and does not resort to look-behind or look-ahead:
^(?:.*Hello)?(.*?)(?:Goodbye.*)?$
The non-capturing groups starting with (?: make sure that group 1 matches what you need. If you do not mind using group 2, you do not need to use non-capturing groups at all. Keep it simple! Then the regex would read:
^(.*Hello)?(.*?)(Goodbye.*)?$
You can test the first regex here.
See also this regex quantifier cheat sheet.
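A Python sketch of the first regex. One thing to be aware of: because every part except the lazy group is optional, the pattern matches any single-line string, so the "should not match" examples show up as an empty group 1 rather than as a failed match. The helper name between is made up, and treating an empty group as "no match" is a choice made for this example.

```python
import re

pattern = re.compile(r"^(?:.*Hello)?(.*?)(?:Goodbye.*)?$")

def between(s):
    """Return group 1, treating an empty capture as 'no match'."""
    m = pattern.match(s)
    return m.group(1) if m and m.group(1) else None

print(repr(between("Hello, I like you. Goodbye.")))          # ', I like you. '
print(repr(between("Hello there, do you enjoy golf?")))      # ' there, do you enjoy golf?'
print(repr(between("I like you. Goodbye. See you later.")))  # 'I like you. '
print(between("HelloGoodbye"))                               # None
```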
I am trying to match what is before /../ but after / with a regular expression, but I want it to look back and stop at the first /
I feel like I am close but it just looks at the first slash and then takes everything after it like... input is this:
this/is/a/./path/that/../includes/face/./stuff/../hat
and my regular expression is:
#\/(.*)\.\.\/#
matching /is/a/./path/that/../includes/face/./stuff/../ instead of just that/../ and stuff/../
How should I change my regex to make it work?
.* means "match any number of any character at all[1]". This is not what you want. You want to match any number of non-/ characters, which is written [^/]*.
Any time you are tempted to use .* or .+ in a regex, be very suspicious. Stop and ask yourself whether you really mean "any character at all[1]" or not - most of the time you don't. (And, yes, non-greedy quantifiers can help with this, but character classes are both more efficient for the regex engine to match against and more clear in their communication of your intent to human readers.)
[1] OK, OK... . isn't exactly "any character at all" - it doesn't match newline (\n) by default in most regex flavors - but close enough.
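The difference can be seen in a short Python sketch, using the path from the question. Python has no # delimiters around patterns, so they are dropped here.

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# Greedy .* runs from the first "/" to the last "../" in the string.
greedy = re.search(r"/(.*)\.\./", path)
print(greedy.group(0))  # /is/a/./path/that/../includes/face/./stuff/../

# [^/]* cannot cross a slash, so each match is a single path segment.
segments = re.findall(r"([^/]*)/\.\./", path)
print(segments)  # ['that', 'stuff']
```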
Change your pattern so that only characters other than / ([^/]) get matched:
#([^/]*)/\.\./#
Alternatively, you can use a lookahead.
#(\w+)(?=/\.\./)#
Explanation
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(                        group and capture to \1:
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or more times
                           (matching the most amount possible))
)                        end of \1
(?=                      look ahead to see if there is:
  /                        '/'
  \.                       '.'
  \.                       '.'
  /                        '/'
)                        end of look-ahead
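A quick Python sketch of the lookahead version. The second path is a made-up example showing a limit of \w+: it does not cover characters such as - or ., so a directory name containing them is only partly captured, while [^/]+ takes the whole segment.

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# The lookahead checks for "/../" without consuming it.
found = re.findall(r"(\w+)(?=/\.\./)", path)
print(found)  # ['that', 'stuff']

# Hypothetical path with a hyphen in the directory name.
other = "a/my-dir/../b"
partial = re.findall(r"(\w+)(?=/\.\./)", other)
whole = re.findall(r"([^/]+)(?=/\.\./)", other)
print(partial)  # ['dir']
print(whole)    # ['my-dir']
```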
I think you're essentially right, you just need to make the match non-greedy, or change the (.*) to not allow slashes: #/([^/]*)/\.\./#
In your favourite language, do a few splits and some string manipulation, e.g. in Python:
>>> s = "this/is/a/./path/that/../includes/face/./stuff/../hat"
>>> a = s.split("/../")[:-1]  # the last item is not required.
>>> for item in a:
...     print(item.split("/")[-1])
...
that
stuff
In Python:
>>> import re
>>> test = 'this/is/a/./path/that/../includes/face/./stuff/../hat'
>>> regex = re.compile(r'/\w+?/\.\./')
>>> regex.findall(test)
['/that/../', '/stuff/../']
Or if you just want the text without the slashes:
>>> regex = re.compile(r'/(\w+?)/\.\./')
>>> regex.findall(test)
['that', 'stuff']
([^/]+) will capture all the text between slashes.
([^/]+)*/\.\. matches that/.. and stuff/.. in your string this/is/a/./path/that/../includes/face/./stuff/../hat. It captures that or stuff, and you can change this, obviously, by changing the placement of the capturing parens and your program logic.
You didn't state whether you want to capture or just match. The regex here will only capture the last occurrence of the match (stuff), but it is easily changed to return that and then stuff if used in a global match.
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(                        group and capture to \1 (0 or more times
                         (matching the most amount possible)):
  [^/]+                    any character except: '/' (1 or more times
                           (matching the most amount possible))
)*                       end of \1 (NOTE: because you're using a quantifier
                         on this capture, only the LAST repetition of the
                         captured pattern will be stored in \1)
/                        '/'
\.                       '.'
\.                       '.'