RegEx for removing everything before and after a delimiter - regex

I am trying to remove everything before and after two | delimiters using regex.
An example being:
EM|CX-001|Test Campaign Name
and grabbing everything except CX-001. I cannot use a substring as the number of characters before and after the pipes may change.
I tried using the regex (?<=\|)(.*?)(?=\-), but while this selects CX-001, I need to select everything else but this.
How do I solve this problem?

You can try the following regular expression:
(^[^|]*\|)|(\|[^|]*$)
String input = "EM|CX-001|Test Campaign Name";
System.out.println(
input.replaceAll("(^[^|]*\\|)|(\\|[^|]*$)", "")
); // prints "CX-001"
Explanation of the regular expression:
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of \2

If you have only 2 pipes in your string, you could either match up to and including the first pipe, or match from the last pipe until the end of the string:
^.*?\||\|.*$
Explanation
^.*?\| Match from start of string non greedy until the first pipe
| Or
\|.*$ Match from last pipe until end of string
Or you might also use a negated character class [^|]* without the need of capturing groups:
^[^|]*\||\|[^|]*$
Note
In your pattern (?<=\|)(.*?)(?=\-) I think you meant that the last positive lookahead should be (?=\|) instead of the - if you want to select between 2 pipes.
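For comparison, here is a minimal Python sketch of the patterns from this answer, applied to the sample string from the question. The variable names are illustrative, and the sketch assumes the input contains exactly two pipes, as the answer does.

```python
import re

sample = "EM|CX-001|Test Campaign Name"

# Lazy dot version: strip up to the first pipe, or from the remaining pipe to the end.
lazy = re.sub(r"^.*?\||\|.*$", "", sample)

# Negated character class version: same result without relying on laziness.
negated = re.sub(r"^[^|]*\||\|[^|]*$", "", sample)

# The question's lookaround pattern, with the lookahead changed to a pipe,
# selects the middle field directly instead of removing the rest.
middle = re.search(r"(?<=\|)(.*?)(?=\|)", sample).group(1)

print(lazy, negated, middle)
```

All three approaches should leave only the middle field for this input; with more than two pipes they would no longer agree.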

Find: ^[^|]*\|([^|]+).+$
Replace: $1
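The same find-and-replace can be sketched in Python, where the replacement backreference is written \1 rather than $1. The function name middle_field is hypothetical and only used for illustration.

```python
import re

# Same pattern as the Find expression above.
PATTERN = re.compile(r"^[^|]*\|([^|]+).+$")

def middle_field(text: str) -> str:
    # Replace the whole match with the contents of capture group 1.
    return PATTERN.sub(r"\1", text)

print(middle_field("EM|CX-001|Test Campaign Name"))
print(middle_field("EM|CX-001"))
```

Note that the trailing .+ needs at least one character after the captured field. When there is no second pipe, as in the second call, the engine backtracks into the group and the last character of the field is lost, so this pattern is only a safe choice when both pipes are known to be present.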

Related

Capture last occurrence from multiple occurrences in Regex pattern

How can I get the desired capture described below? I tried the regex ONE.*(ONE.), but it captures the whole string.
Notepad++:
1 ONE;TWO;THREE;ONE;FOUR;FIVE
2 TEST
3 TEST
4 TEST
5 TEST
Desired Capture: If ONE has 1 match then return ONE;TWO;THREE else if ONE has two matches then return ONE;FOUR;FIVE.
You can use
^.*\K\bONE\b.*
The pattern matches:
^ Start of string
.* Match any char 0+ times
\K\bONE\b Forget what is matched so far, and backtrack till the last occurrence of ONE to match it
.* Match the rest of the line
In Toad SQL, use
SELECT REGEXP_SUBSTR(Column, '.*(ONE.*)', 1, 1, NULL, 1)
EXPLANATION
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
ONE 'ONE'
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \1
In Notepad++, use
.*\KONE(?:(?!ONE).)*
EXPLANATION
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\K match reset operator
--------------------------------------------------------------------------------
ONE 'ONE'
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
ONE 'ONE'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
)* end of grouping
You can also use (?:ONE.*)?(ONE.*) and retrieve your result from the first capturing group.
This regex will always try to match two ONEs in a line, but lets you access the part relevant to the second ONE. When there's only one that's the only part that matches.
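Python's built-in re module does not support \K, so a capture group is the usual substitute there. The following is a minimal sketch of the two capture-group patterns from the answers above; the function names are hypothetical.

```python
import re

def last_one_segment(line: str):
    # Greedy .* runs to the end of the line, then backtracks to the last "ONE".
    match = re.search(r".*(ONE.*)", line)
    return match.group(1) if match else None

def last_one_segment_alt(line: str):
    # Optional leading "ONE..." part; group 1 holds the part starting at the last "ONE".
    match = re.search(r"(?:ONE.*)?(ONE.*)", line)
    return match.group(1) if match else None

print(last_one_segment("ONE;TWO;THREE;ONE;FOUR;FIVE"))
print(last_one_segment("ONE;TWO;THREE"))
print(last_one_segment_alt("ONE;TWO;THREE;ONE;FOUR;FIVE"))
```

Lines without any ONE, such as TEST, produce no match, so both helpers return None for them.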

Regex - characters after delimiter, limited to a number

I am trying to put together some regex to get only the first 16 characters after the :
blahblahblah:fakeblahfakeblahfakeblahfakeblah
I came up with
/[^:]*$
but that matches everything after the colon, and if I try to trim from there it's actually starting at the last character.
Use
(?<=:)[^:]{16}(?=[^:]*$)
Explanation
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
[^:]{16} any character except: ':' (16 times)
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[^:]* any character except: ':' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of look-ahead
You might also use a capturing group: first match up to the last occurrence of : and then capture 16 characters other than : in group 1.
^.*:([^:]{16})
Explanation
^ Start of string
.*: Match the last occurrence of :
([^:]{16}) Capture group 1, match 16 chars other than : using the negated character class
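As a rough check, both patterns can be tried in Python against the sample string from the question. The {1,16} variant at the end is an assumption that goes beyond the original answers: it is one way to also accept inputs with fewer than 16 characters after the colon.

```python
import re

sample = "blahblahblah:fakeblahfakeblahfakeblahfakeblah"

# Lookaround version: the match itself is the 16 characters.
lookaround = re.search(r"(?<=:)[^:]{16}(?=[^:]*$)", sample)

# Capture group version: the 16 characters are in group 1.
grouped = re.search(r"^.*:([^:]{16})", sample)

print(lookaround.group(0))
print(grouped.group(1))

# Assumed variant: allow 1 to 16 characters so that short tails still match.
short = re.search(r"^.*:([^:]{1,16})", "key:short")
print(short.group(1))
```

With exactly {16}, an input such as key:short would not match at all, which may or may not be the desired behaviour.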

Get first match in closing part of regex

I need a regex that extracts a substring starting with "[%" and ending with "%]", with any text (or nothing) in between. For example:
Input: dsafsdfadsaffsdadsaffadsaf[%sadsad[%]%%]fdfsadfsad%]fsasdf
Output: [%sadsad[%]
I already wrote the expression \[%(.\n*)*%\], but it matches up to the last %]:
Output: [%sadsad[%]%%]fdfsadfsad%]
Does anyone know how to stop at the first closing %]?
Put . and \n inside a capturing or non-capturing group, separated by the alternation operator |, and make the quantifier non-greedy.
\[%(.|\n)*?%\]
OR
Alternatively, you could use a character class that matches any character:
\[%[\S\s]*?%\]
[\S\s]*? Matches any space or non-space character non-greedily.
\[%[^\]]*%\]
You can try this to get the string up to the first closing %]. See demo.
https://regex101.com/r/gX5qF3/5
NODE EXPLANATION
--------------------------------------------------------------------------------
\[% '['%
--------------------------------------------------------------------------------
[^\]]* any character except: '\]' (0 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
%\] %']'
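To illustrate the difference between the greedy and non-greedy quantifiers discussed above, here is a small Python sketch using the input from the question. The re.DOTALL variant is an addition for comparison; it lets . match newlines so that no alternation or [\s\S] class is needed.

```python
import re

text = "dsafsdfadsaffsdadsaffadsaf[%sadsad[%]%%]fdfsadfsad%]fsasdf"

# Greedy: runs to the last "%]" in the input.
greedy = re.search(r"\[%[\s\S]*%\]", text).group(0)

# Non-greedy: stops at the first "%]" after the opening "[%".
lazy = re.search(r"\[%[\s\S]*?%\]", text).group(0)

# Same idea with a plain dot and the DOTALL flag.
dotall = re.search(r"\[%.*?%\]", text, re.DOTALL).group(0)

print(greedy)
print(lazy)
print(dotall)
```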

Perl regular expression explanation

I have regular expression like this:
s/<(?:[^>'"]|(['"]).?\1)*>//gs
and I don't know what exactly it means.
The regex looks intended to remove HTML tags from input.
It matches text beginning with < and ending with >, containing non->/non-quotes or quoted strings (which may contain >). But it appears to have an error:
The .? says that the quotes may contain 0 or 1 characters; it was probably intended to be .*? (0 or more characters). And to prevent backtracking from doing things like making the . match a quote in some odd cases, the (?: ... ) grouping would need to be changed to an atomic group, (?> ... ), that is, > instead of :.
This tool can explain the details: http://rick.measham.id.au/paste/explain.pl?regex=%3C%28%3F%3A[^%3E%27%22]|%28[%27%22]%29.%3F\1%29*%3E
NODE EXPLANATION
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[^>'"] any character except: '>', ''', '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
['"] any character of: ''', '"'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
.? any character except \n (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
> '>'
So it tries to remove HTML tags as ysth also mentions.
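The effect of the .? versus .*? difference mentioned above can be seen in a short Python sketch. The sample HTML string is made up for illustration, and the atomic grouping suggested in the answer is left out because support for it depends on the Python version. Regular expressions are in any case only a rough tool for stripping tags, not a real HTML parser.

```python
import re

# Original pattern: .? allows at most one character between the quotes.
original = re.compile(r"""<(?:[^>'"]|(['"]).?\1)*>""", re.DOTALL)

# Variant with .*? so quoted attribute values of any length are accepted.
fixed = re.compile(r"""<(?:[^>'"]|(['"]).*?\1)*>""", re.DOTALL)

# Hypothetical input with a ">" inside a quoted attribute value.
html = '<a href="x>y">link</a> text'

print(original.sub("", html))
print(fixed.sub("", html))
```

With the original pattern the opening tag is left in place, because its quoted value is longer than one character; the variant removes both tags.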

Trying to match what is before /../ but after / with regular expressions

I am trying to match what is before /../ but after / with a regular expression, but I want it to look back and stop at the first /
I feel like I am close but it just looks at the first slash and then takes everything after it like... input is this:
this/is/a/./path/that/../includes/face/./stuff/../hat
and my regular expression is:
#\/(.*)\.\.\/#
matching /is/a/./path/that/../includes/face/./stuff/../ instead of just that/../ and stuff/../
How should I change my regex to make it work?
.* means "match any number of any character at all[1]". This is not what you want. You want to match any number of non-/ characters, which is written [^/]*.
Any time you are tempted to use .* or .+ in a regex, be very suspicious. Stop and ask yourself whether you really mean "any character at all[1]" or not - most of the time you don't. (And, yes, non-greedy quantifiers can help with this, but character classes are both more efficient for the regex engine to match against and more clear in their communication of your intent to human readers.)
[1] OK, OK... . isn't exactly "any character at all" - it doesn't match newline (\n) by default in most regex flavors - but close enough.
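The footnote about newlines can be checked quickly in Python, where the re.DOTALL flag switches the dot to match newlines as well. The sample text is arbitrary.

```python
import re

text = "first line\nsecond line"

# By default the dot stops at the newline.
default_dot = re.match(r".*", text).group(0)

# With DOTALL the dot matches the newline too, so .* consumes everything.
dotall_dot = re.match(r".*", text, re.DOTALL).group(0)

print(repr(default_dot))
print(repr(dotall_dot))
```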
Change your pattern so that only characters other than / ([^/]) get matched:
#([^/]*)/\.\./#
Alternatively, you can use a lookahead.
#(\w+)(?=/\.\./)#
Explanation
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
) end of look-ahead
I think you're essentially right, you just need to make the match non-greedy, or change the (.*) to not allow slashes: #/([^/]*)/\.\./#
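Trying the variants side by side in Python gives a feel for how they differ on the path from the question. In this sketch the lazy quantifier on its own still starts matching at the first slash, so only the negated character class isolates the single segment before each /../.

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# Greedy dot, as in the question: one long match up to the last "../".
greedy = re.findall(r"/(.*)\.\./", path)

# Lazy dot: shorter, but each match still begins at the first available slash.
lazy = re.findall(r"/(.*?)/\.\./", path)

# Negated character class: the capture cannot cross a slash.
segments = re.findall(r"/([^/]*)/\.\./", path)

print(greedy)
print(lazy)
print(segments)
```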
In your favourite language, do a few splits and some string manipulation, e.g. in Python:
>>> s = "this/is/a/./path/that/../includes/face/./stuff/../hat"
>>> a = s.split("/../")[:-1]  # the last item is not required
>>> for item in a:
...     print(item.split("/")[-1])
...
that
stuff
In Python:
>>> import re
>>> test = 'this/is/a/./path/that/../includes/face/./stuff/../hat'
>>> regex = re.compile(r'/\w+?/\.\./')
>>> regex.findall(test)
['/that/../', '/stuff/../']
Or if you just want the text without the slashes:
>>> regex = re.compile(r'/(\w+?)/\.\./')
>>> regex.findall(test)
['that', 'stuff']
([^/]+) will capture all the text between slashes.
([^/]+)*/\.\. matches that/.. and stuff/.. in your string this/is/a/./path/that/../includes/face/./stuff/../hat. It captures that or stuff, and you can change that, obviously, by changing the placement of the capturing parentheses and your program logic.
You didn't state whether you want to capture or just match. The regex here will only capture the last occurrence of the match (stuff), but it is easily changed to return that and then stuff if used in a global match.
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1 (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of \1 (NOTE: because you're using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\. '.'