Regex - characters after delimiter, limited to a number

I am trying to put together some regex to get only the first 16 characters after the :
blahblahblah:fakeblahfakeblahfakeblahfakeblah
I came up with
/[^:]*$
but that matches everything after the colon, and if I try to trim from there, it actually starts at the last character.

Use
(?<=:)[^:]{16}(?=[^:]*$)
Explanation
--------------------------------------------------------------------------------
(?<=            look behind to see if there is:
--------------------------------------------------------------------------------
  :             ':'
--------------------------------------------------------------------------------
)               end of look-behind
--------------------------------------------------------------------------------
[^:]{16}        any character except: ':' (16 times)
--------------------------------------------------------------------------------
(?=             look ahead to see if there is:
--------------------------------------------------------------------------------
  [^:]*         any character except: ':' (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
  $             before an optional \n, and the end of the string
--------------------------------------------------------------------------------
)               end of look-ahead

You might also use a capturing group: first match up to the last occurrence of :, then capture 16 characters other than : in group 1.
^.*:([^:]{16})
Explanation
^ Start of string
.*: Match everything up to and including the last occurrence of :
([^:]{16}) Capture group 1, match 16 chars other than : using the negated character class
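For anyone who wants to try both patterns outside a regex tester, here is a minimal sketch in Python. Python's re module is my assumption (the question does not name a language), and first_16_after_colon is just an illustrative helper name.

```python
import re

# Sample input taken from the question.
text = "blahblahblah:fakeblahfakeblahfakeblahfakeblah"

# Lookaround version: the whole match is the 16 characters after the last colon.
lookaround = re.compile(r"(?<=:)[^:]{16}(?=[^:]*$)")

# Capturing-group version: group 1 holds the 16 characters after the last colon.
capturing = re.compile(r"^.*:([^:]{16})")


def first_16_after_colon(s):
    """Return the 16 characters after the last ':' or None (hypothetical helper)."""
    m = capturing.search(s)
    return m.group(1) if m else None


print(lookaround.search(text).group(0))  # fakeblahfakeblah
print(first_16_after_colon(text))        # fakeblahfakeblah
```

Both patterns need at least 16 non-colon characters after the last colon; with fewer, neither matches.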

Related

Regex to find 5 integers before last underscore of filename

I need to find 5 integers before the last underscore in a given filename.
Example string:
X130874_W907025343_Txt.pdf
I need to find 25343
The closest I came was (?<=_)[^_]+(?=[^_](.{5})_)
Use a lookahead after the 5 digits that matches an underscore followed by no further underscores until the end of the string.
\d{5}(?=_[^_]*$)
Use
[0-9]{5}(?=_(?!.*_))
EXPLANATION
--------------------------------------------------------------------------------
[0-9]{5}        any character of: '0' to '9' (5 times)
--------------------------------------------------------------------------------
(?=             look ahead to see if there is:
--------------------------------------------------------------------------------
  _             '_'
--------------------------------------------------------------------------------
  (?!           look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*          any character except \n (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
    _           '_'
--------------------------------------------------------------------------------
  )             end of look-ahead
--------------------------------------------------------------------------------
)               end of look-ahead
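
As a quick check, both lookahead patterns from the answers can be run against the filename from the question. This is only a sketch, assuming Python's re module; digits_before_last_underscore is a made-up helper name.

```python
import re

# Filename taken from the question.
filename = "X130874_W907025343_Txt.pdf"

# The two patterns suggested in the answers above.
PATTERNS = [
    r"\d{5}(?=_[^_]*$)",      # underscore, then no more underscores until the end
    r"[0-9]{5}(?=_(?!.*_))",  # underscore that is not followed by another underscore
]


def digits_before_last_underscore(name, pattern):
    """Return the 5 digits right before the last '_' or None (hypothetical helper)."""
    m = re.search(pattern, name)
    return m.group(0) if m else None


for p in PATTERNS:
    print(digits_before_last_underscore(filename, p))  # 25343 for both patterns
```

The digits 30874 before the first underscore are skipped because another underscore follows later in the name.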

Capture last occurrence from multiple occurrences in Regex pattern

How can I get the desired capture described below? I tried the regex ONE.*(ONE.) but it captures the whole string.
Notepad++:
1 ONE;TWO;THREE;ONE;FOUR;FIVE
2 TEST
3 TEST
4 TEST
5 TEST
Desired capture: if ONE occurs once, return ONE;TWO;THREE; if ONE occurs twice, return ONE;FOUR;FIVE.
You can use
^.*\K\bONE\b.*
The pattern matches:
^ Start of string
.* Match any char 0+ times
\K\bONE\b Forget what is matched so far, and backtrack till the last occurrence of ONE to match it
.* Match the rest of the line
In Toad SQL, use
SELECT REGEXP_SUBSTR(Column, '.*(ONE.*)', 1, 1, NULL, 1)
EXPLANATION
--------------------------------------------------------------------------------
.*              any character except \n (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
(               group and capture to \1:
--------------------------------------------------------------------------------
  ONE           'ONE'
--------------------------------------------------------------------------------
  .*            any character except \n (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
)               end of \1
In Notepad++, use
.*\KONE(?:(?!ONE).)*
EXPLANATION
--------------------------------------------------------------------------------
.*              any character except \n (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
\K              match reset operator
--------------------------------------------------------------------------------
ONE             'ONE'
--------------------------------------------------------------------------------
(?:             group, but do not capture (0 or more times (matching the most amount possible)):
--------------------------------------------------------------------------------
  (?!           look ahead to see if there is not:
--------------------------------------------------------------------------------
    ONE         'ONE'
--------------------------------------------------------------------------------
  )             end of look-ahead
--------------------------------------------------------------------------------
  .             any character except \n
--------------------------------------------------------------------------------
)*              end of grouping
You can also use (?:ONE.*)?(ONE.*) and retrieve your result from the first capturing group.
This regex first tries to match two ONEs in a line and captures the part starting at the last ONE. When there is only one ONE, the optional prefix is skipped and the group captures from that single ONE.
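The capturing-group variants are easy to check in an engine without \K. A rough sketch, assuming Python's re module (which does not support \K); last_one_segment is a hypothetical helper name.

```python
import re

# Capturing-group patterns from the answers above; group 1 holds the result.
OPTIONAL_PREFIX = r"(?:ONE.*)?(ONE.*)"
GREEDY_PREFIX = r".*(ONE.*)"


def last_one_segment(line, pattern=OPTIONAL_PREFIX):
    """Return the text from the last 'ONE' to the end of the line, or None."""
    m = re.search(pattern, line)
    return m.group(1) if m else None


print(last_one_segment("ONE;TWO;THREE;ONE;FOUR;FIVE"))  # ONE;FOUR;FIVE
print(last_one_segment("ONE;TWO;THREE"))                # ONE;TWO;THREE
print(last_one_segment("TEST"))                         # None
```

The greedy .* in front of the group is what pushes the capture to the last ONE: the engine backtracks from the end of the line and stops at the first place where ONE can still match.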

How to match strings not containing any word characters between a minus sign and numbers in PL/SQL regexp

I have some strings in Oracle where there is a minus sign (not at the beginning but inside the string), followed by a number (int or decimal with dot or comma).
I would like to find these in PLSQL. I have this already, and it's almost perfect:
REGEXP_LIKE(string, '-\d+(,|\.)*\d*')
I was hoping that it would find strictly strings like somestring-11,1, but it also finds strings like somestring-11a1,1, where there is a non-numeric (or word) character between the minus sign and the numbers. I tried to use a negative lookahead, but unfortunately it does not work:
REGEXP_LIKE(string, '-\d+!(\w)(,|\.)*\d*')
because then somestring-1s is no longer found either. Could you please point me in the right direction? Thank you.
Try the following, written and tested against the samples you showed. In short: a lazy match runs up to the -, then one or more digits are matched, followed by a comma and one or more digits.
.*?-\d+,\d+
Use
(^|\D)-(\d+([,.]*\d+)?)($|\W)
EXPLANATION
--------------------------------------------------------------------------------
(               group and capture to \1:
--------------------------------------------------------------------------------
  ^             the beginning of the string
--------------------------------------------------------------------------------
 |              OR
--------------------------------------------------------------------------------
  \D            non-digits (all but 0-9)
--------------------------------------------------------------------------------
)               end of \1
--------------------------------------------------------------------------------
-               '-'
--------------------------------------------------------------------------------
(               group and capture to \2:
--------------------------------------------------------------------------------
  \d+           digits (0-9) (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
  (             group and capture to \3 (optional (matching the most amount possible)):
--------------------------------------------------------------------------------
    [,.]*       any character of: ',', '.' (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
    \d+         digits (0-9) (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
  )?            end of \3 (NOTE: because you are using a quantifier on this capture, only the LAST repetition of the captured pattern will be stored in \3)
--------------------------------------------------------------------------------
)               end of \2
--------------------------------------------------------------------------------
(               group and capture to \4:
--------------------------------------------------------------------------------
  $             before an optional \n, and the end of the string
--------------------------------------------------------------------------------
 |              OR
--------------------------------------------------------------------------------
  \W            non-word characters (all but a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
)               end of \4
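
To see how the last pattern treats the sample strings from the question, here is a small sketch. It uses Python's re module as a stand-in for Oracle's regex engine, which is an assumption on my part: the two engines are not identical, so treat it as an illustration of the pattern logic only. extract_number is a hypothetical helper name.

```python
import re

# Pattern from the answer above; group 2 holds the number after the minus sign.
MINUS_NUMBER = re.compile(r"(^|\D)-(\d+([,.]*\d+)?)($|\W)")


def extract_number(s):
    """Return the number following a minus sign, or None if the string is rejected."""
    m = MINUS_NUMBER.search(s)
    return m.group(2) if m else None


print(extract_number("somestring-11,1"))       # 11,1
print(extract_number("somestring-11.5 rest"))  # 11.5
print(extract_number("somestring-11a1,1"))     # None
```

The trailing ($|\W) is what rejects somestring-11a1,1: after the digits, the next character must be the end of the string or a non-word character, and a is neither.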

RegEx for removing everything before and after a delimiter

I am trying to remove everything before and after two | delimiters using regex.
An example being:
EM|CX-001|Test Campaign Name
and grabbing everything except CX-001. I cannot use a substring as the number of characters before and after the pipes may change.
I tried using the regex (?<=\|)(.*?)(?=\-), but while this selects CX-001, I need to select everything else but this.
How do I solve this problem?
You can try the following regular expression:
(^[^|]*\|)|(\|[^|]*$)
String input = "EM|CX-001|Test Campaign Name";
System.out.println(
input.replaceAll("(^[^|]*\\|)|(\\|[^|]*$)", "")
); // prints "CX-001"
Explanation of the regular expression:
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of \2
If you have only 2 pipes in your string, you can either match up to the first pipe or match from the last one until the end of the string:
^.*?\||\|.*$
Explanation
^.*?\| Match from start of string non greedy until the first pipe
| Or
\|.*$ Match from last pipe until end of string
Or you might also use a negated character class [^|]* without the need of capturing groups:
^[^|]*\||\|[^|]*$
Note
In your pattern (?<=\|)(.*?)(?=\-), I think the final positive lookahead should be (?=\|) rather than (?=\-) if you want to select the text between 2 pipes.
Find: ^[^|]*\|([^|]+).+$
Replace: $1
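
The three approaches above (two "remove the outer parts" patterns and the find/replace with a back-reference) can be compared side by side. This is a minimal sketch assuming Python's re module; keep_middle is a made-up helper, and note that Python writes the back-reference as \1 where the find/replace answer uses $1.

```python
import re

# Sample input taken from the question.
text = "EM|CX-001|Test Campaign Name"


def keep_middle(s, pattern, replacement=""):
    """Apply the substitution and return what is left (hypothetical helper)."""
    return re.sub(pattern, replacement, s)


# Remove everything up to the first pipe and from the last pipe onward.
print(keep_middle(text, r"(^[^|]*\|)|(\|[^|]*$)"))       # CX-001
print(keep_middle(text, r"^.*?\||\|.*$"))                # CX-001

# Match the whole line and put back only the captured middle field.
print(keep_middle(text, r"^[^|]*\|([^|]+).+$", r"\1"))   # CX-001
```

The first two patterns rely on the replace call being applied to every match (which re.sub does by default), because the leading and trailing parts are two separate matches.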

REGEX-Match characters present in a string

What is the REGEX to accept a string like this
Starts with EDO
has many characters (words, numbers, hyphens) in between
does not contain 24 or |(pipe)
Example:
Should match
edo-<<characters>>-<<characters>>-<<numbers>>
BUT NOT
edo-<<characters>>-<<characters>>-<<numbers>> | <<characters>>- <<characters>>- <<numbers>>
The string does not have a constant length
A negative lookahead lets you check that the string doesn't contain 24 or |.
The regex can be written as
/^edo(?!.*(24|\|))[-a-zA-Z0-9]+$/i
How it matches
^ Anchors the regex at the start of the string
edo The anchor ensures that the string starts with edo
(?!.*(24|\|)) Negative lookahead assertion. It checks that the rest of the string doesn't contain 24 or |. If neither is present, matching proceeds with the remaining pattern; otherwise the match is discarded.
[-a-zA-Z0-9]+ Matches digits, letters, or -
$ Anchors the regex at the end of the string.
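A short sketch of how this pattern behaves, assuming Python's re module, where the /i flag becomes re.IGNORECASE. The sample strings and the is_valid_edo helper are made up for illustration.

```python
import re

# Pattern from the answer above; /i is expressed as re.IGNORECASE.
EDO_SIMPLE = re.compile(r"^edo(?!.*(24|\|))[-a-zA-Z0-9]+$", re.IGNORECASE)


def is_valid_edo(s):
    """True if s starts with 'edo' and contains neither '24' nor '|'."""
    return EDO_SIMPLE.search(s) is not None


print(is_valid_edo("edo-abc-def-123"))                  # True
print(is_valid_edo("EDO-abc-def-123"))                  # True
print(is_valid_edo("edo-abc-def-123 | abc- def- 456"))  # False
print(is_valid_edo("edo-abc-245"))                      # False, contains 24
```

Note that this version rejects any occurrence of the two characters 24, even inside a longer number such as 245.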
^EDO(?!.*(?:(?<!\d)24(?!\d)|\|))[a-zA-Z0-9 -]+$
Try this. This should work. Use the flags gmi.
See demo.
https://regex101.com/r/fA6wE2/37
NODE            EXPLANATION
--------------------------------------------------------------------------------
^               the beginning of the string
--------------------------------------------------------------------------------
EDO             'EDO'
--------------------------------------------------------------------------------
(?!             look ahead to see if there is not:
--------------------------------------------------------------------------------
  .*            any character except \n (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
  (?:           group, but do not capture:
--------------------------------------------------------------------------------
    (?<!        look behind to see if there is not:
--------------------------------------------------------------------------------
      \d        digits (0-9)
--------------------------------------------------------------------------------
    )           end of look-behind
--------------------------------------------------------------------------------
    24          '24'
--------------------------------------------------------------------------------
    (?!         look ahead to see if there is not:
--------------------------------------------------------------------------------
      \d        digits (0-9)
--------------------------------------------------------------------------------
    )           end of look-ahead
--------------------------------------------------------------------------------
   |            OR
--------------------------------------------------------------------------------
    \|          '|'
--------------------------------------------------------------------------------
  )             end of grouping
--------------------------------------------------------------------------------
)               end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9 -]+  any character of: 'a' to 'z', 'A' to 'Z', '0' to '9', ' ', '-' (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
$               before an optional \n, and the end of the string
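
The digit-boundary version only rejects 24 when it stands alone as a number, which a quick test makes visible. This is a sketch under the assumption that Python's re module is used: the m and i flags map to re.MULTILINE and re.IGNORECASE, and findall plays the role of g. The sample lines and the matching_lines helper are invented for illustration.

```python
import re

# Pattern from the answer above with flags m and i; findall stands in for g.
EDO_STRICT = re.compile(
    r"^EDO(?!.*(?:(?<!\d)24(?!\d)|\|))[a-zA-Z0-9 -]+$",
    re.IGNORECASE | re.MULTILINE,
)


def matching_lines(text):
    """Return every line of text that the pattern accepts (hypothetical helper)."""
    return EDO_STRICT.findall(text)


sample = "edo-abc-245\nedo-abc-24\nEDO one two\nedo-a|b"
print(matching_lines(sample))  # ['edo-abc-245', 'EDO one two']
```

Here edo-abc-245 is accepted because 24 is followed by another digit, while edo-abc-24 is rejected because 24 stands on its own.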