I'm trying to write a RegEx that matches all underscore characters, but it should not match the ones in strings starting with a # character.
What I've gotten so far is a RegEx with a negative lookbehind, which only ignores the first underscore in strings starting with a # character: /(?<!#)_/gi.
Playground with test data: https://regex101.com/r/Hd8IeX/1/
You can use:
^#.*(*SKIP)(*F)|_
See the online demo. You just had to use the right flags.
^ - Start-of-string anchor.
# - Literally match "#".
.* - Match anything other than newline (greedy).
(*SKIP)(*F) - Fail the match and skip past everything consumed so far, so the search resumes after this text.
| - Or:
_ - Match an underscore.
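JavaScript's RegExp has no (*SKIP)(*F) verbs, so the pattern above only works in PCRE-style engines. A rough JavaScript equivalent, sketched below, matches the #-lines in one alternative and captures the underscore in the other, then keeps only the captured hits. The sample text is made up for illustration.

```javascript
// Lines starting with # are consumed by the first alternative;
// only underscores matched by the captured second alternative are kept.
const text = "foo_bar\n#skip_this_line\nbaz_qux_1";
const re = /^#.*|(_)/gm; // m: ^ matches at the start of every line

// Collect the index of every underscore that lies outside a #-line.
const positions = [];
for (const m of text.matchAll(re)) {
  if (m[1] !== undefined) positions.push(m.index);
}
console.log(positions); // [ 3, 27, 31 ]

// Same idea for replacement: return #-lines unchanged, replace the underscores.
const replaced = text.replace(re, (whole, underscore) => (underscore ? "-" : whole));
console.log(replaced);
```

The callback returns the whole match for the skipped alternative, which is what makes it behave like (*SKIP)(*F) here.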
Based on the comment, something like this could work:
(?:^|\W)#.*(*SKIP)(*F)|_
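A JavaScript sketch of the same idea, skipping a # at the start of a line or after a non-word character, again using a captured alternative instead of (*SKIP)(*F). The input string is a made-up example.

```javascript
// \W also matches the newline in front of a line-initial #,
// and the callback puts that character back unchanged.
const text = "a_b #c_d\n#e_f\ng_h";
const re = /(?:^|\W)#.*|(_)/gm;

// Keep the skipped segments as-is; replace only the captured underscores.
const result = text.replace(re, (whole, underscore) => (underscore ? "-" : whole));
console.log(result);
```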
I'm trying to write a regex that matches either anything or nothing after a string, but if something follows, it can't start with a dot (.).
Is it possible to do this without a negative lookahead?
Here's an example regex:
.*\.(cpl)[^.].*
Now the string:
C:\Windows\SysWOW64\control.exe mlcfg32.cpl sounds
This one is matched, but if there's only:
C:\Windows\SysWOW64\control.exe mlcfg32.cpl
it's not matched, because [^.] requires some character after cpl. If I make [^.] optional with ?, however, it no longer excludes the dot when something else follows, so it also matches this string, which it shouldn't:
C:\Windows\SysWOW64\control.exe mlcfg32.cpl. sounds
Can it be done without using negative lookaheads (?!...)?
You may use this regex:
.*\.cpl(?:[^.].*|$)
RegEx Demo
RegEx Breakdown:
.*: Match 0 or more of any character other than a line break
\.cpl: Match .cpl
(?:[^.].*|$): Match end of string or a non-dot followed by any text
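A quick way to check this pattern is to run it against the three sample paths from the question. This is a JavaScript sketch; the backslashes are doubled only because they sit inside JavaScript string literals.

```javascript
// Pattern from the answer: after ".cpl" either a non-dot (plus anything) or the end.
const re = /.*\.cpl(?:[^.].*|$)/;

const samples = [
  "C:\\Windows\\SysWOW64\\control.exe mlcfg32.cpl sounds",
  "C:\\Windows\\SysWOW64\\control.exe mlcfg32.cpl",
  "C:\\Windows\\SysWOW64\\control.exe mlcfg32.cpl. sounds",
];

const results = samples.map((s) => re.test(s));
console.log(results); // [ true, true, false ]
```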
You can use
.*\.(cpl)(?:[^.].*)?$
See the regex demo. Details:
.* - zero or more chars other than line break chars as many as possible
\. - a dot
(cpl) - Group 1: cpl
(?:[^.].*)? - an optional non-capturing group that matches a char other than . char and then zero or more chars other than line break chars as many as possible
$ - end of string.
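The second pattern can be checked the same way; this JavaScript sketch also reads Group 1. Because the trailing group is optional and followed by $, a dot right after cpl leaves text that cannot reach the end of the string, so exec returns null.

```javascript
const re = /.*\.(cpl)(?:[^.].*)?$/;

const samples = [
  "C:\\Windows\\SysWOW64\\control.exe mlcfg32.cpl sounds",
  "C:\\Windows\\SysWOW64\\control.exe mlcfg32.cpl",
  "C:\\Windows\\SysWOW64\\control.exe mlcfg32.cpl. sounds",
];

// Group 1 ("cpl") for a match, null when the regex fails.
const groups = samples.map((s) => {
  const m = re.exec(s);
  return m ? m[1] : null;
});
console.log(groups); // [ 'cpl', 'cpl', null ]
```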
Is it possible to match only the letters from the following string?
RO41 RNCB 0089 0957 6044 0001 FPS21098343
What I want: FPS
What I'm trying: [0-9]{4}\s*\S+\s+(\S+)
What I get: FPS21098343
Any help is much appreciated! Thanks.
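You can try this (note that it expects the string to start with groups of four digits, so it won't match the question's full string, which starts with RO41 RNCB):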
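// Use a name other than String to avoid shadowing the global String constructor
const str = "0258 6044 0001 FPS21098343";
const re = /^(?:\d{4} )+ *([a-zA-Z]+)\d+$/;
const match = re.exec(str);
console.log(match);
// Group 1 holds the letters ("FPS"); exec returns null when there is no match
console.log(match ? match[1] : null);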
You can match everything up to the first run of one or more letters, capturing that run, in any of the following ways:
^[^a-zA-Z]*([A-Za-z]+)
^.*?([A-Za-z]+)
^[\w\W]*?([A-Za-z]+)
(?s)^.*?([A-Za-z]+)
If the tool treats ^ as the start of a line, replace it with \A, which always matches the start of the string.
The first pattern matches:
^ / \A - start of string
[^a-zA-Z]* - zero or more chars other than letters
([A-Za-z]+) - capture one or more letters into Group 1.
The .*? part matches any text (as short as possible) before the subsequent pattern(s). (?s) makes . match line break chars.
Replace A-Za-z in all the patterns with \p{L} to match any Unicode letter. Also, note that [^\p{L}] = \P{L}.
To extract every run of consecutive letters anywhere in the string, you can simply use:
([a-zA-Z]+)
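A JavaScript sketch of the first pattern, its Unicode variant, and the global match (\p{L} needs the u flag; JavaScript has no \A or inline (?s), but ^ without the m flag already anchors to the start of the string). Note that on the question's full string the first run of letters is RO, not FPS; FPS is the last run returned by the global match.

```javascript
const s = "RO41 RNCB 0089 0957 6044 0001 FPS21098343";

// First run of ASCII letters, captured in Group 1.
const first = /^[^a-zA-Z]*([A-Za-z]+)/.exec(s);

// Unicode-aware variant: \P{L} = any non-letter, \p{L} = any letter.
const firstUnicode = /^\P{L}*(\p{L}+)/u.exec(s);

// Every run of consecutive letters anywhere in the string.
const allRuns = s.match(/[a-zA-Z]+/g);

console.log(first[1]);        // RO
console.log(firstUnicode[1]); // RO
console.log(allRuns);         // [ 'RO', 'RNCB', 'FPS' ]
```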
You could use a capture group to get FPS:
\b[0-9]{4}\s+\S+\s+([A-Z]+)
The pattern matches:
\b[0-9]{4} A word boundary to prevent a partial match, then match 4 digits
\s+\S+\s+ Match 1+ non whitespace chars between whitespace chars
([A-Z]+) Capture group 1, match 1+ chars A-Z
Regex demo
If the chars have to be followed by digits till the end of the string, you can add \d+$ to the pattern:
\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$
Regex demo
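Both patterns from this answer also work in JavaScript. The sketch below runs them on the question's sample string and reads Group 1.

```javascript
const s = "RO41 RNCB 0089 0957 6044 0001 FPS21098343";

// Four digits, a whitespace-delimited token, then capture the uppercase letters.
const m1 = /\b[0-9]{4}\s+\S+\s+([A-Z]+)/.exec(s);

// Same, but the letters must be followed by digits up to the end of the string.
const m2 = /\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$/.exec(s);

console.log(m1[1], m2[1]); // FPS FPS
```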
I need a regex that matches alphanumeric characters plus zero or one '#' symbol in any part of the string, so:
Ab01# - match
Ab0#1 - match
#Ab01 - match
Here's what I have:
/^[A-Za-z0-9]+#{0,1}$/
The above matches the '#' when it's at the end of the string, but doesn't match when it's at the start or in the middle, for example:
#Ab01 - no match
Ab#01 - no match
I've tried removing the ^ and $ anchors that mark the start and end of the string, but that allows more than one #, which is not what I want.
If the # can appear only once, you can match optional chars from [A-Za-z0-9] with an optional # in between.
If you don't want to match empty strings and a negative lookahead is supported:
^(?!$)[A-Za-z0-9]*#?[A-Za-z0-9]*$
Regex demo
If there has to be at least a single char of [A-Za-z0-9] present, you could also use:
^(?=#?[A-Za-z0-9])[A-Za-z0-9]*#?[A-Za-z0-9]*$
Regex demo
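A small JavaScript check of both patterns against the examples from the question, plus a few edge cases added here for illustration (two #, a lone #, and the empty string) to show where the two patterns differ.

```javascript
// Rejects only the empty string.
const nonEmpty = /^(?!$)[A-Za-z0-9]*#?[A-Za-z0-9]*$/;
// Requires at least one char from [A-Za-z0-9].
const needsAlnum = /^(?=#?[A-Za-z0-9])[A-Za-z0-9]*#?[A-Za-z0-9]*$/;

const samples = ["Ab01#", "Ab0#1", "#Ab01", "Ab#0#1", "#", ""];

const nonEmptyResults = samples.map((s) => nonEmpty.test(s));
const needsAlnumResults = samples.map((s) => needsAlnum.test(s));
console.log(nonEmptyResults);   // [ true, true, true, false, true, false ]
console.log(needsAlnumResults); // [ true, true, true, false, false, false ]
```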
Alternatively, you could use:
^(?!.*#.*#)[A-Za-z\d#]+$
See the demo.
^ - Start-of-string anchor.
(?!.*#.*#) - Negative lookahead to prevent multiple "#".
[A-Za-z\d#]+ - One or more characters from the specified character class.
$ - End-of-string anchor.
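The lookahead-based alternative can be checked against the same inputs in JavaScript. Because the character class includes #, it accepts a lone # as well; the extra edge cases are added here for illustration.

```javascript
// The lookahead fails the whole match if two # appear anywhere.
const atMostOneHash = /^(?!.*#.*#)[A-Za-z\d#]+$/;

const samples = ["Ab01#", "Ab0#1", "#Ab01", "Ab#0#1", "#", ""];
const results = samples.map((s) => atMostOneHash.test(s));
console.log(results); // [ true, true, true, false, true, false ]
```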
Extract all the strings between two patterns:
Input:
test.output0 testx.output1 output3 testds.output2(\t)
Output:
output0 output1 output3 output2
Note: the whitespace (" ") between the items is the tab character (\t).
You may try:
\.\w+$
Explanation of the above regex:
\. - Matches . literally. If you do not want the . included in the match, use (?<=\.) instead, or simply remove \. from the pattern.
\w+ - Matches word characters [A-Za-z0-9_] one or more times.
$ - Represents the end of the line.
You can find a demo of the regex here.
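A JavaScript sketch of the three variants mentioned in the explanation, assuming a hypothetical input with one entry per line (the m flag makes $ match at each line end). It shows the trade-off: without the dot, entries such as output3 are matched too.

```javascript
// Hypothetical input: one entry per line.
const input = "test.output0\ntestx.output1\noutput3\ntestds.output2";

const withDot = input.match(/\.\w+$/gm);       // includes the dot
const afterDot = input.match(/(?<=\.)\w+$/gm); // lookbehind (ES2018+) drops the dot
const noDot = input.match(/\w+$/gm);           // also matches entries without a dot

console.log(withDot);  // [ '.output0', '.output1', '.output2' ]
console.log(afterDot); // [ 'output0', 'output1', 'output2' ]
console.log(noDot);    // [ 'output0', 'output1', 'output3', 'output2' ]
```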
EDIT (after EDIT 2 by the OP):
Based on your latest edit, this might be helpful:
.*?\.?(\w+)(?=\t)
Explanation:
.*? - Match any characters other than a newline, as few as possible (lazy).
\.? - Matches . literally zero or one time.
(\w+) - Represents a capturing group matching the word-characters one or more times.
(?=\t) - Represents a positive look-ahead matching tab.
$1 - In the replacement part, use $1 followed by a space: $1 is the captured group, and the space separates the outputs as you wanted. If you want to keep the tab instead, use the replacement $1\t.
You can find a demo of the above regex here.
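A JavaScript sketch of the replacement, assuming (as in the question's edit) that every item is followed by a tab. The trim() call is an addition of mine to drop the leftover trailing whitespace; the same regex can also collect the captured words directly.

```javascript
// Assumption: each item is followed by a tab.
const input = "test.output0\ttestx.output1\toutput3\ttestds.output2\t";
const re = /.*?\.?(\w+)(?=\t)/g;

// Each match after the first swallows the tab in front of it, but the last
// tab is left over, so trim() removes the trailing space and tab.
const cleaned = input.replace(re, "$1 ").trim();
console.log(cleaned); // output0 output1 output3 output2

// Collect Group 1 of every match instead of replacing.
const words = [...input.matchAll(re)].map((m) => m[1]);
console.log(words); // [ 'output0', 'output1', 'output3', 'output2' ]
```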
Try matching on the following pattern:
Find: (?<![^.\s])\w+(?!\S)
Here is an explanation of the above pattern:
(?<![^.\s]) assert that what precedes is either dot, whitespace, or the start of the input
\w+ match a word
(?!\S) assert that what follows is either whitespace or the end of the input
Demo
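This lookaround pattern works in JavaScript as a plain global match (lookbehind requires an ES2018-capable engine). The input below assumes every item is followed by a tab, as in the question's edit.

```javascript
const input = "test.output0\ttestx.output1\toutput3\ttestds.output2\t";

// A word preceded by a dot, whitespace or the start,
// and followed by whitespace or the end.
const found = input.match(/(?<![^.\s])\w+(?!\S)/g);
console.log(found.join(" ")); // output0 output1 output3 output2
```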
I'm having trouble retrieving specific information of a string.
The string is as follows:
20190502_PO_TEST.pdf
This includes the .pdf part. I need to retrieve the part between the last underscore (_) and the dot (.), leaving me with TEST.
I've tried this:
[^_]+$
This however, returns:
TEST.pdf
I've also tried this:
_(.+)\.
This returns:
PO_TEST
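The pattern [^_]+$ matches any characters other than an underscore up to the end of the string, so it also matches the .pdf part.
In the pattern _(.+). you have to escape the dot to match it literally, like _(.+)\. (see demo). Even then, the greedy .+ starts at the first underscore, so group 1 captures PO_TEST; excluding underscores from the group, as in _([^_]+)\., puts TEST in the first capturing group.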
What you might also use:
^.*_\K[^.]+
^.*_ Match from the start of the string up to the last underscore (the greedy .* backtracks to it)
\K Forget what was matched so far
[^.]+ Match 1+ times any char that is not a dot
Regex demo
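JavaScript has no \K, so the last pattern won't run there as written. Two JavaScript alternatives, sketched below: capture the part after the last underscore in a group, or use a lookbehind plus lookahead (ES2018+). The lookaround version is my own variant, not part of the answer.

```javascript
const fileName = "20190502_PO_TEST.pdf";

// Greedy .* backtracks to the last "_"; Group 1 then takes the non-dot chars.
const captured = /^.*_([^.]+)/.exec(fileName);

// Preceded by "_", no "_" or "." inside, followed by the final extension.
const lookaround = fileName.match(/(?<=_)[^_.]+(?=\.[^.]*$)/);

console.log(captured[1], lookaround[0]); // TEST TEST
```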