Regex capture string up to character or end of line

I need one regex to capture a string up to a :, but the : is not always there.
At the moment I can capture the groups when the : is present, but not when it is missing.
I'm not sure what I am doing wrong.
Strings to capture:
XXX 1 A:B (working)
XXX 1 A: (working)
XXX A (not working)
My regex:
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*)(?=\:)(?:.)*$

You can use
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*?)(?::.*)?$
See the regex demo. Details:
^ - start of string
(?P<grp1>[A-Z]{3,10}) - Group "grp1": three to ten uppercase letters
\s - a whitespace
(?P<grp2>.*?) - Group "grp2": any zero or more chars other than line break chars, as few as possible
(?::.*)? - an optional non-capturing group matching a : and then any zero or more chars other than line break chars, as many as possible
$ - end of string.
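To try the pattern against the three strings from the question, here is a minimal Python sketch (Python's re module accepts the same (?P<name>...) syntax). The helper name capture_groups is made up for illustration and is not part of the answer.

```python
import re

# Pattern from the answer above: grp2 is lazy and the ":..." tail is optional.
PATTERN = re.compile(r"^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*?)(?::.*)?$")


def capture_groups(text):
    """Return (grp1, grp2) for a matching line, or None when nothing matches."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return match.group("grp1"), match.group("grp2")


for sample in ("XXX 1 A:B", "XXX 1 A:", "XXX A"):
    print(sample, "->", capture_groups(sample))
```

Because grp2 is lazy, it stops at the first : when there is one, and otherwise runs to the end of the string.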

Optionally match a single : after it
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>[^:\r\n]*)(?::[^:\r\n]*)?$
^ Start of string
(?P<grp1>[A-Z]{3,10}) Group grp1
\s Match a whitespace char
(?P<grp2>[^:\r\n]*) Group grp2, match zero or more chars other than : or a newline
(?::[^:\r\n]*)? Optionally match a single : followed by zero or more chars other than : or a newline
$ End of string
Regex demo
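The practical difference between this pattern and the lazy .*? variant shows up when the input contains more than one colon. A small Python sketch of that contrast follows; the names LAZY, SINGLE_COLON and grp2 are only illustrative, and the "XXX 1 A:B:C" input is an extra test string that is not in the question.

```python
import re

# Lazy variant: everything after the first ":" is swallowed by ".*".
LAZY = re.compile(r"^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*?)(?::.*)?$")
# Negated-class variant: at most one ":" is allowed in the whole line.
SINGLE_COLON = re.compile(
    r"^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>[^:\r\n]*)(?::[^:\r\n]*)?$"
)


def grp2(pattern, text):
    """Return the grp2 capture, or None when the pattern does not match."""
    match = pattern.match(text)
    return match.group("grp2") if match else None


for sample in ("XXX 1 A:B", "XXX A", "XXX 1 A:B:C"):
    print(sample, "->", grp2(LAZY, sample), "|", grp2(SINGLE_COLON, sample))
```

Both patterns agree on the strings from the question; only the negated-class version rejects a line with a second colon.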

Related

regular expression with If condition question

I have the following regular expression that extracts everything after the first two letters:
^([A-Za-z]{2})(\w+)($) $2
Now I want to extract nothing if the data doesn't start with letters.
Example:
AA123 -> 123
123 -> ""
Can this be accomplished by regex?

Introduce an alternative to match any one or more chars from start to end of string if your regex does not match:
^(?:([A-Za-z]{2})(\w+)|.+)$
See the regex demo. Details:
^ - start of string
(?: - start of a container non-capturing group:
([A-Za-z]{2})(\w+) - Group 1: two ASCII letters, Group 2: one or more word chars
| - or
.+ - one or more chars other than line break chars, as many as possible (use [\w\W]+ to match any chars including line break chars)
) - end of a container non-capturing group
$ - end of string.
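As a rough illustration of how the alternation behaves in a replacement, here is a Python sketch. The question does not say which tool is used, so Python and the function name strip_two_letter_prefix are assumptions. A callback is used for the replacement so that an unmatched Group 2 is turned into an empty string explicitly.

```python
import re

# Pattern from the answer above: either "two letters + word chars" or anything.
PATTERN = re.compile(r"^(?:([A-Za-z]{2})(\w+)|.+)$")


def strip_two_letter_prefix(text):
    """Keep only Group 2; yield "" when the fallback ".+" branch matched."""
    # Group 2 is None whenever the ".+" alternative was used.
    return PATTERN.sub(lambda m: m.group(2) or "", text)


for sample in ("AA123", "123"):
    print(repr(sample), "->", repr(strip_two_letter_prefix(sample)))
```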

Your pattern already captures 1 or more word characters after matching 2 letters. The $ does not have to be in a group, and the $2 should not be in the pattern.
^([A-Za-z]{2})(\w+)$
See a regex demo.
Another option could be a pattern with a conditional, capturing data in group 2 only if group 1 exists.
^([A-Z]{2})?(?(1)(\w+)|.+)$
^ Start of string
([A-Z]{2})? Capture 2 uppercase chars in optional group 1
(? Conditional
(1)(\w+) If we have group 1, capture 1+ word chars in group 2
| Or
.+ Match the whole line with at least 1 char to not match an empty string
) Close conditional
$ End of string
Regex demo
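Python's re module happens to support this (?(1)...|...) conditional syntax, so the pattern can be tried there as a sketch. The function name extract_after_prefix is hypothetical. Note that this pattern uses [A-Z], so a lowercase prefix falls into the .+ branch.

```python
import re

# Conditional pattern from the answer: Group 2 is only set if Group 1 matched.
CONDITIONAL = re.compile(r"^([A-Z]{2})?(?(1)(\w+)|.+)$")


def extract_after_prefix(text):
    """Return the Group 2 capture, or "" when Group 2 did not participate."""
    match = CONDITIONAL.match(text)
    if match is None or match.group(2) is None:
        return ""
    return match.group(2)


for sample in ("AA123", "123", "aa123"):
    print(repr(sample), "->", repr(extract_after_prefix(sample)))
```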
For a match only, you could use other variations, such as \K in ^[A-Za-z]{2}\K\w+$ or a lookbehind assertion in (?<=^[A-Za-z]{2})\w+$ (both depend on what the regex engine supports).

Regex to match the letter string group between 2 numbers

Is it possible to match only the letters from the following string?
RO41 RNCB 0089 0957 6044 0001 FPS21098343
What I want: FPS
What I'm trying: [0-9]{4}\s*\S+\s+(\S+)
What I get: FPS21098343
Any help is much appreciated! Thanks.

You can try with this:
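// Use neutral names: a variable called String would shadow the built-in constructor.
const input = "0258 6044 0001 FPS21098343";
// One or more 4-digit groups, then letters (captured), then trailing digits.
const pattern = /^(?:\d{4} )+ *([a-zA-Z]+)(?:\d+)$/;
const match = pattern.exec(input);
console.log(match);     // the full match array (null if nothing matched)
console.log(match[1]);  // "FPS"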

You can match up to the first one or more letters in the following way:
^[^a-zA-Z]*([A-Za-z]+)
^.*?([A-Za-z]+)
^[\w\W]*?([A-Za-z]+)
(?s)^.*?([A-Za-z]+)
If the tool treats ^ as the start of a line, replace it with \A that always matches the start of string.
The point is to match
^ / \A - start of string
[^a-zA-Z]* - zero or more chars other than letters
([A-Za-z]+) - capture one or more letters into Group 1.
The .*? part matches any text (as short as possible) before the subsequent pattern(s). (?s) makes . match line break chars.
Replace A-Za-z in all the patterns with \p{L} to match any Unicode letters. Also, note that [^\p{L}] = \P{L}.
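A short Python sketch of the first pattern, to make "the first one or more letters" concrete. The function name first_letter_run is made up here. On the shortened string with digits only at the front it yields FPS, while on the full string from the question the first run of letters is RO, so this approach assumes that no letters precede the wanted group.

```python
import re

# Skip leading non-letters, then capture the first run of ASCII letters.
FIRST_LETTERS = re.compile(r"^[^a-zA-Z]*([A-Za-z]+)")


def first_letter_run(text):
    """Return the first run of letters in text, or None if there is none."""
    match = FIRST_LETTERS.search(text)
    return match.group(1) if match else None


print(first_letter_run("0258 6044 0001 FPS21098343"))
print(first_letter_run("RO41 RNCB 0089 0957 6044 0001 FPS21098343"))
```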

To grab every run of consecutive letters anywhere in the string, you can simply use:
([a-zA-Z]+)

You could use a capture group to get FPS:
\b[0-9]{4}\s+\S+\s+([A-Z]+)
The pattern matches:
\b[0-9]{4} A word boundary to prevent a partial match, then match 4 digits
\s+\S+\s+ Match 1+ non whitespace chars between whitespace chars
([A-Z]+) Capture group 1, match 1+ chars A-Z
Regex demo
If the chars have to be followed by digits till the end of the string, you can add \d+$ to the pattern:
\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$
Regex demo
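Both variants can be checked against the string from the question with a few lines of Python. This is only a sketch; the question does not name a language, and letters_after_digits is an illustrative name.

```python
import re

SAMPLE = "RO41 RNCB 0089 0957 6044 0001 FPS21098343"

# 4 digits, one whitespace-delimited token, then capture uppercase letters.
UNANCHORED = re.compile(r"\b[0-9]{4}\s+\S+\s+([A-Z]+)")
# Same, but the letters must be followed by digits up to the end of the string.
ANCHORED = re.compile(r"\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$")


def letters_after_digits(text, pattern=UNANCHORED):
    """Return the Group 1 capture of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(letters_after_digits(SAMPLE))
print(letters_after_digits(SAMPLE, ANCHORED))
```

The engine skips the earlier 4-digit groups because the token two positions later is not made of uppercase letters until it reaches "6044 0001 FPS".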

Regex to extract first word after last colon without space and with full match

I need a full match for the word that comes after the last colon and is delimited by spaces, e.g. in the sentence below:
XYZ Cloud : ABC : Windows : Non Prod : Silver : ABC123XYZ : ABCdef Service is Down
Here I have to do full match for ABCdef. ([^:.*\s]+$) returns Down, ([^:]+$) returns ' ABCdef Service is Down' as full match. However I am looking for ABCdef as full match.

You can match until the last occurrence of :, then match one or more spaces and capture 1+ non-whitespace chars in group 1.
^.*: +(\S+)
Explanation
^ Start of string
.*: + Match any char except a newline as many times as possible, then a : followed by 1 or more spaces
(\S+) Capture group 1, match 1+ non-whitespace chars
Regex demo
For a match only, you might use positive lookarounds:
(?<=: )[^\s:]+(?=[^\r\n:]*$)
Explanation
(?<=: ) Positive lookbehind, assert that what is directly to the left is a : followed by a space
[^\s:]+ Match 1+ times any char except a whitespace char or :
(?=[^\r\n:]*$) Positive lookahead, assert that what is to the right is 0+ times any char except a newline or :, followed by the end of the string.
Regex demo
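Here is a minimal Python sketch comparing the two approaches on the sentence from the question. The question does not state the regex flavor, so Python is an assumption, and the function names are illustrative. The first function reads capture group 1; the second returns the full match, which is what the question asks for.

```python
import re

SENTENCE = (
    "XYZ Cloud : ABC : Windows : Non Prod : Silver : ABC123XYZ : "
    "ABCdef Service is Down"
)

# Greedy ".*" runs to the last ": "; the word lands in capture group 1.
CAPTURE = re.compile(r"^.*: +(\S+)")
# Lookarounds make the word itself the full match.
FULL_MATCH = re.compile(r"(?<=: )[^\s:]+(?=[^\r\n:]*$)")


def word_via_group(text):
    """Return group 1 of the capture-group pattern, or None."""
    match = CAPTURE.search(text)
    return match.group(1) if match else None


def word_via_lookarounds(text):
    """Return the full match of the lookaround pattern, or None."""
    match = FULL_MATCH.search(text)
    return match.group(0) if match else None


print(word_via_group(SENTENCE))
print(word_via_lookarounds(SENTENCE))
```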

Using C#, this works:
var text = @"XYZ Cloud : ABC : Windows : Non Prod : Silver : ABC123XYZ : ABCdef Service is Down";
var regex = new Regex(@":(?!.*:)\s(.+?)\s.*$");
Console.WriteLine(regex.Match(text).Groups[1].Value);
I get:
ABCdef

I think the simplest is [ ]*([^: ]*)[^:]*$
[ ]*: any amount of spaces
([^: ]*): a group with the target non-space word with no colons
[^:]*$: the rest of the line without colons
https://regex101.com/r/LMQrb7/1
To access the word you're looking for, use group 1.

Regex which matches all alpha numeric chars and zero or one '#' symbol

I need a regex which matches all alpha numeric chars and zero or one '#' symbol in any part of the string, so:-
Ab01# - match
Ab0#1 - match
#Ab01 - match
here's what I have:-
/^[A-Za-z0-9]+#{0,1}$/
The above matches the '#' when it's at the end of the string but doesn't match when it's at the start or in the middle, for example
#Ab01 - no match
Ab#01 - no match
I've tried removing the ^ & $ indicating start and end of the expression - but this allows more than one match of the # which is not what I want.

If the # can be there only a single time, you can match optional chars from [A-Za-z0-9] and optionally match a # in between.
If you don't want to match empty strings and a negative lookahead is supported:
^(?!$)[A-Za-z0-9]*#?[A-Za-z0-9]*$
Regex demo
If there has to be at least a single char of [A-Za-z0-9] present, you could also use
^(?=#?[A-Za-z0-9])[A-Za-z0-9]*#?[A-Za-z0-9]*$
Regex demo
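The two patterns differ only in how they treat a lone #. A small Python sketch makes that visible (the question uses a /.../ literal, so Python is only a stand-in here, and the names NON_EMPTY, NEEDS_ALNUM and is_valid are illustrative):

```python
import re

# Rejects the empty string but still accepts a lone "#".
NON_EMPTY = re.compile(r"^(?!$)[A-Za-z0-9]*#?[A-Za-z0-9]*$")
# Additionally requires at least one alphanumeric character.
NEEDS_ALNUM = re.compile(r"^(?=#?[A-Za-z0-9])[A-Za-z0-9]*#?[A-Za-z0-9]*$")


def is_valid(pattern, text):
    """True when text matches the given pattern."""
    return pattern.match(text) is not None


for sample in ("Ab01#", "Ab0#1", "#Ab01", "#", "A#b#1", ""):
    print(repr(sample), is_valid(NON_EMPTY, sample), is_valid(NEEDS_ALNUM, sample))
```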

Alternatively maybe use:
^(?!.*#.*#)[A-Za-z\d#]+$
See the demo.
^ - Start string anchor.
(?!.*#.*#) - Negative lookahead to prevent multiple "#".
[A-Za-z\d#]+ - One or more characters from the specified character class.
$ - End string anchor.
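A quick way to check the lookahead version is a Python sketch like the one below. The function name has_at_most_one_hash is made up, and the re.ASCII flag is an addition of this sketch: in Python 3, \d would otherwise also match non-ASCII digits.

```python
import re

# The negative lookahead fails as soon as two "#" can be found in the string.
# re.ASCII (an assumption of this sketch) restricts \d to 0-9.
AT_MOST_ONE_HASH = re.compile(r"^(?!.*#.*#)[A-Za-z\d#]+$", re.ASCII)


def has_at_most_one_hash(text):
    """True for non-empty alphanumeric text containing zero or one '#'."""
    return AT_MOST_ONE_HASH.match(text) is not None


for sample in ("Ab01#", "Ab0#1", "#Ab01", "Ab01", "A#b#1", "Ab 01"):
    print(repr(sample), has_at_most_one_hash(sample))
```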

Regex: remove all except first character and last number

I know that ^. is first character and (\d+)(?!.*\d) is last number. I've tried using | between these and have been trying to find code for the second character, but with no success.
This is in R.
Take for example:
'ABCD some random words and spaces 1234' should output 'A4' when I do
sub([regex here], "", 'ABCD some random words and spaces 1234')

If you used ^.|(\d+)(?!.*\d), the pattern would only match the first char and remove it with sub, and would remove the first char and the last 1+ digits if used with gsub without backreferences in the replacement pattern. See this pattern demo.
You can use
sub("^(.).*(\\d).*$", "\\1\\2", "ABCD some random words and spaces 1234")
See the R demo and the regex demo.
This TRE regex pattern matches:
^ - start of string
(.) - Group 1 capturing any char
.* - 0+ any chars as many as possible up to the last...
(\\d) - Group 2 capturing a digit
.* - the rest of the string
$ - end of string.
The \\1\\2 replacement pattern re-inserts the values captured with Group 1 and Group 2 back to the result.
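For comparison only, the same idea can be sketched in Python, where re.sub plays the role of R's sub. This is a translation, not part of the R answer, and first_char_and_last_digit is a hypothetical name.

```python
import re


def first_char_and_last_digit(text):
    """Keep the first character and the last digit of text.

    The greedy ".*" after Group 1 runs to the end and backtracks to the
    last digit, which is why Group 2 holds the final digit of the string.
    Text without any digit does not match and is returned unchanged.
    """
    return re.sub(r"^(.).*(\d).*$", r"\1\2", text)


print(first_char_and_last_digit("ABCD some random words and spaces 1234"))
```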
