Regex Pattern - Groovy - regex

I need to create a regex with the following requirements:
starts with C, D, F, G, I, M or P
has at least one underscore (_)
e.g. C6352_3
I've tried the following:
@Pattern(regexp = '^(\C|\D|\F|\G|\I\|\M|\P)+\_*' , message = "error")

You may use
/^[CDFGIMP][^_\s]*_\S*$/
Or, to only handle word chars (letters, digits and _),
/^[CDFGIMP]\w*_\w*$/
or a slightly more efficient variant using character class subtraction (written with Java's && intersection syntax):
/^[CDFGIMP][\w&&[^_]]*_\w*$/
See the regex demo
Details
^ - start of a string
[CDFGIMP] - any char listed in the character set
[^_\s]* - zero or more chars other than _ and whitespace
\w* - matches 0+ word chars: letters, digits or _ ([\w&&[^_]]* matches 0+ letters and digits only)
_ - an underscore
\S* - 0+ non-whitespace chars (or \w* will match any letters, digits or _)
$ - end of string (or better, \z to only match at the very end of the string).
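The first two patterns can be tried out with a short script. This is only a sketch: it uses Python's re module as a stand-in for the Groovy/Java engine (an assumption), the names STRICT, WORD_ONLY and is_valid are made up for illustration, and \Z is Python's spelling of the "very end of string" anchor that Java writes as \z. The third pattern is left out because Python's re does not support the && class syntax.

```python
import re

# Pattern 1: any non-whitespace chars allowed, at least one underscore
STRICT = re.compile(r"^[CDFGIMP][^_\s]*_\S*\Z")
# Pattern 2: only word chars (letters, digits, _) allowed
WORD_ONLY = re.compile(r"^[CDFGIMP]\w*_\w*\Z")


def is_valid(value, pattern=STRICT):
    """Return True when the whole value matches the given pattern."""
    return pattern.match(value) is not None


if __name__ == "__main__":
    for sample in ["C6352_3", "A6352_3", "C6352", "C63-52_3"]:
        print(sample, is_valid(sample), is_valid(sample, WORD_ONLY))
```

Note that C63-52_3 passes the first pattern but not the second, because the hyphen is not a word char.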

You could skip regex, and make it readable:
boolean valid(String value) {
    (value?.take(1) in ['C', 'D', 'F', 'G', 'I', 'M', 'P']) && value?.contains('_')
}
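For comparison, roughly the same non-regex check in Python. This is a sketch with hypothetical names (ALLOWED_FIRST, valid_plain), not a translation anyone in the thread posted. One thing it makes visible: unlike the regex answers, this check does not reject whitespace or other characters after the first letter.

```python
ALLOWED_FIRST = frozenset("CDFGIMP")


def valid_plain(value):
    """First char must be an allowed letter and an underscore must be present."""
    # None and "" are rejected, mirroring the null-safe ?. calls in the Groovy version
    return bool(value) and value[0] in ALLOWED_FIRST and "_" in value


if __name__ == "__main__":
    print(valid_plain("C6352_3"))
    print(valid_plain("C 1_2"))  # accepted here, rejected by the regex answers
```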

Related

Find if either followed by non number or end of file

I want to match the string b5, with an optional $ in front of the b or the 5:
=b5
b$5
= $b$5
($b5)
But the 5 can't be followed by another digit, and the b can't be preceded by a letter. So these should not match:
b55
ab5
I tried this:
\W\$*b\$*5\W
It works fine: it matches X=($b$5). The problem is that it no longer matches when the 5 is the last character in the line.
You can use either of the following:
(?:\W|^)\$*b\$*5(?:\W|$)
(?:\W|^)\$*b\$*5\b
See the RE2 regex demo.
Details
(?:\W|^) - a non-capturing group matching either a non-word char or start of string
\$* - zero or more $ chars
b - a b char
\$* - zero or more $ chars
5 - a 5 char
(?:\W|$) - a non-capturing group matching either a non-word char or end of string (first pattern), or
\b - a word boundary (second pattern).
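Both patterns can be checked against the inputs from the question. A sketch using Python's re (an assumption, the demo was for RE2); CONSUMING, BOUNDARY and has_b5 are illustrative names only.

```python
import re

# Variant 1: the char after the 5 (if any) is consumed by the match
CONSUMING = re.compile(r"(?:\W|^)\$*b\$*5(?:\W|$)")
# Variant 2: \b only asserts a boundary, it consumes nothing
BOUNDARY = re.compile(r"(?:\W|^)\$*b\$*5\b")


def has_b5(text, pattern=CONSUMING):
    """Return True when the pattern is found anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ["=b5", "b$5", "= $b$5", "($b5)", "X=($b$5)", "b55", "ab5"]:
        print(sample, has_b5(sample), has_b5(sample, BOUNDARY))
```

One practical difference: because the first variant consumes the trailing non-word char, two occurrences separated by a single space (b5 b5) yield only one match with it, while the \b variant finds both.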

Regex match strings with different values

for i,v in array
for i , v in array
for i , v in array
for i, v in array
for i,v in array
for i, v in array
for[\s+,.](.+)
https://regex101.com/r/Vd3w7C/2
How could I match everything after the v?
Note that i, v and "in array" will all have different values.
I mean something like:
for ppp,gflgkf heekd gfvb
You could use
\bfor\s+[^\s,]+(?:\s*,\s*[^\s,]+)*\s+(.+)
The pattern matches:
\bfor\s+ Match for and 1+ whitespace chars
[^\s,]+ Match 1+ times any char except a whitespace char or ,
(?: Non capture group
\s*,\s*[^\s,]+ Match a comma between optional whitespace chars, and match at least a single char other than a comma or whitespace chars
)*\s+ Close the group and optionally repeat it followed by 1+ whitespace chars
(.+) Capture 1+ times any char except a newline in group 1
See a regex demo.
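To see what group 1 captures, here is a small sketch in Python (an assumption, the question does not name a language); FOR_TAIL and tail_after_vars are hypothetical names.

```python
import re

FOR_TAIL = re.compile(r"\bfor\s+[^\s,]+(?:\s*,\s*[^\s,]+)*\s+(.+)")


def tail_after_vars(line):
    """Return the text after the comma-separated variable list, or None."""
    m = FOR_TAIL.search(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    for line in ["for i,v in array", "for i , v in array", "for ppp,gflgkf heekd gfvb"]:
        print(repr(tail_after_vars(line)))
```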

Regex 'OR' seems to not behave as expected

Hello, I am trying to build a regex for a string with the following constraints:
should only contain 'X', 'O', 'T', '_', ';'
'T' and 'O' should occur only once and can be anywhere in the string
'X', '_', ';' may occur zero to n times
Here are a few valid examples:
"X__;O_T;___"
"T__;_XX_;_XO"
"T__;OX_;_X_"
"OT"
This is the regex I have right now :
/^([X;_]*T[X;_]*O)|([X;_]*O[X;_]*T);$ */i
The above seems to pass the below input as valid:
T__;_X__OO; //which is not valid
Thanks for your time.
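In your pattern the | splits the whole expression in two, so ^ anchors only the first alternative and $ only the second; the first alternative then matches just the T__;_X__O prefix of the invalid input. If you can use a lookahead you may use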
^(?=[^O]*O[^O]*$)(?=[^T]*T[^T]*$)[TOX;_]*$
See the regex demo
Details
^ - start of string
(?=[^O]*O[^O]*$) - there must be any 0+ chars other than O, then O, and then any 0+ chars other than O up to the end of the string
(?=[^T]*T[^T]*$) - there must be any 0+ chars other than T, then T, and then any 0+ chars other than T up to the end of the string
[TOX;_]* - 0+ T, O, X, ;, _ chars
$ - end of string.
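A quick check of the lookahead pattern against the examples from the question, sketched in Python (an assumption, since the question's /.../i literal suggests another language); ONE_T_ONE_O and is_valid_string are made-up names.

```python
import re

ONE_T_ONE_O = re.compile(r"^(?=[^O]*O[^O]*$)(?=[^T]*T[^T]*$)[TOX;_]*$")


def is_valid_string(text):
    """True when text has only T, O, X, ;, _ with exactly one T and one O."""
    return ONE_T_ONE_O.match(text) is not None


if __name__ == "__main__":
    for sample in ["X__;O_T;___", "T__;_XX_;_XO", "OT", "T__;_X__OO;"]:
        print(sample, is_valid_string(sample))
```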
A non-lookaround approach based on alternation is also possible:
^[X;_]*(?:T[X;_]*O|O[X;_]*T)[X;_]*$
See the regex demo.
Details
^ - string start
[X;_]* - 0+ X, ; or _ chars
(?:T[X;_]*O|O[X;_]*T) - either of the two alternatives:
T[X;_]*O - T, any 0+ X, ; or _ chars, O
| - or
O[X;_]*T - O, any 0+ X, ; or _ chars, T
[X;_]* - 0+ X, ; or _ chars
$ - string end.
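One way to gain confidence in the alternation pattern is to compare it with a plain counting check over every short string built from the allowed characters. This is a sketch in Python (an assumption); by_counting and disagreements are hypothetical helpers, and the length limit of 5 is arbitrary.

```python
import re
from itertools import product

ALTERNATION = re.compile(r"^[X;_]*(?:T[X;_]*O|O[X;_]*T)[X;_]*$")


def by_counting(text):
    """Reference check without regex: allowed chars only, one T, one O."""
    return (set(text) <= set("TOX;_")
            and text.count("T") == 1
            and text.count("O") == 1)


def disagreements(max_len=5):
    """Return strings over the alphabet where regex and counting differ."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("TOX;_", repeat=n):
            s = "".join(chars)
            if (ALTERNATION.match(s) is not None) != by_counting(s):
                bad.append(s)
    return bad


if __name__ == "__main__":
    print(disagreements())
```

An empty list means the regex and the counting check agree on all strings up to that length.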

Using regexp to find a reoccurring pattern in MATLAB

input = ' 12Z taj 20501 jfdjda OCNL jtjajd ptpa 23Z jfdakdkf tjajdfk OCNL fdkadja 02Z fdjafsdk fkdsafk OCNL fdkafk dksakj = '
using regexp
regexp(input,'\s\d{2,4}Z\s.*(OCNL)','match')
I'm trying to get the output
[1,1] = 12Z taj 20501 jfdjda OCNL jtjajd ptpa
[1,2] = 23Z jfdakdkf tjajdfk OCNL fdkadja
[1,3] = 02Z fdjafsdk fkdsafk OCNL fdkafk dksakj
You may use
(?<!\S)\d{2,4}Z\s+.*?\S(?=\s\d{2,4}Z\s|\s*=\s*$)
See the regex demo.
Details
(?<!\S) - there must be a whitespace or start of string immediately to the left of the current location
\d{2,4} - 2, 3 or 4 digits
Z - a Z letter
\s+ - 1+ whitespaces
.*?\S - any zero or more chars as few as possible and then a non-whitespace
(?=\s\d{2,4}Z\s|\s*=\s*$) - there must be either of the two patterns immediately to the right of the current location:
\s\d{2,4}Z\s - a whitespace, 2, 3 or 4 digits, Z and a whitespace
| - or
\s*=\s*$ - a = enclosed with 0+ whitespace chars at the end of the string.
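The pattern can be sanity-checked outside MATLAB as well. Below is a sketch with Python's re (an assumption; in MATLAB the same pattern would be passed to regexp with the 'match' option, as in the question). The names text, BLOCK and blocks are illustrative.

```python
import re

text = (" 12Z taj 20501 jfdjda OCNL jtjajd ptpa 23Z jfdakdkf tjajdfk OCNL "
        "fdkadja 02Z fdjafsdk fkdsafk OCNL fdkafk dksakj = ")

# The pattern has no capturing groups, so findall returns the whole matches
BLOCK = re.compile(r"(?<!\S)\d{2,4}Z\s+.*?\S(?=\s\d{2,4}Z\s|\s*=\s*$)")

blocks = BLOCK.findall(text)

if __name__ == "__main__":
    for block in blocks:
        print(block)
```

Note that 20501 does not start a new block, because the lookahead requires the digits to be followed directly by Z.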

Ignore specific characters in regex

I have this method to check if a string contains a special character, but I don't want it to flag specific characters such as + or -. How would I go about doing this?
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public boolean containsSpecialCharacters(String teamName) {
    Pattern p = Pattern.compile("[^a-z0-9 ]", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(teamName);
    // find() is true as soon as one character outside the allowed set is seen
    return m.find();
}
You can try this:
[^\w +-]
REGEX EXPLANATION
[^\w +-]
Match a single character NOT present in the list below «[^\w +-]»
A word character (letters, digits, and underscores) «\w»
The character “ ” « »
The character “+” «+»
The character “-” «-»
You can use the following. Simply add these characters inside of your negated character class.
Within a character class [], you can place a hyphen (-) as the first or last character. If you place the hyphen anywhere else you need to escape it (\-) in order to be matched.
Pattern p = Pattern.compile("(?i)[^a-z0-9 +-]");
Regular expression:
(?i) # set flags for this block (case-insensitive)
[^a-z0-9 +-] # any character except: 'a' to 'z', '0' to '9', ' ', '+', '-'
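The two suggested character classes are not quite interchangeable: \w also allows the underscore. A sketch in Python (an assumption, the thread is about Java) that shows the difference; WITH_WORD_CLASS, WITH_RANGES and contains_special are made-up names, and re.ASCII is added so that \w behaves like Java's default ASCII-only \w.

```python
import re

# re.ASCII keeps \w to [A-Za-z0-9_], which is Java's default meaning of \w
WITH_WORD_CLASS = re.compile(r"[^\w +-]", re.ASCII)
# Explicit ranges: the underscore is NOT in the allowed set here
WITH_RANGES = re.compile(r"[^a-z0-9 +-]", re.IGNORECASE | re.ASCII)


def contains_special(name, pattern=WITH_RANGES):
    """True when name has a char outside the allowed set of the pattern."""
    return pattern.search(name) is not None


if __name__ == "__main__":
    for name in ["A-Team +1", "Team#1", "Team_1"]:
        print(name, contains_special(name), contains_special(name, WITH_WORD_CLASS))
```

In both classes the hyphen sits last, so it is a literal hyphen rather than a range operator.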