Regex 'OR' seems to not behave as expected - regex

Hello, I am trying to build a regex for a string with the following constraints:
should only contain 'X', 'O', 'T', '_', ';'
'T' and 'O' should occur only once and can be anywhere in the string
'X', '_', ';' may occur zero to n times
Here are a few valid examples:
"X__;O_T;___"
"T__;_XX_;_XO"
"T__;OX_;_X_"
"OT"
This is the regex I have right now :
/^([X;_]*T[X;_]*O)|([X;_]*O[X;_]*T);$ */i
The above seems to pass the below input as valid:
T__;_X__OO; //which is not valid
Thanks for your time.

If you can use a lookahead you may use
^(?=[^O]*O[^O]*$)(?=[^T]*T[^T]*$)[TOX;_]*$
See the regex demo
Details
^ - start of string
(?=[^O]*O[^O]*$) - there must be any 0+ chars other than O, then O, and then any 0+ chars other than O up to the end of the string
(?=[^T]*T[^T]*$) - there must be any 0+ chars other than T, then T, and then any 0+ chars other than T up to the end of the string
[TOX;_]* - 0+ T, O, X, ;, _ chars
$ - end of string.
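As a quick sanity check, the lookahead pattern can be run against the examples from the question. This is only a sketch in Python: the helper name is_valid and the extra invalid inputs are made up for illustration, and the /i flag from the question is left out.

```python
import re

# Pattern from the answer above: one lookahead enforces "exactly one O",
# the other "exactly one T"; the final class restricts the allowed characters.
ONE_T_ONE_O = re.compile(r"^(?=[^O]*O[^O]*$)(?=[^T]*T[^T]*$)[TOX;_]*$")

def is_valid(s: str) -> bool:
    """Return True if s uses only T, O, X, ;, _ with exactly one T and one O."""
    return ONE_T_ONE_O.match(s) is not None

valid = ["X__;O_T;___", "T__;_XX_;_XO", "T__;OX_;_X_", "OT"]
invalid = ["T__;_X__OO;", "TT_O", "X__;", "T_O_A"]  # hypothetical extra cases

for s in valid + invalid:
    print(f"{s!r}: {is_valid(s)}")
```

All four strings from the question should be accepted, and the problematic input with two O chars should be rejected.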
A non-lookaround approach based on alternation is also possible:
^[X;_]*(?:T[X;_]*O|O[X;_]*T)[X;_]*$
See the regex demo.
Details
^ - string start
[X;_]* - 0+ X, ;, _ chars
(?:T[X;_]*O|O[X;_]*T) - either of the two alternatives:
T[X;_]*O - T, any 0+ X, ;, _ chars, O
| - or
O[X;_]*T - O, any 0+ X, ;, _ chars, T
[X;_]* - 0+ X, ;, _ chars
$ - string end.
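It may help to see why the pattern from the question accepts the bad input. The | there splits the whole expression, so ^ belongs only to the first branch and the trailing ;$ only to the second. A rough Python comparison follows; the variable names are illustrative and the /i flag is dropped.

```python
import re

# The question's pattern: the top-level "|" means "^...T...O" OR "...O...T;$ *",
# so the first branch is never anchored at the end of the string.
original = re.compile(r"^([X;_]*T[X;_]*O)|([X;_]*O[X;_]*T);$ *")

# The alternation-based pattern from the answer: the "|" is confined to a
# non-capturing group, so both anchors apply to either branch.
fixed = re.compile(r"^[X;_]*(?:T[X;_]*O|O[X;_]*T)[X;_]*$")

bad = "T__;_X__OO;"

m = original.search(bad)
print(m.group(0))         # T__;_X__O  (first branch matched a prefix only)
print(fixed.search(bad))  # None

for s in ["X__;O_T;___", "T__;_XX_;_XO", "T__;OX_;_X_", "OT"]:
    print(s, bool(fixed.search(s)))
```

The first branch of the original pattern is satisfied by a prefix of the input, which is why the string is reported as valid even though it contains two O chars.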

Related

Regex capture required and optional characters in any position only

I would like to match a word made up only of a given set of characters, in any order, where one of those letters is required.
Example:
Optional letters: yujkfec
Required letter: d
Matches: duck dey feed yudekk dude jude dedededy jejeyyyjd
No matches (do not contain required): yuck feck
No matches (contain letters outside of set): sucked shock blah food bard
I've tried ^[d]+[yujkfec]*$ but this only matches when the required letter is in the front. I've tried positive lookaheads but this didn't do much.
You can use
\b[yujkfec]*d[dyujkfec]*\b
See the regex demo. Note that the d is included in the second character class.
Details:
\b - word boundary
[yujkfec]* - zero or more occurrences of y, u, j, k, f, e or c
d - a d char
[dyujkfec]* - zero or more occurrences of y, u, j, k, f, e, c or d.
\b - a word boundary.
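To check the pattern against the words from the question, something like the following Python sketch could be used. The text variable simply joins the sample words into one string, and the names are chosen for illustration.

```python
import re

# Pattern from the answer: optional letters, one required d, then any of the
# allowed letters (d included), all between word boundaries.
word_re = re.compile(r"\b[yujkfec]*d[dyujkfec]*\b")

# Sample words from the question, joined into one string.
text = ("duck dey feed yudekk dude jude dedededy jejeyyyjd "
        "yuck feck sucked shock blah food bard")

found = word_re.findall(text)
print(found)
```

Only the first eight words should be returned. Words such as sucked or food are skipped because the d in them is not reachable from a word boundary through allowed letters only.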

Find if either followed by non number or end of file

I want to match the string b5 with an optional $ in front of the b or the 5:
=b5
b$5
= $b$5
($b5)
But the 5 can't be followed by any digit, and the b can't be preceded by any letter. So these should return false:
b55
ab5
I tried this :
\W\$*b\$*5\W
It works fine and will match X=($b$5), but the problem is that it no longer matches when the 5 is the last character in the line, because nothing follows the 5 for \W to consume.
You can use either of the following:
(?:\W|^)\$*b\$*5(?:\W|$)
(?:\W|^)\$*b\$*5\b
See the RE2 regex demo.
Details
(?:\W|^) - a non-capturing group matching either a non-word char or start of string
\$* - zero or more $ chars
b - a b char
\$* - zero or more $ chars
5 - a 5 char
(?:\W|$) - a non-capturing group matching either a non-word char or end of string (first pattern), or
\b - a word boundary (second pattern).
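Both variants can be tried on the inputs from the question. Below is a small Python sketch; the demo targets RE2, but these constructs behave the same way in Python's re module, and the helper name matches_b5 is made up here.

```python
import re

# Variant 1: the 5 must be followed by a non-word char or the end of the string.
cell_ref = re.compile(r"(?:\W|^)\$*b\$*5(?:\W|$)")
# Variant 2: the 5 must be followed by a word boundary.
cell_ref_wb = re.compile(r"(?:\W|^)\$*b\$*5\b")

def matches_b5(s: str, pattern=cell_ref) -> bool:
    """Return True if s contains b5 (with optional $ signs) per the pattern."""
    return pattern.search(s) is not None

should_match = ["=b5", "b$5", "= $b$5", "($b5)", "X=($b$5)"]
should_fail = ["b55", "ab5"]

for s in should_match + should_fail:
    print(f"{s!r}: {matches_b5(s)} / {matches_b5(s, cell_ref_wb)}")
```

Note that the first variant consumes the character after the 5 when there is one, while \b consumes nothing, which may matter if the match itself is used afterwards.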

Regex match strings with different values

for i,v in array
for i , v in array
for i , v in array
for i, v in array
for i,v in array
for i, v in array
for[\s+,.](.+)
https://regex101.com/r/Vd3w7C/2
How could I match anything after the v?
Note that i, v, and "in array" will have different values.
I mean something like:
for ppp,gflgkf heekd gfvb
You could use
\bfor\s+[^\s,]+(?:\s*,\s*[^\s,]+)*\s+(.+)
The pattern matches:
\bfor\s+ Match for and 1+ whitespace chars
[^\s,]+ Match 1+ times any char except a whitespace char or ,
(?: Non capture group
\s*,\s*[^\s,]+ Match a comma between optional whitespace chars, and match at least a single char other than a comma or whitespace chars
)*\s+ Close the group and repeat it 0+ times, followed by 1+ whitespace chars
(.+) Capture 1+ times any char except a newline in group 1
See a regex demo.
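For illustration, here is how the capture group could be read out in Python. The function name tail_after_vars is hypothetical, and the sample lines are taken from the question.

```python
import re

# Pattern from the answer: "for", a comma-separated list of names,
# then everything after the last name is captured in group 1.
for_tail = re.compile(r"\bfor\s+[^\s,]+(?:\s*,\s*[^\s,]+)*\s+(.+)")

def tail_after_vars(line: str):
    """Return the text after the loop variables, or None if there is no match."""
    m = for_tail.search(line)
    return m.group(1) if m else None

for line in ["for i,v in array",
             "for i , v in array",
             "for i, v in array",
             "for ppp,gflgkf heekd gfvb"]:
    print(repr(line), "->", repr(tail_after_vars(line)))
```

The first three lines should all yield "in array", and the last one "heekd gfvb", regardless of the spacing around the comma.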

command line grep finding words with exactly one vowel

How do I list all the lines containing words that have exactly one vowel?
I have tried
egrep -i '\<.*[aeiou]{1}.*\>' f3.txt
but I'm stuck and can't figure it out
You may use
grep -i '\<[^[:digit:][:punct:][:space:]aeiou]*[aeiou][^[:digit:][:punct:][:space:]aeiou]*\>' f3.txt
Details
\< - start of a word
[^[:digit:][:punct:][:space:]aeiou]* - 0 or more chars other than digits, punctuation, whitespace, a, e, i, o, u
[aeiou] - 1 occurrence of a, e, i, o or u
[^[:digit:][:punct:][:space:]aeiou]* - 0 or more chars other than digits, punctuation, whitespace, a, e, i, o, u
\> - end of a word.
See an online demo.
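For anyone who wants to try the same idea outside grep, here is a rough Python translation. It is not the grep command itself: Python's re has no \< \> or POSIX bracket classes, so \b and [^\W\d_...] are used as an approximation, and the sample lines are invented.

```python
import re

# Approximation of the grep pattern: consonants are "word chars that are not
# digits, underscores or vowels"; exactly one vowel must sit between them.
CONS = r"[^\W\d_aeiouAEIOU]"
ONE_VOWEL_WORD = re.compile(rf"\b{CONS}*[aeiouAEIOU]{CONS}*\b")

def lines_with_one_vowel_word(lines):
    """Keep the lines that contain at least one word with exactly one vowel."""
    return [line for line in lines if ONE_VOWEL_WORD.search(line)]

sample = ["queue audio", "the cat sat", "rhythm 123", "STOP", "a1 b2"]
kept = lines_with_one_vowel_word(sample)
print(kept)
```

With these sample lines only "the cat sat" and "STOP" should be kept. Non-ASCII letters count as consonants in this sketch, which may or may not match what grep does under a given locale.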

Regex Pattern - Groovy

I need to create a regex - with the following requirements
starts with C, D, F, G, I, M or P
has at least one underscore (_)
eg. C6352_3
I've tried the following:
@Pattern(regexp = '^(\C|\D|\F|\G|\I\|\M|\P)+\_*' , message = "error")
You may use
/^[CDFGIMP][^_\s]*_\S*$/
Or, to only handle word chars (letters, digits and _),
/^[CDFGIMP]\w*_\w*$/
or a bit more efficient one with character class subtraction:
/^[CDFGIMP][\w&&[^_]]*_\w*$/
See the regex demo
Details
^ - start of a string
[CDFGIMP] - any char listed in the character set
[^_\s]* - zero or more chars other than _ and whitespace
\w* - matches 0+ word chars: letters, digits or _ ([\w&&[^_]]* matches 0+ letters and digits only)
_ - an underscore
\S* - 0+ non-whitespace chars (or \w* will match any letters, digits or _)
$ - end of string (or better, \z to only match at the very end of the string).
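The first two patterns can be tried out quickly in Python as a sketch. The third one relies on Java's && character class intersection, which Python's re module does not support, so it is left out. fullmatch is used in place of the ^ and $ anchors, and the helper name and test values other than C6352_3 are made up.

```python
import re

# Variant 1: any non-whitespace chars are allowed around the underscore.
STRICT = re.compile(r"[CDFGIMP][^_\s]*_\S*")
# Variant 2: only word chars (letters, digits, _) are allowed.
WORD_ONLY = re.compile(r"[CDFGIMP]\w*_\w*")

def is_valid(value: str, pattern=WORD_ONLY) -> bool:
    """Return True if the whole value matches the given pattern."""
    return pattern.fullmatch(value) is not None

for v in ["C6352_3", "X6352_3", "C6352", "C_", "C63-52_3", "C6352_3 "]:
    print(f"{v!r}: word-only={is_valid(v)}, strict={is_valid(v, STRICT)}")
```

The two variants only differ on values such as C63-52_3, where the hyphen is a non-whitespace char but not a word char.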
You could skip regex, and make it readable:
// Valid when the value starts with one of the allowed letters and contains an underscore.
boolean valid(String value) {
    (value?.take(1) in ['C', 'D', 'F', 'G', 'I', 'M', 'P']) && value?.contains('_')
}