RegEx for checking specific rule in string - regex

I cannot figure out how to make this RegEx syntax work.
https://regex101.com/r/Zcxjtn/1
I would like to check whether a string is valid or not.
Rules:
1. The string must consist of 3 capital letters [A-Z].
2. If the string is longer than 3 characters, each 3-letter block must be separated by a semicolon (;) only.
3. The string must not start or end with a separator (;).
4. Optional: whitespace is allowed between the separator and the next 3-letter substring.
examples of valid strings:
AAA;BBB
AAA; BBB
AAA
examples of invalid strings:
;AAA
AAA;BBB;
123;AAA

1. The string must consist of 3 capital letters [A-Z]
[A-Z] matches a single character in the range between A (character code 65) and Z (character code 90) (case sensitive)
{3} matches the previous token exactly 3 times
Put together, you get:
[A-Z]{3}
2. If the string is longer than 3 characters, each 3-letter block must be separated by a semicolon (;) only
Make rule 1 required at the beginning as ^[A-Z]{3}, followed by a group that occurs 0 or more times up to the end of the input, ( )*$, which contains a leading ; and the pattern from rule 1. That gives (;[A-Z]{3})*$.
Put together, you get:
^[A-Z]{3}(;[A-Z]{3})*$
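3. The string must not start or end with a separator (;)
Already covered by rule 2: the pattern must begin with a letter block, and every ; must be followed by another letter block.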
4. Optional: whitespace is allowed between the separator and the next 3-letter substring
Add a whitespace token \s that occurs 0 or more times (*), so \s*.
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
Putting it in the correct location in the regex, you get:
^[A-Z]{3}(;\s*[A-Z]{3})*$
See: https://regex101.com/r/hZ5l6a/1
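A minimal Python sketch for checking the final pattern against the examples from the question (Python's `re` module is an assumption here; the answer itself was built on regex101). `fullmatch` is used because in Python a bare `$` can also match just before a trailing newline.

```python
import re

# Final pattern from the answer: a 3-letter block, then zero or more
# ";" + optional whitespace + 3-letter block, up to the end of the input.
PATTERN = re.compile(r"^[A-Z]{3}(;\s*[A-Z]{3})*$")

def is_valid(s: str) -> bool:
    # fullmatch also rejects a trailing "\n", which a bare "$" would tolerate
    return PATTERN.fullmatch(s) is not None

valid = ["AAA;BBB", "AAA; BBB", "AAA"]
invalid = [";AAA", "AAA;BBB;", "123;AAA", "AAA\n"]

for s in valid + invalid:
    print(repr(s), is_valid(s))
```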
If you would like to capture only the letters, add capturing groups ( ) and mark the other groups as non-capturing groups (?: ).
Example:
^([A-Z]{3})(?:;\s*([A-Z]{3}))*$
See: https://regex101.com/r/NlDTfq/1
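One caveat when capturing: in Python's `re` (and in PCRE, as used on regex101) a capturing group inside a repeated group keeps only its last repetition, so the capture pattern does not return every block. A minimal Python sketch, assuming you want all blocks, validates first and then extracts with `re.findall`; the helper name `letter_blocks` is hypothetical.

```python
import re

# Capture pattern: group 1 = first block, group 2 = (only the last) later block
VALIDATE = re.compile(r"^([A-Z]{3})(?:;\s*([A-Z]{3}))*$")

def letter_blocks(s: str) -> list[str]:
    """Hypothetical helper: every 3-letter block, or [] if s is invalid."""
    if VALIDATE.fullmatch(s) is None:
        return []
    # The repeated group only remembers its last iteration, so re-scan the string
    return re.findall(r"[A-Z]{3}", s)

m = VALIDATE.fullmatch("AAA;BBB; CCC")
print(m.groups())                     # ('AAA', 'CCC'): BBB is lost
print(letter_blocks("AAA;BBB; CCC"))  # ['AAA', 'BBB', 'CCC']
```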

Related

RegEx: How to match a whole string with a fixed-length region with negative lookahead conditions that are overridden afterwards?

The strings I parse with a regular expression contain a region of fixed length N where there can either be numbers or dashes. However, if a dash occurs, only dashes are allowed to follow for the rest of the region. After this region, numbers, dashes, and letters are allowed to occur.
Examples (N=5, starting at the beginning):
12345ABC
12345123
1234-1
1234--1
1----1AB
How can I correctly match this? I am currently stuck at something like (?:\d|-(?!\d)){5}[A-Z0-9\-]+ (for N=5), but I cannot make numbers work directly after my region if a dash is present, because the negative lookahead blocks the match.
Update
Strings that should not be matched (N=5):
1-2-3-A
----1AB
--1--1A
You could assert that the first 5 characters are either digits or -, and make sure that there is no - before a digit within the first 5 characters.
^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$
^ Start of string
(?![\d-]{0,3}-\d) Make sure that in the first 5 chars there is no - before a digit
(?=[\d-]{5}) Assert that the string starts with at least 5 digits or -
[A-Z\d-]+ Match 1+ times any of the listed characters
$ End of string
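To try the lookahead answer outside regex101, here is a minimal Python sketch (assuming Python's `re`, which supports lookaheads) that runs it on the examples from the question and its update. Note that the `{0,3}` bound is tied to N=5: it is N-2, so that the digit after the dash would still fall inside the region.

```python
import re

# Lookahead answer for N = 5: the first 5 chars are digits or "-", and no "-"
# is directly followed by a digit inside those 5 chars.
REGION = re.compile(r"^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$")

should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_not_match = ["1-2-3-A", "----1AB", "--1--1A"]

for s in should_match + should_not_match:
    print(s, REGION.fullmatch(s) is not None)
```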
If atomic groups are available:
^(?=[\d-]{5})(?>\d+-*|-{5})[A-Z\d_]*$
^ Start of string
(?=[\d-]{5}) Assert at least 5 chars - or digit
(?> Atomic group
\d+-* Match 1+ digits and optional -
| or
-{5} match 5 times -
) Close atomic group
[A-Z\d_]* Match optional chars A-Z digit or _
$ End of string
Use a non-word-boundary assertion \B:
^[-\d](?:-|\B\d){4}[A-Z\d-]*$
A non-word-boundary succeeds at a position between two word characters (\w, i.e. [A-Za-z0-9_]) or between two non-word characters (\W, i.e. [^A-Za-z0-9_]), and also between a non-word character and the start or end of the string.
Here, each \B\d must therefore follow a digit; it can't follow a dash.
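A small Python sketch (assuming Python's `re`, whose `\B` behaves the same way for ASCII input) showing that `\B\d` skips a digit that directly follows a dash, and then the full pattern on two of the question's examples:

```python
import re

# \B\d only matches a digit whose left neighbour is also a word character.
print(re.findall(r"\B\d", "12-3"))  # ['2']: '3' follows a dash, so it is skipped

PATTERN = re.compile(r"^[-\d](?:-|\B\d){4}[A-Z\d-]*$")
print(PATTERN.fullmatch("1----1AB") is not None)  # True
print(PATTERN.fullmatch("----1AB") is not None)   # False: '1' follows a dash inside the region
```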
Another way (if lookbehinds are supported):
^\d*-*(?<=^.{5})[A-Z\d-]*$
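Because these answers take quite different routes, a brute-force comparison can build confidence that they agree. The sketch below is only an illustration: it assumes Python's `re`, uses a hypothetical `reference` function that restates the rule in plain Python, and checks every string over the alphabet "1", "-", "A" up to 7 characters. The atomic-group variant is left out; as written, its \d+-* branch can stop before position 5, so it also accepts a string such as 1-234.

```python
import re
from itertools import product

N = 5  # region length used throughout the question

PATTERNS = {
    "lookahead": re.compile(r"^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$"),
    "non-word-boundary": re.compile(r"^[-\d](?:-|\B\d){4}[A-Z\d-]*$"),
    "lookbehind": re.compile(r"^\d*-*(?<=^.{5})[A-Z\d-]*$"),
}

def reference(s: str) -> bool:
    """Hypothetical plain-Python statement of the rule, for comparison only."""
    if len(s) < N or not all(c.isdigit() or c == "-" or c.isupper() for c in s):
        return False
    region = s[:N]
    if not all(c.isdigit() or c == "-" for c in region):
        return False
    # once a dash appears in the region, only dashes may follow inside it
    first_dash = region.find("-")
    return first_dash == -1 or set(region[first_dash:]) == {"-"}

# Every string over a small alphabet, up to 7 characters long
for length in range(8):
    for chars in product("1-A", repeat=length):
        s = "".join(chars)
        for name, rx in PATTERNS.items():
            assert (rx.fullmatch(s) is not None) == reference(s), (name, s)
print("all patterns agree with the reference")
```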

Regular expression to allow spaces between words (without special characters) but removing spaces at the beginning

I'm working on validating a field in which you can enter
numbers (0-9),
letters (a-z, A-Z),
spaces,
'-' and
'_'
but you cannot enter special characters (!##$%).
You can also add extra space characters (at the beginning and end),
but after any leading spaces there must be at least one allowed (non-space) character.
Good:
" some 123 exa_mple m-s-g "
"123abc"
" a"
Bad:
" "
"123!##abc"
You could use a negative lookahead to assert that special characters do not appear anywhere in the input:
^(?!.*[!##$%])\s*[A-Za-z0-9_-][A-Za-z0-9 _-]*$
Here is an explanation of the pattern used:
^ from the start of the string
(?!.*[!##$%]) assert that none of the listed symbols occur anywhere
\s* match optional leading whitespace
[A-Za-z0-9_-] match one allowed character (non space)
[A-Za-z0-9 _-]* then match zero or more allowed characters (including space)
$ end of the string
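A minimal Python sketch (assuming Python's `re`) that runs this pattern on the good and bad inputs from the question:

```python
import re

# No listed symbol anywhere, optional leading whitespace, one non-space
# allowed character, then any allowed characters including spaces.
FIELD = re.compile(r"^(?!.*[!##$%])\s*[A-Za-z0-9_-][A-Za-z0-9 _-]*$")

good = [" some 123 exa_mple m-s-g ", "123abc", " a"]
bad = [" ", "123!##abc"]

for s in good + bad:
    print(repr(s), FIELD.fullmatch(s) is not None)
```

Strictly speaking the lookahead is a safety net: the two character classes already exclude those symbols.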
I'd use:
^(?!\h+$)[\w\h-]+$
Explanation:
^ # beginning of string
(?!\h+$) # negative lookahead, make sure we haven't only spaces
[\w\h-]+ # character class: alphanumeric, underscore, horizontal whitespace, hyphen
$ # end of string
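Python's `re` has no `\h`, so porting this pattern needs a substitute. The sketch below assumes `[ \t]` is close enough (an approximation: PCRE's `\h` also covers other horizontal spaces such as U+00A0) and uses `re.ASCII` so that `\w` means `[A-Za-z0-9_]`.

```python
import re

# [ \t] stands in for PCRE's \h, which Python's re does not support
FIELD = re.compile(r"^(?![ \t]+$)[\w \t-]+$", re.ASCII)

good = [" some 123 exa_mple m-s-g ", "123abc", " a"]
bad = [" ", "123!##abc"]

for s in good + bad:
    print(repr(s), FIELD.fullmatch(s) is not None)
```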

Regex to match if a word starts and ends with a letter and has no more than one consecutive non-letter (. *')

I'm currently trying to find a regex for a specific use case and I can't find a way to achieve it. As the title says, I would like to match if a word starts and ends with a letter and contains only letters and these characters: space, *, - and '. It should also have no more than one consecutive non-letter.
I currently have this, but it accepts consecutive non-letters and doesn't accept single letters: [a-zA-Z][a-zA-Z \-*']+[a-zA-Z]
I want my regex to accept this string
This is accepted since it contains only spaces and letters and there is no consecutive space
a should be accepted
This is --- not accepted because it contains 5 consecutive non-letter characters (3 dashes and 2 spaces)
" This is not accepted because it starts with a space"
Neither is this one since it ends with a dash -
You may use
^[a-zA-Z]+(?:[ *'-][a-zA-Z]+)*$
Details
^ - start of string anchor
[a-zA-Z]+ - 1+ ASCII letters
(?:[ *'-][a-zA-Z]+)* - 0 or more sequences of:
[ *'-] - a space, *, ' or -
[a-zA-Z]+ - 1+ ASCII letters
$ - end of string anchor.
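A minimal Python sketch (assuming Python's `re`) that runs the pattern on some of the example strings from the question:

```python
import re

# Letter runs separated by exactly one of: space, *, ' or -
WORD = re.compile(r"^[a-zA-Z]+(?:[ *'-][a-zA-Z]+)*$")

accepted = ["I want my regex to accept this string", "a"]
rejected = [
    "This is --- not accepted",
    " This is not accepted because it starts with a space",
    "Neither is this one since it ends with a dash -",
]

for s in accepted + rejected:
    print(repr(s), WORD.fullmatch(s) is not None)
```

Because every separator must be followed by `[a-zA-Z]+`, a leading, trailing, or doubled non-letter has nowhere to go, which is what enforces all three rules at once.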

Regex for name type

I am working on regex with the following conditions:
Must contain from 1 to 63 alphanumeric characters or hyphens.
First character must be a letter.
Cannot end with a hyphen or contain two consecutive hyphens.
I am able to get the regex like:
^[a-zA-Z0-9](?!.*--)[a-zA-Z0-9-]{0,61}[A-Za-z0-9]$
But it fails on the length constraint and also allows patterns like "a-". How can I meet the conditions?
I would phrase your requirements as:
^(?=.{1,63}$)(?!.*--)[a-zA-Z]([a-zA-Z0-9\-]*[a-zA-Z0-9])?$
Here is a brief explanation of what each part of the above regex does:
^ from the start of the match
(?=.{1,63}$) assert that the string is between 1 and 63 characters long
(?!.*--) assert that two hyphens do not appear together anywhere
[a-zA-Z] first character is a letter (mandatory in all matches)
([a-zA-Z0-9\-]*[a-zA-Z0-9])?
The final portion optionally matches a last character that is alphanumeric (not a hyphen), possibly preceded by alphanumeric characters or hyphens.
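The difference from the pattern in the question shows up on short inputs. A minimal Python sketch (assuming Python's `re`) comparing the two: the question's pattern needs at least two characters and allows a digit first, while the answer accepts a single letter and rejects a leading digit.

```python
import re

ASKED = re.compile(r"^[a-zA-Z0-9](?!.*--)[a-zA-Z0-9-]{0,61}[A-Za-z0-9]$")  # from the question
ANSWER = re.compile(r"^(?=.{1,63}$)(?!.*--)[a-zA-Z]([a-zA-Z0-9\-]*[a-zA-Z0-9])?$")

for s in ["a", "1ab", "ab-c", "ab--c"]:
    print(f"{s!r:8} question: {ASKED.fullmatch(s) is not None}  "
          f"answer: {ANSWER.fullmatch(s) is not None}")
```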
My take on this would be:
^[A-Za-z](?!.*?--)[A-Za-z0-9\-]{0,62}(?<!-)$
Explanation:
^ - Matches the start of the string.
[A-Za-z] - Matches the first letter.
(?!.*?--) - Ensures that there are no two consecutive hyphens in the rest of the string.
[A-Za-z0-9\-]{0,62} - Matches the remaining alphanumeric and hyphen characters.
(?<!-) - Ensures that the string doesn't end with a hyphen.
$ - Matches the end of the string.
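To check that both answers express the same rules, here is a brute-force sketch (assuming Python's `re`; `reference` is a hypothetical plain-Python restatement of the three requirements). It tries every short string over a tiny alphabet plus the two length boundaries.

```python
import re
from itertools import product

ANSWERS = {
    "optional tail": re.compile(
        r"^(?=.{1,63}$)(?!.*--)[a-zA-Z]([a-zA-Z0-9\-]*[a-zA-Z0-9])?$"
    ),
    "negative lookbehind": re.compile(
        r"^[A-Za-z](?!.*?--)[A-Za-z0-9\-]{0,62}(?<!-)$"
    ),
}

def reference(s: str) -> bool:
    """Hypothetical restatement: 1-63 chars, letter first, alnum or '-', no '--', no trailing '-'."""
    return (
        1 <= len(s) <= 63
        and s[0].isascii() and s[0].isalpha()
        and all(c.isascii() and (c.isalnum() or c == "-") for c in s)
        and not s.endswith("-")
        and "--" not in s
    )

# All strings over "a", "1", "-" up to 6 characters, plus the length limits
samples = ["".join(p) for n in range(7) for p in product("a1-", repeat=n)]
samples += ["a" * 63, "a" * 64]

for s in samples:
    for name, rx in ANSWERS.items():
        assert (rx.fullmatch(s) is not None) == reference(s), (name, s)
print(f"both answers agree with the reference on {len(samples)} strings")
```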

What could be the Regular Expression for the following

$ cat t1.txt:
ABCD_EFG_HIJK
ABCD_HJIJ_IJKL
What could be the regex for the above two lines?
Even for just one of the lines.
Or
The scenario is: 4 characters, followed by an underscore, followed by characters (any number), followed by an underscore, followed by characters (any number), then again an underscore and characters, ending with characters.
4characters_(at least 1 character)_(at least 1 character)_(ends with at least 1 character).
Note: It starts with 4 characters.
After the edit, the question is to find a regex that matches a string that starts with 4 characters, followed by at least 1 group consisting of '_' followed by at least 1 character.
[A-Z]{4}(_[A-Z]+)+
explanation:
[A-Z]{4} # exactly 4 picks from A-Z
( # group 1 start
_[A-Z]+ # "_" followed by 1 or more character out of A-Z
)+ # group 1 end. Repeat group 1 1 or more times.
You can play with it at regex101
In the above regex I've chosen capital letters as the characters, since the question suggests this. However, the characters could be any letters, for example, which would change the regex to:
[a-zA-Z]{4}(_[a-zA-Z]+)+
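This pattern has no anchors, which matters in practice. A minimal Python sketch (assuming Python's `re`) checks the two lines of t1.txt with `fullmatch`, and shows that `search` would accept a line that merely contains a match:

```python
import re

# Unanchored pattern from the answer
PATTERN = re.compile(r"[A-Z]{4}(_[A-Z]+)+")

lines = ["ABCD_EFG_HIJK", "ABCD_HJIJ_IJKL"]  # t1.txt from the question
for line in lines:
    print(line, PATTERN.fullmatch(line) is not None)

print(PATTERN.search("ABCD_EFG!") is not None)     # True: "ABCD_EFG" is found
print(PATTERN.fullmatch("ABCD_EFG!") is not None)  # False: "!" is not allowed
```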
If by "any number of characters" you mean at least one character, this is the most correct answer: /^[A-Za-z0-9]{4}_([A-Za-z0-9]+_)+[A-Za-z0-9]+$/g.
If you want, you can try this solution at regexr.com.
EDIT: If you want to allow only capital letters, then you should remove a-z and 0-9 from the square brackets.
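The pattern above is written as a JavaScript-style literal. A minimal Python sketch (assuming Python's `re`) drops the slashes and the g flag, since in Python "global" matching is done with `re.findall` or `re.finditer` rather than a flag. Note that, unlike [A-Z]{4}(_[A-Z]+)+, this pattern requires at least two underscores.

```python
import re

# /^...$/g without the JavaScript delimiters and flag
PATTERN = re.compile(r"^[A-Za-z0-9]{4}_([A-Za-z0-9]+_)+[A-Za-z0-9]+$")

for s in ["ABCD_EFG_HIJK", "ABCD_HJIJ_IJKL", "ABCD_EFG", "ABC_EFG_HIJK"]:
    print(s, PATTERN.fullmatch(s) is not None)
```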
Another option:
[^_\n]+_[^_]+_[^_\n]+
The outer parts match everything except a newline (\n) or _;
the part between the underscores matches everything except _.
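A minimal Python sketch (assuming Python's `re.findall`) that applies this pattern to the whole file contents at once. It also shows one side effect: because the middle part is [^_] rather than [^_\n], a match can run across a line break.

```python
import re

text = "ABCD_EFG_HIJK\nABCD_HJIJ_IJKL\n"  # t1.txt from the question
PATTERN = re.compile(r"[^_\n]+_[^_]+_[^_\n]+")

print(PATTERN.findall(text))  # ['ABCD_EFG_HIJK', 'ABCD_HJIJ_IJKL']

# The middle class [^_] does not exclude "\n", so a match can span two lines:
print(PATTERN.findall("AB_C\nD_E"))  # ['AB_C\nD_E']
```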