What could be the regular expression for the following?

$ cat t1.txt:
ABCD_EFG_HIJK
ABCD_HJIJ_IJKL
What could be the regex for the above two lines,
or even for just one of them?
Or, in general terms:
The scenario is 4 characters, followed by an underscore, followed by characters (any number), followed by an underscore, followed by characters (any number), then again an underscore and characters, ending with characters.
4characters_(minimum of 1 character)_(minimum of 1 character)_(ends with minimum of 1 character).
Note: it starts with 4 characters.

After the edit, the question is to find a regex that matches a string that starts with 4 characters, followed by at least one group consisting of '_' followed by at least one character.
[A-Z]{4}(_[A-Z]+)+
explanation:
[A-Z]{4} # exactly 4 picks from A-Z
( # group 1 start
_[A-Z]+ # "_" followed by 1 or more character out of A-Z
)+ # group 1 end. Repeat group 1 one or more times.
You can play with it at regex101
In the above regex I've chosen capital letters as the characters, since the question suggests this. However, the characters could also be, for example, any letters, which would change the regex to:
[a-zA-Z]{4}(_[a-zA-Z]+)+
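As a quick check, here is a minimal sketch using Python's re module (Python is an assumption; any PCRE-like engine behaves the same for these patterns). Because the pattern has no ^ or $ anchors, the sketch uses re.fullmatch so that the whole line must match; a plain re.search would also accept a line that merely contains a match.

```python
import re

# Capital-letters version and the mixed-case variant from the answer above.
UPPER = re.compile(r"[A-Z]{4}(_[A-Z]+)+")
LETTERS = re.compile(r"[a-zA-Z]{4}(_[a-zA-Z]+)+")

samples = ["ABCD_EFG_HIJK", "ABCD_HJIJ_IJKL", "ABC_DEF", "ABCD", "abcd_efg"]

for line in samples:
    # fullmatch succeeds only if the entire line matches, like wrapping the pattern in ^...$
    print(line, bool(UPPER.fullmatch(line)), bool(LETTERS.fullmatch(line)))
```

The two sample lines match both patterns; ABC_DEF (only 3 leading letters) and ABCD (no underscore group) match neither, and abcd_efg matches only the mixed-case variant.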

If by "any number of characters" you mean at least one character, this pattern should work: /^[A-Za-z0-9]{4}_([A-Za-z0-9]+_)+[A-Za-z0-9]+$/g.
You can try this solution at regexr.com.
EDIT: If you want only capital letters, then you should remove a-z and 0-9 from the square brackets.
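The same kind of check works for this anchored pattern. The sketch below (again Python, an assumption) drops the JavaScript /.../g delimiters and flag, which have no direct equivalent in re.compile. Note that ([A-Za-z0-9]+_)+ requires at least two underscores, so a two-part string such as ABCD_EFG is rejected.

```python
import re

# Letters and digits, as in the answer (JavaScript /.../g delimiters removed).
MIXED = re.compile(r"^[A-Za-z0-9]{4}_([A-Za-z0-9]+_)+[A-Za-z0-9]+$")
# Capital letters only, per the EDIT (a-z and 0-9 removed).
UPPER_ONLY = re.compile(r"^[A-Z]{4}_([A-Z]+_)+[A-Z]+$")

for s in ["ABCD_EFG_HIJK", "ABCD_HJIJ_IJKL", "ABCD_EFG", "Ab12_x_y", "ABCD__EFG_H"]:
    # fullmatch also stops $ from matching just before a trailing newline
    print(s, bool(MIXED.fullmatch(s)), bool(UPPER_ONLY.fullmatch(s)))
```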

Another option:
[^_\n]+_[^_]+_[^_\n]+
The first and last parts match anything except a newline (\n) and _;
the middle part, between the underscores, matches anything except _.
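Because this pattern is unanchored, it is meant to be searched for inside the text rather than matched line by line. A minimal Python sketch (assumed language) with re.findall returns each line; the last call shows that the middle [^_]+ does not exclude \n, so a match can span two lines.

```python
import re

PATTERN = re.compile(r"[^_\n]+_[^_]+_[^_\n]+")

text = "ABCD_EFG_HIJK\nABCD_HJIJ_IJKL\n"
print(PATTERN.findall(text))         # ['ABCD_EFG_HIJK', 'ABCD_HJIJ_IJKL']

# The middle part [^_]+ may contain a newline.
print(PATTERN.findall("AB_C\nD_E"))  # ['AB_C\nD_E']
```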

Related

Regex to match only the following strings

I have a few strings and I need some help constructing a regex to match them.
The example strings are:
AAPL10.XX1.XX2
AAA34CL
AAXL23.XLF2
AAPL
I have tried a few expressions but couldn't get exactly the right results. They are the following:
[0-9A-Z]+\.?[0-9A-Z]$
[A-Z0-9]*\.?[^.]$
The following rules should be maintained:
The string should only contain capital letters and digits; lowercase letters are not allowed.
The '.' in the middle of the text is optional, and it can appear at most 2 times.
It should not have any special characters at the end.
Please ask me for any clarification.
You can write the pattern as:
^[A-Z\d]+(?:\.[A-Z\d]+){0,2}$
The pattern matches:
^ Start of string
[A-Z\d]+ Match 1+ chars A-Z or a digit
(?:\.[A-Z\d]+){0,2} Repeat 0-2 times: a . followed by 1+ chars A-Z or a digit
$ End of string
Regex demo
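A minimal sketch to try the pattern in Python (an assumption; the answer does not name a language). re.ASCII keeps \d to 0-9, which is what the question intends; without it Python's \d also matches other Unicode digits.

```python
import re

PATTERN = re.compile(r"^[A-Z\d]+(?:\.[A-Z\d]+){0,2}$", re.ASCII)

valid = ["AAPL10.XX1.XX2", "AAA34CL", "AAXL23.XLF2", "AAPL"]
invalid = ["aapl", "AAPL.", ".AAPL", "A.B.C.D", "AA..PL"]

for s in valid + invalid:
    print(f"{s!r}: {bool(PATTERN.fullmatch(s))}")
```

All four example strings match; lowercase letters, a leading or trailing dot, consecutive dots and a third dot are rejected.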

RegEx: How to match a whole string with a fixed-length region with negative lookahead conditions that are overridden afterwards?

The strings I parse with a regular expression contain a region of fixed length N where there can either be numbers or dashes. However, if a dash occurs, only dashes are allowed to follow for the rest of the region. After this region, numbers, dashes, and letters are allowed to occur.
Examples (N=5, starting at the beginning):
12345ABC
12345123
1234-1
1234--1
1----1AB
How can I correctly match this? I am currently stuck at something like (?:\d|-(?!\d)){5}[A-Z0-9\-]+ (for N=5), but I cannot get digits to match directly after my region if a dash is present, as the negative lookahead blocks the match.
Update
Strings that should not be matched (N=5)
1-2-3-A
----1AB
--1--1A
You could assert that the first 5 characters are either digits or - and make sure that there is no - before a digit in the first 5 chars.
^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$
^ Start of string
(?![\d-]{0,3}-\d) Make sure that in the first 5 chars there is no - before a digit
(?=[\d-]{5}) Assert that the first 5 chars are digits or -
[A-Z\d-]+ Match 1+ times any of the listed characters
$ End of string
Regex demo
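To check the lookahead version against the examples from the question, here is a Python sketch (assumed language). The helper region_pattern is hypothetical: it only generalizes the answer's N=5 pattern by writing the lookahead bounds as N-2 and N, assuming N is at least 2.

```python
import re


def region_pattern(n: int) -> re.Pattern:
    # Hypothetical helper: rebuilds the answer's pattern for a region of length n >= 2.
    # (?![\d-]{0,n-2}-\d) forbids "-digit" inside the first n characters;
    # (?=[\d-]{n}) requires the first n characters to be digits or dashes.
    return re.compile(rf"^(?![\d-]{{0,{n - 2}}}-\d)(?=[\d-]{{{n}}})[A-Z\d-]+$", re.ASCII)


PATTERN = region_pattern(5)  # same as ^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$

should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_fail = ["1-2-3-A", "----1AB", "--1--1A"]

for s in should_match + should_fail:
    print(f"{s!r}: {bool(PATTERN.fullmatch(s))}")
```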
If atomic groups are available:
^(?=[\d-]{5})(?>\d+-*|-{5})[A-Z\d_]*$
^ Start of string
(?=[\d-]{5}) Assert at least 5 chars - or digit
(?> Atomic group
\d+-* Match 1+ digits and optional -
| or
-{5} match 5 times -
) Close atomic group
[A-Z\d_]* Match optional chars A-Z digit or _
$ End of string
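Regex demo
Note that this variant does not fully enforce the region rule: for example, 1-234 is matched, because the atomic group stops after 1- and [A-Z\d_]* then matches 234. Also, the final character class allows _ after the region, while the question allows - there.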
Use a non-word-boundary assertion \B:
^[-\d](?:-|\B\d){4}[A-Z\d-]*$
A non-word-boundary succeeds at a position between two word characters (from \w, i.e. [A-Za-z0-9_]) or two non-word characters (from \W, i.e. [^A-Za-z0-9_]), and also between a non-word character and the start or end of the string.
With it, each \B\d must follow a digit (it can't follow a dash).
demo
Another way (if lookbehinds are supported):
^\d*-*(?<=^.{5})[A-Z\d-]*$
demo
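The two patterns from this answer can be checked the same way. This Python sketch (assumed language) uses re.ASCII so that \d and the word-boundary logic behind \B only consider ASCII characters. Python's re accepts the lookbehind (?<=^.{5}) because its width is fixed at 5 characters.

```python
import re

# \B before \d only succeeds after another word character, so a digit in the
# region can follow a digit but never a dash.
NON_BOUNDARY = re.compile(r"^[-\d](?:-|\B\d){4}[A-Z\d-]*$", re.ASCII)
# \d*-* consumes digits, then dashes; the lookbehind forces that prefix to be
# exactly 5 characters long (backtracking shortens \d* or -* if needed).
LOOKBEHIND = re.compile(r"^\d*-*(?<=^.{5})[A-Z\d-]*$", re.ASCII)

should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_fail = ["1-2-3-A", "----1AB", "--1--1A"]

for s in should_match + should_fail:
    print(f"{s!r}: {bool(NON_BOUNDARY.fullmatch(s))} {bool(LOOKBEHIND.fullmatch(s))}")
```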

Regex to require space after comma in list

I want to require a space after every comma in a list. I've got this, which works pretty well for my lists of 5- to 7-digit numbers separated by commas:
^([^,]{5,7},)*[^,][^ ]{5,7}$
The problem is that it allows 12345,12345. I don't want that to pass; 12345, 12345 should pass. I also need a lone 12345 to pass, so the comma and space are not required if there is just one 5-7 digit number.
Your regex does not match 12345,12345 because this part ([^,]{5,7},)* will match from the start including the comma.
Then [^,] (not a comma) matches the second 1, and [^ ]{5,7} (not a space) has to match 5-7 more characters, but only 4 are left (2345), so the match fails.
If the group is not used, only [^,][^ ]{5,7} is tried, which matches 6-8 characters in total, so it cannot match the 11-character string either.
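The walk-through can be confirmed with a short Python sketch (assumed language) that runs the question's original regex on three inputs. It also shows that the original accepts 12345, 12345 only because [^,] happens to match the space, and that it rejects a lone 12345.

```python
import re

ORIGINAL = re.compile(r"^([^,]{5,7},)*[^,][^ ]{5,7}$")

for s in ["12345,12345", "12345, 12345", "12345"]:
    print(f"{s!r}: {bool(ORIGINAL.fullmatch(s))}")
# '12345,12345': False
# '12345, 12345': True
# '12345': False
```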
You might use:
^[^,\s]{5,7}(?:, [^,\s]{5,7})*$
Regex demo
^ Start of the string
[^,\s]{5,7} Match any character except a comma or whitespace, 5-7 times
(?: Non capturing group
, [^,\s]{5,7} Match a comma and a space, then any character except a comma or whitespace, 5-7 times
)* Close non capturing group and repeat 0+ times
$ End of the string
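A minimal Python sketch (assumed language) to exercise this pattern: it accepts a single item and items separated by a comma plus a single space, and rejects a missing or doubled space as well as items shorter than 5 characters.

```python
import re

PATTERN = re.compile(r"^[^,\s]{5,7}(?:, [^,\s]{5,7})*$")

accepted = ["12345", "12345, 12345", "1234567, 7654321, 55555"]
rejected = ["12345,12345", "12345,  12345", "1234", "12345, 1234"]

for s in accepted + rejected:
    print(f"{s!r}: {bool(PATTERN.fullmatch(s))}")
```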
I didn't understand your regex, but something as simple as this should work:
^(?:\d{5,7}, )*\d{5,7}$
Or, if you didn't intend to restrict the items to digits:
^(?:[^, ]{5,7}, )*[^, ]{5,7}$
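Both variants from this answer can be compared in a short Python sketch (assumed language); re.ASCII limits \d to 0-9.

```python
import re

# Items restricted to 5-7 digits.
DIGITS_ONLY = re.compile(r"^(?:\d{5,7}, )*\d{5,7}$", re.ASCII)
# Items may be any 5-7 characters other than a comma or a space.
ANY_ITEM = re.compile(r"^(?:[^, ]{5,7}, )*[^, ]{5,7}$")

for s in ["12345", "12345, 1234567", "12345,12345", "abcde, fghij"]:
    print(f"{s!r}: {bool(DIGITS_ONLY.fullmatch(s))} {bool(ANY_ITEM.fullmatch(s))}")
```

Only abcde, fghij separates the two: it is rejected by the digits-only version and accepted by the general one.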

Regular expression with numbers, spaces, and dashes, limited to 8-13 numbers

I am trying to write a regular expression to validate a number with between 9 and 13 digits, but the sequence can contain dashes and spaces, ideally with no more than one space or dash in a row.
This rule lets me check for between 9 and 13 digits:
/^[\d]{9,13}$/
Now, to add dashes and spaces:
/^[\d -]{9,13}$/
I think I need something like this, but I need to count only the digits:
/^[ -](?:\d){9,13}$/
Any tips?
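Notice how my regex starts and ends with a digit. It also prevents consecutive spaces and dashes. As written, it allows 8-13 digits in total (as in the title); for the 9-13 digits in the question body, change {7,12} to {8,12}.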
/^\d([ \-]?\d){7,12}$/
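A quick Python check (assumed language) of this pattern. Since the leading \d is followed by 7-12 more digits, it accepts 8-13 digits in total, and [ \-]? allows at most one separator between two digits.

```python
import re

PATTERN = re.compile(r"^\d([ \-]?\d){7,12}$", re.ASCII)

accepted = ["12345678", "123-456 789", "1234567890123"]
rejected = ["1234567", "12345678901234", "123--456789", "123  456789", "-123456789", "123456789-"]

for s in accepted + rejected:
    print(f"{s!r}: {bool(PATTERN.fullmatch(s))}")
```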
It appears that you don't want leading or trailing spaces and dashes. This should do it.
/^\d([- ]*\d){8,12}$/
Regular expression:
\d digits (0-9)
( group and capture to \1 (between 8 and 12 times)
[- ]* any character of: '-', ' ' (0 or more times)
\d digits (0-9)
){8,12} end of \1
Another option: a digit followed by any number of spaces or dashes, repeated 8-12 times, followed by a digit.
/^(\d[- ]*){8,12}\d$/
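The two [- ]* answers can be compared with a short Python sketch (assumed language). Both accept 9-13 digits and reject leading or trailing separators, but because [- ]* matches any number of dashes and spaces, they also accept runs such as -- or two spaces, which the question wanted to avoid.

```python
import re

STAR_INSIDE = re.compile(r"^\d([- ]*\d){8,12}$", re.ASCII)  # /^\d([- ]*\d){8,12}$/
STAR_AFTER = re.compile(r"^(\d[- ]*){8,12}\d$", re.ASCII)   # /^(\d[- ]*){8,12}\d$/

for s in ["123456789", "123-456 789", "12345678", "-123456789", "123456789-", "123--456789"]:
    print(f"{s!r}: {bool(STAR_INSIDE.fullmatch(s))} {bool(STAR_AFTER.fullmatch(s))}")
```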
Use lookaheads to assert the various constraints:
/^(?!.*(  |--))(?=(\D*\d){9,13}\D*$)[\d -]+$/
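A Python sketch (assumed language) of the lookahead approach. It assumes the first alternative in the negative lookahead is two spaces, since the lookahead is meant to forbid a doubled space; with a single space it would reject every string containing a space. The middle lookahead counts 9-13 digits, skipping non-digits with \D*.

```python
import re

# (?!.*(  |--))           no double space and no double dash anywhere
# (?=(\D*\d){9,13}\D*$)   9-13 digits in total
# [\d -]+                 only digits, spaces and dashes
PATTERN = re.compile(r"^(?!.*(  |--))(?=(\D*\d){9,13}\D*$)[\d -]+$", re.ASCII)

accepted = ["123456789", "123-456 789", "1234567890123", "-123456789"]
rejected = ["12345678", "12345678901234", "123--456789", "123  456789"]

for s in accepted + rejected:
    print(f"{s!r}: {bool(PATTERN.fullmatch(s))}")
```

Unlike the answers that start and end with \d, this one also accepts a leading or trailing separator such as -123456789.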
Assuming a dash following a space or vice versa is ok:
^( -?|- ?)?(\d( -?|- ?)?){9,13}$
Explanation:
( -?|- ?) - this is equivalent to ( | -|-|- ). Note that there can't be 2 consecutive dashes or spaces here, and this can only appear at the start or directly after a digit, so this prevents 2 consecutive dashes or spaces in the string.
Each repetition of (\d( -?|- ?)?) contains exactly one digit, so {9,13} enforces 9-13 digits.
Assuming a dash following a space or vice versa is NOT ok:
^[ -]?(\d[ -]?){9,13}$
Explanation similar to the above.
Both of the above allow the string to start or end with a digit, dash or space.
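The difference between the two patterns shows up on a space directly followed by a dash. A Python sketch (assumed language; the names MIXED_OK and ONE_SEP are only labels for the two answers):

```python
import re

# A dash directly after a space (or vice versa) is allowed.
MIXED_OK = re.compile(r"^( -?|- ?)?(\d( -?|- ?)?){9,13}$", re.ASCII)
# At most one separator between digits.
ONE_SEP = re.compile(r"^[ -]?(\d[ -]?){9,13}$", re.ASCII)

for s in ["123456789", "123 -456789", "123--456789", "-123456789-", "12345678"]:
    print(f"{s!r}: {bool(MIXED_OK.fullmatch(s))} {bool(ONE_SEP.fullmatch(s))}")
```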

Regular Expression to match strings

I want to match all strings satisfying the following rules:
should consist of lower-case letters and digits and dashes
should start with a letter or a number
should end with a letter or number
total string length should be at least 3 and at most 20 characters
a dot . is optional; there shouldn't be two or more consecutive dots .
a dash - is optional; there shouldn't be two or more consecutive dashes -
a dot . and a dash - shouldn't be consecutive // the string aaa.-aaabbb is invalid
underscores are not allowed
I have come up with this regex:
^[a-z0-9]([a-z0-9]+\.?\-?[a-z0-9]+){1,18}[a-z0-9]$
[a-z0-9] //should start/end with a letter or a number
([a-z0-9]+\.?\-?[a-z0-9]+){1,18} //other rules
However, it fails in some scenarios, like:
abcdefghijklmnopqrstuvwxyz //should fail: total number of chars is greater than 20
aaa.-aaabbb //should fail as dot '.' and dash '-' are consecutive
Can anyone please help me in correcting this regex?
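Both failures can be reproduced with a short Python sketch (assumed language). {1,18} limits how many times the group repeats, not how many characters it consumes, and each repetition matches two or more characters, so long strings slip through; \.?\-? also explicitly allows a dot directly followed by a dash.

```python
import re

ASKER = re.compile(r"^[a-z0-9]([a-z0-9]+\.?\-?[a-z0-9]+){1,18}[a-z0-9]$")

for s in ["abcdefghijklmnopqrstuvwxyz", "aaa.-aaabbb"]:
    print(f"{s!r}: {bool(ASKER.fullmatch(s))}")  # both print True
```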
You can achieve this with a lookahead assertion:
^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$
Explanation:
^ # Start of string
(?! # Assert that the following can't be matched:
.* # Any number of characters
[.-]{2} # followed by .. or -- or .- or -.
) # End of lookahead
[a-z0-9] # Match lowercase letter/digit
[a-z0-9.-]{1,18} # Match 1-18 of the allowed characters
[a-z0-9] # Match lowercase letter/digit
$ # End of string
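A Python sketch (assumed language) that runs this lookahead-based pattern on the two failing cases from the question plus a few edge cases; the 20-character string is the longest one allowed.

```python
import re

PATTERN = re.compile(r"^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$")

accepted = ["abc", "a.b", "my-site.example", "abcdefghijklmnopqrst"]  # last one is 20 chars
rejected = ["ab", "abcdefghijklmnopqrstu", "abcdefghijklmnopqrstuvwxyz",
            "aaa.-aaabbb", "a..b", "a--b", "-abc", "abc.", "a_b", "ABC"]

for s in accepted + rejected:
    print(f"{s!r}: {bool(PATTERN.fullmatch(s))}")
```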
I came up with this, which uses a negative lookahead similar to Tim's solution but applies it differently. Because it only does the lookahead when it sees a dot or a dash, it may need less backtracking, which may make it perform very slightly faster.
^[a-z0-9]([a-z0-9]|([-.](?![.-]))){1,18}[a-z0-9]$
Explanation:
^ # Start of string
[a-z0-9] # Must start with a letter or number
( # Begin Group
[a-z0-9] # Match a letter or number
| # OR
([-.](?![.-])) # Match a dot or dash that is not followed by a dot or dash
){1,18} # Match group 1 to 18 times
[a-z0-9] # Must end with a letter or number
$ # End of string
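The same cases can be run against this variant with a Python sketch (assumed language). The sketch only checks which strings match; it does not measure the speed difference suggested above.

```python
import re

# A dot or dash is matched only when the next character is not a dot or dash.
ALT = re.compile(r"^[a-z0-9]([a-z0-9]|([-.](?![.-]))){1,18}[a-z0-9]$")

accepted = ["abc", "a.b", "my-site.example", "abcdefghijklmnopqrst"]  # last one is 20 chars
rejected = ["ab", "abcdefghijklmnopqrstu", "abcdefghijklmnopqrstuvwxyz",
            "aaa.-aaabbb", "a..b", "a--b", "-abc", "abc.", "a_b", "ABC"]

for s in accepted + rejected:
    print(f"{s!r}: {bool(ALT.fullmatch(s))}")
```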