Regular Expression to match strings


I want to match all strings satisfying the following rules:
should consist of lower-case letters, digits, dots and dashes
should start with a letter or a number
should end with a letter or number
total string length should be at least 3 and at most 20 characters
dot . is optional, there shouldn't be two or more consecutive dots .
dash - is optional, there shouldn't be two or more consecutive dashes -
dot . and dash - shouldn't be consecutive // the string aaa.-aaabbb is invalid
underscore not allowed
I have come up with this regex:
^[a-z0-9]([a-z0-9]+\.?\-?[a-z0-9]+){1,18}[a-z0-9]$
[a-z0-9] //should start/end with a letter or a number
([a-z0-9]+\.?\-?[a-z0-9]+){1,18} //other rules
However it is failing in some scenarios like -
abcdefghijklmnopqrstuvwxyz //should fail total number of chars greater than 20
aaa.-aaabbb //should fail as dot '.' and dash '-' are consecutive
Can anyone please help me in correcting this regex?

You can achieve this with a lookahead assertion:
^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$
Explanation:
^ # Start of string
(?! # Assert that the following can't be matched:
.* # Any number of characters
[.-]{2} # followed by .. or -- or .- or -.
) # End of lookahead
[a-z0-9] # Match lowercase letter/digit
[a-z0-9.-]{1,18} # Match 1-18 of the allowed characters
[a-z0-9] # Match lowercase letter/digit
$ # End of string
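As a quick sanity check, the pattern above can be run against the cases from the question. This is only a minimal sketch in Python; the helper name is_valid and the sample list are made up for illustration, and re.fullmatch is used so that a trailing newline cannot slip through.

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(r"^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$")


def is_valid(s: str) -> bool:
    """Return True if the whole string satisfies the pattern."""
    return PATTERN.fullmatch(s) is not None


# Hypothetical samples: the first three should pass, the rest should fail.
samples = [
    "abc", "a-b.c", "a" * 20,
    "ab", "a" * 21, "aaa.-aaabbb", "aaa..bbb",
    "-abc", "abc-", "ab_c", "abcdefghijklmnopqrstuvwxyz",
]

for s in samples:
    print(f"{s!r:30} {is_valid(s)}")
```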

I came up with this, which uses a negative lookahead like Tim's solution but applies it differently. Because it only performs the lookahead when it sees a dot or a dash, it may need less backtracking, which may make it very slightly faster.
^[a-z0-9]([a-z0-9]|([-.](?![.-]))){1,18}[a-z0-9]$
Explanation:
^ # Start of string
[a-z0-9] # Must start with a letter or number
( # Begin Group
[a-z0-9] # Match a letter or number
| # OR
([-.](?![.-])) # Match a dot or dash that is not followed by a dot or dash
){1,18} # Match group 1 to 18 times
[a-z0-9] # Must end with a letter or number
$ # End of string
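One way to gain confidence that the per-separator lookahead accepts the same strings as a single up-front lookahead is to compare the two patterns exhaustively on short inputs. The sketch below is only illustrative: the function name disagreements, the five-character alphabet and the length limit are assumptions, and it says nothing about relative speed.

```python
import itertools
import re

# Single lookahead at the start of the string.
UP_FRONT = re.compile(r"^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$")
# Lookahead applied only after each dot or dash (the pattern explained above).
PER_SEPARATOR = re.compile(r"^[a-z0-9]([a-z0-9]|([-.](?![.-]))){1,18}[a-z0-9]$")


def disagreements(alphabet: str = "a1.-_", max_len: int = 5) -> list:
    """Return every string over `alphabet` on which the two patterns differ."""
    differing = []
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            s = "".join(chars)
            if bool(UP_FRONT.fullmatch(s)) != bool(PER_SEPARATOR.fullmatch(s)):
                differing.append(s)
    return differing


print(disagreements())
# The length limits are outside the exhaustive range, so check them separately.
print(bool(PER_SEPARATOR.fullmatch("a" * 20)), bool(PER_SEPARATOR.fullmatch("a" * 21)))
```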

Related

Regex get string after specific char, but only when the text starts with a specific string

I have a list of values that contains various values, but I'm only interested in the number after # of those starting with XXX_
ABC
XXX_YYY
XXX_YYY#12235
XXX_YYY#12281
XXX_YYY#12318
I have tried several things but not quite hit the head of the nail :-(
(?<!XXX\_)#
and
(?<=XXX\_)\*\[^#\]+$ - closest but also get those without # in :-(

To get the number after #, here is some Python code that you can adapt as needed:
import re
a = "XXX_YYY#12235"  # example input taken from the question
result = re.findall("(?<=#)(.*?)(?=$)", a)
print(result[0])

Neither pattern takes the digits into account. This is what they match:
(?<!XXX_)# only matches a single # when not directly preceded by XXX_
(?<=XXX_)*[^#]+$ Optionally repeats a lookbehind assertion, and then matches 1+ chars other than # till the end of the string.
If there is a single # char in the string before the numbers, you can match XXX_ followed by any char except # using a negated character class and then match # followed by capturing the digits at the end of the string in group 1.
XXX_[^\n#]*#(\d+)$
The pattern matches:
XXX_ Match literally
[^\n#]*# Match optional chars other than # or a newline, then match #
(\d+) Capture 1+ digits in group 1
$ End of string
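Applied to the sample values from the question, the pattern could be used like this. A minimal Python sketch: the function name numbers_after_hash is hypothetical, and re.match is chosen on the assumption that XXX_ must sit at the very start of each value.

```python
import re

# Pattern from the answer; group 1 holds the digits after '#'.
PATTERN = re.compile(r"XXX_[^\n#]*#(\d+)$")

values = ["ABC", "XXX_YYY", "XXX_YYY#12235", "XXX_YYY#12281", "XXX_YYY#12318"]


def numbers_after_hash(items):
    """Collect the digits after '#' from items that start with XXX_."""
    found = []
    for item in items:
        m = PATTERN.match(item)  # match() only succeeds at the start of the string
        if m:
            found.append(m.group(1))
    return found


print(numbers_after_hash(values))
```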

Modify this yup validation to change max length to 9 if the string does not include a dash

I'm trying to write a yup validator that validates a field's max length, depending on whether a dash is included in the string. If a dash is included, the max length is 10, if there is no dash, the max length should be 9.
For example:
'string-111' should have a max length of 10.
'string111' should have a max length of 9.
My current code looks like:
import * as Yup from 'yup';
export default Yup.object().shape({
description: Yup.string()
.matches(
/^[a-zA-Z0-9-]*$/,
'Invoice # can only contain letters, numbers and dashes'
)
.max(10, 'Invoice # has a max length of 10 characters'),
});
I see the yup documentation https://github.com/jquense/yup has a .when() method, but it seems to be used in very specific cases in their examples. Here, the user can place the dash anywhere in the string.
Any ideas on how to rewrite this validator, so that when there is no dash in the string, the maxlength should be 9?

You could either match 10 chars where a hyphen can occur at any place, using a positive lookahead, or match 9 chars consisting only of a-z0-9.
^(?:(?=[a-z0-9-]{10}$)[a-z0-9]*-[a-z0-9]*|[a-z0-9]{9})$
Explanation
^ Start of string
(?: Non capture group
(?= Positive lookahead, assert what is on the right is
[a-z0-9-]{10}$ Match 10 times either a-z0-9 or - till the end of the string
) Close lookahead
[a-z0-9]*-[a-z0-9]* Match a hyphen between chars a-z0-9
| Or
[a-z0-9]{9} Match 9 chars a-z0-9
) Close group
$ End of string
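To see what this pattern accepts in practice, here is a small Python sketch (the helper name accepts and the sample strings are illustrative only). Note that, as written, the pattern accepts exactly 10 characters with one hyphen or exactly 9 without, and only lower-case letters, so it would need adjusting if shorter values or upper-case letters must pass.

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(r"^(?:(?=[a-z0-9-]{10}$)[a-z0-9]*-[a-z0-9]*|[a-z0-9]{9})$")


def accepts(s: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return PATTERN.fullmatch(s) is not None


for s in ["string-111", "string111", "string1111", "str-ng-111", "string-11", "abc"]:
    print(f"{s!r:14} {accepts(s)}")
```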

I worked up a solution I liked but found it had already been posted by @Thefourthbird, so I tried a different tack and came up with this:
/^(?=(?:-*[^-]-*){9}$)(?=[^-]*(?:-[^-]*){0,1}$).*/gm
You can see that this regex contains two positive lookaheads, both beginning at the start of a line. The first ensures that the string contains exactly 9 non-hyphens; the second requires that there be at most one hyphen.
demo
The demo provides a detailed and thorough explanation of how this regex works, but we can also make it self-documenting by writing it in free-spacing mode:
/
^ # match beginning of a line
(?= # begin a positive lookahead
(?:-*[^-]-*){9} # match 9 strings, each with one char that is
# not a hyphen, possibly preceded and/or
# followed by hyphens
$ # match the end of a line
) # end positive lookahead
(?= # begin a positive lookahead
[^-]* # match 0+ non-hyphens
(?:-[^-]*){0,1} # match 0 or 1 strings, each a hyphen
# followed by 0+ non-hyphens
$ # match the end of a line
) # end positive lookahead
.* # match 0+ characters (the entire line)
/gmx # global, multiline and free-spacing regex
# definition modes
If desired, [^-] could be replaced with [a-zA-Z0-9], \p{Alnum} or something else, depending on requirements.
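The two-lookahead idea can be tried out in Python using re.VERBOSE for the free-spacing layout. This is a sketch under one stated assumption: the second lookahead is written here as [^-]*(?:-[^-]*)?$ so that strings without any hyphen also pass the "at most one hyphen" check. The helper name accepts is hypothetical.

```python
import re

PATTERN = re.compile(
    r"""
    ^                       # start of string
    (?=                     # lookahead 1:
        (?:-*[^-]-*){9}     #   exactly 9 non-hyphen characters, hyphens anywhere
        $
    )
    (?=                     # lookahead 2 (assumed form, see the note above):
        [^-]*(?:-[^-]*)?    #   non-hyphens containing at most one hyphen
        $
    )
    .*                      # consume the string itself
    """,
    re.VERBOSE,
)


def accepts(s: str) -> bool:
    """Return True if the whole string passes both lookaheads."""
    return PATTERN.fullmatch(s) is not None


for s in ["string111", "string-111", "-string111", "str-ing-111", "string1111", "string11"]:
    print(f"{s!r:15} {accepts(s)}")
```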

Regex code , Python-2 alphanumeric [duplicate]

My regex knowledge is pretty limited, but I'm trying to write/find an expression that will capture the following string types in a document:
DO match:
ADY123
AD12ADY
1HGER_2
145-DE-FR2
Bicycle1
2Bicycle
128D
128878P
DON'T match:
BICYCLE
183-329-193
3123123
Is such an expression possible? Basically, it should find any string containing letters AND digits, regardless of whether the string contains a dash or underscore. I can find the first two using the following two regex:
/([A-Z][0-9])\w+/g
/([0-9][A-Z])\w+/g
But searching for possible dashes and underscores makes it more complicated...
Thanks for any help you can provide! :)
MORE INFO:
I've made slight progress with: ([A-Z|a-z][0-9]+-*_*\w+) but it doesn't capture strings with more than one hyphen.
I had a document with a lot of text strings and number strings, which I don't want to capture. What I do want is any product code, which could be any length string with or without hyphens and underscores but will always include at least one digit and at least one letter.

You can use the following expression with the case-insensitive mode:
\b((?:[a-z]+\S*\d+|\d\S*[a-z]+)[a-z\d_-]*)\b
Explanation:
\b # Assert position at a word boundary
( # Beginning of capturing group 1
(?: # Beginning of the non-capturing group
[a-z]+\S*\d+ # Match letters followed by numbers
| # OR
\d\S*[a-z]+ # Match a digit followed by letters
) # End of the group
[a-z\d_-]* # Match letter, digit, '_', or '-' 0 or more times
) # End of capturing group 1
\b # Assert position at a word boundary
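The expression can be checked against the examples from the question with a short Python sketch. The function name is_product_code and the sample sentence are invented for illustration; re.IGNORECASE stands in for the case-insensitive mode mentioned above.

```python
import re

# Expression from the answer, compiled in case-insensitive mode.
PRODUCT_CODE = re.compile(
    r"\b((?:[a-z]+\S*\d+|\d\S*[a-z]+)[a-z\d_-]*)\b", re.IGNORECASE
)


def is_product_code(token: str) -> bool:
    """Return True if the whole token is matched by the expression."""
    return PRODUCT_CODE.fullmatch(token) is not None


should_match = ["ADY123", "AD12ADY", "1HGER_2", "145-DE-FR2",
                "Bicycle1", "2Bicycle", "128D", "128878P"]
should_not_match = ["BICYCLE", "183-329-193", "3123123"]

print([t for t in should_match if is_product_code(t)])
print([t for t in should_not_match if is_product_code(t)])

# Hypothetical sentence showing extraction from running text.
text = "Order ADY123 and 145-DE-FR2, not BICYCLE or 3123123."
print(PRODUCT_CODE.findall(text))
```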

Match exactly 12 non-contiguous letters in Regex

I am trying to write some Regex that will match lines with exactly 12 letters (case-insensitive).
For instance, I want it to match 123124ab234cdef234gh1111ijkL (12 letters), but not abcdefgh1111ijk (11 letters) or abcdefgh1111ijkLM (13 letters). My thought was to do a nested lookahead twelve times:
(?=(.*[A-Za-z])(?=(.*[A-Za-z])(?=(.*[A-Za-z])(?=(.*[A-Za-z]).....))))
But this doesn't work. Neither does a simple twelve-letter match, because the letters do not have to be contiguous:
[A-Za-z]{12}
Any help would be greatly appreciated. Thanks!

Here is a way:
^([^a-zA-Z]*[a-zA-Z]){12}[^a-zA-Z]*$
A quick break down:
^ # match the start of the input
( # start group 1
[^a-zA-Z]* # match zero or more non-letter chars
[a-zA-Z] # match one letter
){12} # end group 1 and match exactly 12 times
[^a-zA-Z]* # match zero or more non-letter chars
$ # match the end of the input
Note that [a-zA-Z] only matches the ASCII letters! The char 'É' will not be matched by it, and therefore [^a-zA-Z] does match 'É'.
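Here is a brief Python sketch that runs this pattern on the three examples from the question, plus one made-up input containing 'É' to illustrate the ASCII-only caveat. The function name has_twelve_ascii_letters is hypothetical.

```python
import re

# Pattern from the answer: 12 ASCII letters, with any non-letters in between.
TWELVE_LETTERS = re.compile(r"^([^a-zA-Z]*[a-zA-Z]){12}[^a-zA-Z]*$")


def has_twelve_ascii_letters(line: str) -> bool:
    """Return True if the line contains exactly 12 ASCII letters."""
    return TWELVE_LETTERS.fullmatch(line) is not None


for line in [
    "123124ab234cdef234gh1111ijkL",  # 12 letters
    "abcdefgh1111ijk",               # 11 letters
    "abcdefgh1111ijkLM",             # 13 letters
    "ÉabcdefghijkL",                 # 12 ASCII letters; 'É' counts as a non-letter here
]:
    print(f"{line!r:32} {has_twelve_ascii_letters(line)}")
```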

Regular expression captures unwanted string

I have created the following expression: (.NET regex engine)
((-|\+)?\w+(\^\.?\d+)?)
hello , hello^.555,hello^111, -hello,+hello, hello+, hello^.25, hello^-1212121
It works well except that :
it captures the term 'hello+' without the '+'; this term should not be captured at all
the last term 'hello^-1212121' is captured as 2 groups, 'hello' and '-1212121'; both should be ignored
The strings to capture are as follows :
word can have a + or a - before it
or word can have a ^ that is followed by a positive number (not necessarily an integer)
words are separated by commas and any number of white spaces (both not part of the capture)
A few examples of valid strings to capture :
hello^2
hello^.2
+hello
-hello
hello
EDIT
I have found the following expression which effectively captures all these terms, it's not really optimized but it just works :
([a-zA-Z]+(?= ?,))|((-|\+)[a-zA-Z]+(?=,))|([a-zA-Z]+\^\.?\d+)

Ok, there are some issues to tackle here:
((-|+)?\w+(\^.?\d+)?)
    ^        ^
The + and . should be escaped like this:
((-|\+)?\w+(\^\.?\d+)?)
Now, you'll also get -1212121 there. If your string hello is always letters, then you would change \w to [a-zA-Z]:
((-|\+)?[a-zA-Z]+(\^\.?\d+)?)
\w includes letters, numbers and underscore. So, you might want to restrict it down a bit to only letters.
And finally, to exclude the terms that should not be captured at all, you'll have to use lookarounds. I don't know of any other way to check the delimiters without interfering with the matches:
(?<=^|,)\s*((-|\+)?[a-zA-Z]+(\^\.?\d+)?)\s*(?=,|$)
EDIT: If it cannot be something like -hello^2, and if another valid string is hello^9.8, then this one will fit better:
(?<=^|,)\s*((?:-|\+)?[a-zA-Z]+|[a-zA-Z]+\^(?:\d+)?\.?\d+)(?=\s*(?:,|$))
And lastly, if capturing the words is sufficient, we can remove the lookarounds:
([-+]?[a-zA-Z]+|[a-zA-Z]+\^(?:\d+)?\.?\d+)
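For readers who want to experiment outside .NET, here is a rough Python sketch of the same idea. Python's re module only accepts fixed-width lookbehinds, so (?<=^|,) cannot be used there; as a swapped-in technique, the sketch splits the input on commas and requires each trimmed term to match in full. The function name valid_terms is hypothetical, and the sample string is the one from the question.

```python
import re

# Term pattern taken from the EDIT above, without the lookarounds.
TERM = re.compile(r"(?:-|\+)?[a-zA-Z]+|[a-zA-Z]+\^(?:\d+)?\.?\d+")


def valid_terms(text: str) -> list:
    """Split on commas, trim whitespace, and keep the terms that match in full."""
    terms = (piece.strip() for piece in text.split(","))
    return [t for t in terms if TERM.fullmatch(t)]


sample = ("hello , hello^.555,hello^111, -hello,+hello, hello+, "
          "hello^.25, hello^-1212121")
print(valid_terms(sample))
```

Splitting first keeps the delimiters out of the pattern entirely, at the cost of a second pass over the input.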

It would be better if you first state what it is you are looking to extract.
You also don't indicate which Regular Expression engine you're using, which is important since they vary in their features, but...
Assuming you want to capture only:
words that have a leading + or -
words that have a trailing ^ followed by an optional period followed by one or more digits
and that words are sequences of one or more letters
I'd use:
([a-zA-Z]+\^\.?\d+|[-+][a-zA-Z]+)
which breaks down into:
( # start capture group
[a-zA-Z]+ # one or more letters - note \w matches numbers and underscores
\^ # literal caret
\.? # optional period
\d+ # one or more digits
| # OR
[-+] # literal plus or minus
[a-zA-Z]+ # one or more letters
) # end of capture group
EDIT
To also capture plain words (without leading or trailing chars) you'll need to rearrange the regexp a little. I'd use:
([+-][a-zA-Z]+|[a-zA-Z]+\^(?:\.\d+|\d+\.\d+|\d+)|[a-zA-Z]+)
which breaks down into:
( # start capture group
[+-] # literal plus or minus
[a-zA-Z]+ # one or more letters - note \w matches numbers and underscores
| # OR
[a-zA-Z]+ # one or more letters
\^ # literal caret
(?: # start of non-capturing group
\. # literal period
\d+ # one or more digits
| # OR
\d+ # one or more digits
\. # literal period
\d+ # one or more digits
| # OR
\d+ # one or more digits
) # end of non-capturing group
| # OR
[a-zA-Z]+ # one or more letters
) # end of capture group
Also note that, per your updated requirements, this regexp captures both ordinary non-negative numbers (e.g. 0, 1, 1.2, 1.23) and those lacking a leading digit (e.g. .1, .12).
FURTHER EDIT
This regexp will only match the following patterns delimited by commas:
word
word with leading plus or minus
word with trailing ^ followed by a positive number of the form \d+, \d+\.\d+, or \.\d+
([+-][A-Za-z]+|[A-Za-z]+\^(?:\.\d+|\d+(?:\.\d+)?)|[A-Za-z]+)(?=,|\s|$)
Please note that the useful match will appear in the first capture group, not the entire match.
So, in JavaScript, you could write:
var src="hello , hello ,hello,+hello,-hello,hello+,hello-,hello^1,hello^1.0,hello^.1",
RE=/([+-][A-Za-z]+|[A-Za-z]+\^(?:\.\d+|\d+(?:\.\d+)?)|[A-Za-z]+)(?=,|\s|$)/g,
m;
// exec() returns the next match on each call because of the g flag
while((m=RE.exec(src))!==null){
console.log(m[1])
}
which produces:
hello
hello
hello
+hello
-hello
hello^1
hello^1.0
hello^.1
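The same check can be reproduced in Python, where re.findall returns the contents of the first capture group directly when the pattern has exactly one group. A minimal sketch reusing the pattern and input string from the JavaScript snippet; the variable names are illustrative.

```python
import re

# Same pattern as the JavaScript version; the lookahead requires a comma,
# whitespace or end of string after each term.
TERM = re.compile(
    r"([+-][A-Za-z]+|[A-Za-z]+\^(?:\.\d+|\d+(?:\.\d+)?)|[A-Za-z]+)(?=,|\s|$)"
)

src = "hello , hello ,hello,+hello,-hello,hello+,hello-,hello^1,hello^1.0,hello^.1"

captured = TERM.findall(src)  # list of group-1 strings, in order of appearance
for term in captured:
    print(term)
```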
