Regular Expression containing letters and spaces in specified fashion - regex

I am working on a text-processing API in Java. I need to match strings that are:
At least 8 characters in length.
Should only contain uppercase letters, lowercase letters or spaces.
Spaces must not appear between the letters, but leading and trailing spaces are allowed. The string may also consist only of spaces, provided there are at least 8 of them.
The regular expression I tried, which failed:
^\s*[a-zA-Z]{8,}\s*$
Demo of my tries in here.
Any help will be welcomed.

You can use the below regex to achieve your result:
^(?=.{8,}) *[a-zA-Z]* *$
Explanation of the above regex:
^ - denotes start of the test String.
(?=...) - positive lookahead.
.{8,} - any character other than a newline, at least 8 times.
 * (a space followed by *) - 0 or more spaces, to match the leading spaces. (\s is avoided because it would also match tabs and line breaks.)
[a-zA-Z]* - 0 or more letters (uppercase or lowercase). (You could instead use [a-z]* with the i (case-insensitive) flag; this has no effect on performance.)
 * (a space followed by *) - 0 or more spaces, to match the trailing spaces. (\s is avoided for the same reason.)
$ - denotes end of the test String.
Above regex demo.
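As a quick check of the answer above, here is a minimal sketch in Python (the question is about Java, but the pattern uses only constructs that behave the same way in Python's re module). The helper name is_valid is hypothetical and exists only for illustration.

```python
import re

# Pattern from the answer: a lookahead enforces the minimum length of 8,
# then optional leading spaces, a single run of letters, optional trailing spaces.
PATTERN = re.compile(r"^(?=.{8,}) *[a-zA-Z]* *$")


def is_valid(text: str) -> bool:
    """Hypothetical helper: True when text satisfies the three rules in the question."""
    return PATTERN.fullmatch(text) is not None


for sample in ["abcdefgh", "   abcde", "        ", "abc defgh", "abcdefg"]:
    print(repr(sample), is_valid(sample))
```

The first three samples are accepted (letters only, leading spaces, spaces only); the last two are rejected because of an inner space and a length of 7 respectively.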

Related

Regex How can I find names only written in upper-case letters(mandatory), and the names may also contain numbers, literal spaces, and hyphens

If I want to find "KFC", "EU 8RF", and "IK-OTP" simultaneously, what should the code look like?
My code is :
db.business.find({name:/^[A-Z\s?\d?\-?]*$/}, {name:1}).sort({"name":1})
but it also returns names that are whole numbers, such as 1973 or 1999. How should I improve my code? TIA
Use a lookahead to require at least one letter.
^(?=.*[A-Z])[A-Z\d\s-]+$
DEMO
You can use
^(?=[\d -]*[A-Z])[A-Z\d]+(?:[ -][A-Z\d]+)*$
See the regex demo.
Details:
^ - start of string
(?=[\d -]*[A-Z]) - a positive lookahead that requires an uppercase ASCII letter after any zero or more digits, spaces or hyphens immediately to the right of the current location
[A-Z\d]+ - one or more uppercase ASCII letters or digits
(?:[ -][A-Z\d]+)* - zero or more repetitions of a space or - and then one or more uppercase ASCII letters or digits
$ - end of string.
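To compare the two answers, the sketch below runs both patterns over the names from the question plus two extra edge cases. It uses Python's re module rather than a MongoDB query, with re.ASCII so that \d and \s stay limited to ASCII; the variable names and the extra samples "EU  8RF" (two spaces) and "KFC-" are assumptions added for illustration.

```python
import re

# First answer: any mix of allowed characters, as long as one uppercase letter exists.
LOOSE = re.compile(r"^(?=.*[A-Z])[A-Z\d\s-]+$", re.ASCII)

# Second answer: additionally requires single separators between alphanumeric chunks.
STRICT = re.compile(r"^(?=[\d -]*[A-Z])[A-Z\d]+(?:[ -][A-Z\d]+)*$", re.ASCII)


def matches(pattern: re.Pattern, name: str) -> bool:
    """True when the whole name is matched by the pattern."""
    return pattern.fullmatch(name) is not None


for name in ["KFC", "EU 8RF", "IK-OTP", "1973", "EU  8RF", "KFC-"]:
    print(f"{name!r:10} loose={matches(LOOSE, name)} strict={matches(STRICT, name)}")
```

Both patterns accept the three wanted names and reject 1973. They differ on the extra samples: the first pattern accepts a doubled space or a trailing hyphen, the second does not.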

Extend regular expression

I want to find invoice numbers with a regex. The string has to be longer than 3 characters. It may contain signs like {., , /, _}, all digits, and one or two capital letters, which can stand alone or follow each other. This is what I'm currently trying, without success.
`([0-9-\.\\\/_]{,3})([A-Z]{0,2})?`
Here I have two examples, which should be matched:
019S836/03717008
DR094255
This should not be matched:
DRF094255
Can somebody help me please?
You can use
^(?!(?:[^A-Z]*[A-Z]){3})(?=\D*\d)[0-9A-Z.\\\/_-]{4,}$
See the regex demo.
Details:
^ - start of string
(?!(?:[^A-Z]*[A-Z]){3}) - a negative lookahead that fails the match if, immediately to the right of the current location (i.e. from the start of string), there are three occurrences of any zero or more chars other than uppercase ASCII letters followed with one uppercase ASCII letter
(?=\D*\d) - there must be at least one digit in the string
[0-9A-Z.\\\/_-]{4,} - four or more occurrences of digits, uppercase letters, ., \, /, _ or -
$ - end of string.
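The following sketch exercises the pattern against the examples from the question. It uses the {4,} quantifier described in the details above (the question asks for strings longer than 3 characters) and Python's re module; the function name is_invoice_number and the extra samples are hypothetical.

```python
import re

# At most two uppercase letters (negative lookahead rejects a third),
# at least one digit, and four or more allowed characters overall.
INVOICE = re.compile(r"^(?!(?:[^A-Z]*[A-Z]){3})(?=\D*\d)[0-9A-Z.\\\/_-]{4,}$")


def is_invoice_number(text: str) -> bool:
    """Hypothetical helper: True when text looks like an invoice number."""
    return INVOICE.fullmatch(text) is not None


for sample in ["019S836/03717008", "DR094255", "DRF094255", "12A", "ABCD"]:
    print(sample, is_invoice_number(sample))
```

The first two samples are accepted. DRF094255 fails because of its third capital letter, 12A is too short, and ABCD contains no digit (and too many capitals).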

Negating a complex regex containing three parts

I need a regex which is matched when the string doesn't have both lowercase and uppercase letters.
If the string has only lowercase letters -> should be matched
If the string has only uppercase letters -> should be matched
If the string has only digits or special characters -> should be matched
For example
abc, ABC, 123, abc123, ABC123&^ - should match
AbC, A12b, AB^%12c - should not match
Basically I need an inverse/negation of the following regex:
^(?=.*[a-z])(?=.*[A-Z]).+$
Does not sound like any lookarounds would be needed.
Either match only characters that are not a-z or only characters, that are not A-Z.
^(?:[^a-z]+|[^A-Z]+)$
See this demo at regex101 (used + for one or more)
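A small sketch, in Python, that checks this alternation against the examples from the question and confirms that it behaves as the negation of the original mixed-case regex on those samples. The names SINGLE_CASE, MIXED_CASE and is_single_case are assumptions made for illustration.

```python
import re

# Alternation from the answer: only non-lowercase chars, or only non-uppercase chars.
SINGLE_CASE = re.compile(r"^(?:[^a-z]+|[^A-Z]+)$")

# Regex from the question that this pattern is meant to negate.
MIXED_CASE = re.compile(r"^(?=.*[a-z])(?=.*[A-Z]).+$")


def is_single_case(text: str) -> bool:
    """True when text does not contain both lowercase and uppercase letters."""
    return SINGLE_CASE.fullmatch(text) is not None


should_match = ["abc", "ABC", "123", "abc123", "ABC123&^"]
should_not_match = ["AbC", "A12b", "AB^%12c"]

for text in should_match + should_not_match:
    mixed = MIXED_CASE.fullmatch(text) is not None
    print(f"{text!r:12} single_case={is_single_case(text)} mixed={mixed}")
```

For every sample the two results are opposites, which is what the question asked for.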
You may use
^(?!.*[A-Z].*[a-z])(?!.*[a-z].*[A-Z])\S+$
Or
^(?=(?:[^a-z]+|[^A-Z]+)$).*$
See the regex demo #1 and regex demo #2
A lookaround solution like this can be used in more complex scenarios, when you need to apply more restrictions on the pattern. Else, consider a non-lookaround solution.
Details
^ - start of string
(?!.*[A-Z].*[a-z]) - no uppercase followed with a lowercase letter
(?!.*[a-z].*[A-Z]) - no lowercase letter followed with an uppercase one
(?=(?:[^a-z]+|[^A-Z]+)$) - a positive lookahead that requires 1 or more characters other than lowercase ASCII letters ([^a-z]+) to the end of the string, or 1 or more characters other than uppercase ASCII letters ([^A-Z]+) to the end of the string
\S+ - 1+ non-whitespace chars (first pattern); .* - 0+ chars other than line break chars (second pattern)
$ - end of string.
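The two lookaround patterns are not fully interchangeable, because the first consumes \S+ and the second consumes .*. The sketch below, using Python's re module, shows the difference on a string that contains a space; the names and the sample "abc def" are hypothetical additions.

```python
import re

# Pattern 1: two negative lookaheads, then non-whitespace characters only.
NEGATIVE_LOOKAHEADS = re.compile(r"^(?!.*[A-Z].*[a-z])(?!.*[a-z].*[A-Z])\S+$")

# Pattern 2: a positive lookahead wrapping the alternation, then any characters.
POSITIVE_LOOKAHEAD = re.compile(r"^(?=(?:[^a-z]+|[^A-Z]+)$).*$")


def accepts(pattern: re.Pattern, text: str) -> bool:
    """True when the pattern matches the whole text."""
    return pattern.fullmatch(text) is not None


for text in ["abc123", "ABC123&^", "AbC", "abc def"]:
    print(
        f"{text!r:12} "
        f"negative={accepts(NEGATIVE_LOOKAHEADS, text)} "
        f"positive={accepts(POSITIVE_LOOKAHEAD, text)}"
    )
```

Both patterns agree on the samples without whitespace. For "abc def" the first rejects the string (a space is not matched by \S) while the second accepts it.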
You can use this regex
^(([A-Z0-9?&%^](?![a-z]))+|([a-z0-9?&%^](?![A-Z]))+)$
You can test more cases here.
I've only added the characters ?&%^ as possible special characters, but you could add whichever you like.
I would go with:
^(?:[^a-z]+?|[^A-Z]+?)$
It translates to "If the entire string is composed of non-lowercase letters or non-uppercase letters then match the string."
Lazy quantifiers +? are used so that the end-of-string $ anchor is obeyed when the multiline flag is enabled. If you're only validating a single-line string then you can simply use + without the question mark.
If you have a whitelist of specific allowed special chars then change [^a-z] into [A-Z0-9()_+=-] (and [^A-Z] into [a-z0-9()_+=-]) and list the allowed special chars.
https://regex101.com/r/Wg6tLn/1

Regex - Allow Alphanumeric, spaces and symbols but CANNOT be only numeric or symbols or spaces. Must have alphabets in it

How can I write these rules using a regex:
Allow :
- Alphanumeric
- Spaces
- Symbols
Required :
Alphabetical
A minimum of two of the allowed
For example :
"abc123--" is accepted string
"abc" is rejected
"123-9*" is rejected
The comment that now you have 2 problems was a bit malevolent.
Regular expressions are just the right solution to verify such things,
under condition that you know how to do it.
A general rule to verify a text for presence / absence of particular chars is:
Start from ^ anchor.
Put a number of positive / negative lookaheads, verifying all criteria but the last.
Put "ordinary" regex expression, trying to match the last criterion.
End with $ anchor.
From what you described as acceptable / unacceptable strings, I see that you
have an additional requirement: the string must contain at least 1 digit
(because you described abc as unacceptable).
So the regex should contain the following parts:
^ - Start anchor.
(?=(?:.*[a-z]){2,}) - Positive lookahead for a letter, after 0 or more of any chars
(i.e. somewhere in the string), 2 times or more.
(?=.*\d) - Positive lookahead for a digit, after 0 or more of any chars.
[\w!##$%^&*+;:,.-]+ - Specification of what you want to
match - "allowed" chars, occurring 1 or more times.
If you need any more punctation characters, just add them here.
Note that - is at the end; anywhere else you would have to escape it with
a backslash. Other chars (e.g. ., * and +) need no escaping
between [ and ] (they represent just themselves).
$ - End anchor.
Note that \w covers letters, digits and _.
To sum up, the whole regex is:
^(?=(?:.*[a-z]){2,})(?=.*\d)[\w!##$%^&*+;:,.-]+$
Of course, use it with i (case insensitive) option.
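The recipe above (anchor, lookaheads for all criteria but the last, an ordinary character class, anchor) can be sketched in Python as follows. The regex is copied as written in the answer, with re.IGNORECASE standing in for the i option; the helper name is_acceptable and the extra sample "ab 1" are assumptions.

```python
import re

# ^            start anchor
# (?=...){2,}  lookahead: at least two letters somewhere in the string
# (?=.*\d)     lookahead: at least one digit somewhere in the string
# [...]+       the allowed characters, one or more times
# $            end anchor
RULES = re.compile(
    r"^(?=(?:.*[a-z]){2,})(?=.*\d)[\w!##$%^&*+;:,.-]+$",
    re.IGNORECASE,
)


def is_acceptable(text: str) -> bool:
    """Hypothetical helper: True when text passes every criterion."""
    return RULES.fullmatch(text) is not None


for sample in ["abc123--", "abc", "123-9*", "ab 1"]:
    print(repr(sample), is_acceptable(sample))
```

Only "abc123--" is accepted. Note that the character class as written does not contain a space, so "ab 1" is rejected; a space would have to be added to the class if spaces should be allowed.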

extract substring with regular expression

I have a string that is actually a directory/file name.
str='\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3'
I need to extract the target substring 'UA0001A' with MATLAB (though I would think all tools share the same syntax).
It need not be exactly 'UA0001A'; it is an arbitrary combination of letters and digits.
More generally, the substring (or word) should satisfy the following:
it is an alphanumeric word
it cannot be a purely alphabetic or purely numeric word
it cannot include 'midd', 'midd3', 'Midd3', 'MIDD3', etc., so a case-insensitive match may be used to exclude words beginning with 'midd'
it cannot match 'y[0-9]{2,4}m[0-9]{1,2}d[0-9]{1,2}\w*'
How do I write a regular expression that finds the target substring?
Thanks in advance!
You can use
s = '\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3';
res = regexp(s, '(?i)\\(?![^\W_]*(midd|y\d+m\d+))(?=[^\W_]*\d)(?=[^\W_]*[a-zA-Z])([^\W_]+)','tokens');
disp(res{1}{1})
See the regex demo
Pattern explanation:
(?i) - the case-insensitive modifier
\\ - a literal backslash
(?![^\W_]*(midd|y\d+m\d+)) - a negative lookahead that will fail a match if there are midd or y+digits+m+digits after 0+ letters or digits
(?=[^\W_]*\d) - a positive lookahead that requires at least 1 digit after 0+ digits or letters ([^\W_]*)
(?=[^\W_]*[a-zA-Z]) - there must be at least 1 letter after 0+ letters or digits
([^\W_]+) - Group 1 (what will be extracted) matching 1+ letters or digits (or 1+ characters other than non-word chars and _).
The 'tokens' "mode" will let you extract the captured value rather than the whole match.
See the IDEONE demo
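Since the question notes that other tools should accept similar syntax, here is a sketch of the same extraction with Python's re module. The only change to the pattern is that the (midd|y\d+m\d+) group is made non-capturing, so that re.findall returns just the wanted token; the function name extract_tokens is hypothetical.

```python
import re

PATH = r"\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3"

# Same lookaheads as the MATLAB pattern above: after a backslash, the word must not
# start a run containing "midd" or y<digits>m<digits>, and must contain both a
# digit and a letter.
TOKEN = re.compile(
    r"(?i)\\(?![^\W_]*(?:midd|y\d+m\d+))(?=[^\W_]*\d)(?=[^\W_]*[a-zA-Z])([^\W_]+)"
)


def extract_tokens(path: str) -> list[str]:
    """Return every alphanumeric word after a backslash that passes all checks."""
    return TOKEN.findall(path)


print(extract_tokens(PATH))  # ['UA0001A']
```

As with the 'tokens' mode in MATLAB, findall returns the contents of the capturing group rather than the whole match, so the leading backslash is not included.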
This should get you started:
[\\](?i)(?!.*midd.*)([0-9]+[a-z]+[a-z0-9]*|[a-z]+[0-9]+[a-z0-9]*)
[\\] : matches a backslash
(?i) : the rest of the regex is case insensitive
(?!...) : negative lookahead; what follows must not match the enclosed pattern
(?!.*midd.*) : fails if midd appears anywhere later on the line (.* is not limited to the current word)
([0-9]+[a-z]+[a-z0-9]*|[a-z]+[0-9]+[a-z0-9]*) : matches at least one digit followed by at least one letter, OR at least one letter followed by at least one digit, in either case followed by any number of letters and digits (remember that the negative lookahead must not match, so nothing containing midd)