Regex matching of character class and special conditions on certain other conditions - regex

I want to match a section of a string that contains certain repeated characters, along with certain other characters only when a given criterion is met. For instance, matching the characters a-z contained in angle brackets, and numbers only if the number is preceded by a plus.
Matching <abcde> to abcde.
<abcde1> should not match anything.
Matching <abcde+1> to abcde+1
Matching <abcde+1asd+2+3+4as> to abcde+1asd+2+3+4as
<abcde+> should not match anything.
The regex I've tried is <([a-z]|(\+(?=[0-9])|[0-9](?<=[\+])))*>.

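You can use either of the following patterns. The first matches only the inner text by using lookarounds; the second also consumes the angle brackets and captures the inner text in Group 1: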
(?<=<)(?:[a-zA-Z]+(?:\+\d+)*)+[a-zA-Z]*(?=>)
<((?:[a-zA-Z]+(?:\+\d+)*)+[a-zA-Z]*)>
See the regex demo. Details:
(?<=<) - a positive lookbehind that requires a < char immediately on the left
(?:[a-zA-Z]+(?:\+\d+)*)+ - one or more occurrences of
[a-zA-Z]+ - one or more letters
(?:\+\d+)* - zero or more sequences of + and one or more digits
[a-zA-Z]* - zero or more ASCII letters
(?=>) - a positive lookahead that requires a > char immediately on the right.
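As a quick check, here is a minimal Python sketch that runs both patterns against the five sample strings from the question. The names CAPTURE, LOOKAROUND and inner_text are made up for this illustration.

```python
import re

# Capture-group variant: group 1 holds the text between < and >.
CAPTURE = re.compile(r"<((?:[a-zA-Z]+(?:\+\d+)*)+[a-zA-Z]*)>")
# Lookaround variant: the whole match is the inner text.
LOOKAROUND = re.compile(r"(?<=<)(?:[a-zA-Z]+(?:\+\d+)*)+[a-zA-Z]*(?=>)")

def inner_text(s):
    """Return the bracketed text if it is valid, otherwise None."""
    m = CAPTURE.search(s)
    return m.group(1) if m else None

samples = ["<abcde>", "<abcde1>", "<abcde+1>", "<abcde+1asd+2+3+4as>", "<abcde+>"]
for sample in samples:
    print(sample, "->", inner_text(sample))
```

Strings with a bare digit or a trailing + yield None, because every digit run has to be introduced by a +.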

Related

Regex To Validate A String, But The String Can't Contain n Number Of A Specific Character

Recently I ran into a validation situation I've been trying to solve with regex. The rules are as such:
Must start with a capital letter
Center of the string may be of any length
Center of the string may have any combination of upper and lower case letters and numbers
Center of the string may have up to one underscore
Must end with a number
I have attempted to match this string with the following regex:
^(?!_{2,})([A-Z][a-zA-Z0-9_]*[0-9])$
and
^(?<=_{0,1})([A-Z][a-zA-Z0-9_]*[0-9])$
Both of these attempts still match cases where more than one underscore is present, e.g. App_l_e9 or App__le9.
How can I check that the part matched by ([A-Z][a-zA-Z0-9_]*[0-9]) contains zero or one underscore anywhere in the middle of the string?
The simplest approach would probably be this
^[A-Z][a-zA-Z0-9]*_?[a-zA-Z0-9]*[0-9]$
Explanation:
^[A-Z] Must start with an uppercase letter
[a-zA-Z0-9]* A combination of uppercase and lowercase letters and numbers of any length (also 0-length)
_? Either zero or one underscore character
[a-zA-Z0-9]* Again A combination of uppercase and lowercase letters and numbers of any length (also 0-length)
[0-9]$ Must end with a number
This will accept A_9 or AA0_xY8 but for instance not aXY_34 or Aasf1__asdf5
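A minimal Python sketch of this check, using the accepted and rejected strings mentioned above plus the two failing cases from the question. The helper name is_valid is only for illustration.

```python
import re

# Letters/digits, at most one underscore, then letters/digits and a final digit.
ONE_UNDERSCORE = re.compile(r"^[A-Z][a-zA-Z0-9]*_?[a-zA-Z0-9]*[0-9]$")

def is_valid(s):
    """True if s starts with A-Z, ends with a digit and has at most one underscore."""
    return ONE_UNDERSCORE.search(s) is not None

for s in ["A_9", "AA0_xY8", "aXY_34", "Aasf1__asdf5", "App_l_e9", "App__le9"]:
    print(s, is_valid(s))
```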
If the underscore must not be the first or last character of the middle part, you can replace each * with a + like this:
^[A-Z][a-zA-Z0-9]+_?[a-zA-Z0-9]+[0-9]$
This no longer accepts A_9, for instance; the shortest string with an underscore is now of the form Ax_d9.
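A small Python sketch of the stricter variant. One side effect worth knowing about: since both middle runs now need at least one character, strings shorter than four characters (such as Ab9) are rejected even when they contain no underscore. The name is_valid_strict is hypothetical.

```python
import re

# Both runs around the optional underscore must contain at least one character.
STRICT = re.compile(r"^[A-Z][a-zA-Z0-9]+_?[a-zA-Z0-9]+[0-9]$")

def is_valid_strict(s):
    """True if s passes the stricter pattern with + quantifiers."""
    return STRICT.search(s) is not None

for s in ["A_9", "Ax_d9", "Ab9", "Abc9"]:
    print(s, is_valid_strict(s))
```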
You might also start the match with an uppercase A-Z and immediately check that the string ends with a number 0-9 using a positive lookahead, so that strings without a trailing digit fail early and less backtracking is needed.
^[A-Z](?=.*[0-9]$)[a-zA-Z0-9]*_?[a-zA-Z0-9]*$
^ Start of string
[A-Z] Match an uppercase char A-Z
(?=.*[0-9]$) Positive lookahead to assert a digit 0-9 at the end of the string
[a-zA-Z0-9]* Optionally match any of the listed
_? Match an optional _
[a-zA-Z0-9]* Optionally match any of the listed
$ End of string
Or with an optional group
^[A-Z](?=.*[0-9]$)[a-zA-Z0-9]*(?:_[a-zA-Z0-9]*)?$
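The two lookahead-based patterns should accept and reject the same strings. A minimal Python sketch that compares them on a few samples (the function name check is made up for this example):

```python
import re

WITH_LOOKAHEAD = re.compile(r"^[A-Z](?=.*[0-9]$)[a-zA-Z0-9]*_?[a-zA-Z0-9]*$")
WITH_OPTIONAL_GROUP = re.compile(r"^[A-Z](?=.*[0-9]$)[a-zA-Z0-9]*(?:_[a-zA-Z0-9]*)?$")

def check(s):
    """Return the verdicts of both patterns as a (bool, bool) tuple."""
    return (
        WITH_LOOKAHEAD.search(s) is not None,
        WITH_OPTIONAL_GROUP.search(s) is not None,
    )

for s in ["A_9", "AA0_xY8", "A9", "aXY_34", "App_l_e9", "App__le9", "Apple_"]:
    print(s, check(s))
```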

Extend regular expression

I want to find invoice numbers with a regex. The string has to be longer than 3 characters. It may contain the signs {., \, /, _} and any digits, and it may contain one or two capital letters, which can appear alone or next to each other. This is what I'm currently trying, without success:
`([0-9-\.\\\/_]{,3})([A-Z]{0,2})?`
Here I have two examples, which should be matched:
019S836/03717008
DR094255
This should not be matched:
DRF094255
Can somebody help me please?
You can use
^(?!(?:[^A-Z]*[A-Z]){3})(?=\D*\d)[0-9A-Z.\\\/_-]{4,}$
See the regex demo.
Details:
^ - start of string
(?!(?:[^A-Z]*[A-Z]){3}) - a negative lookahead that fails the match if, immediately to the right of the current location (i.e. from the start of string), there are three occurrences of any zero or more chars other than uppercase ASCII letters followed with one uppercase ASCII letter
(?=\D*\d) - there must be at least one digit in the string
[0-9A-Z.\\\/_-]{4,} - four or more occurrences of digits, uppercase letters, ., \, /, _ or -
$ - end of string.
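A minimal Python sketch of this pattern, using the {4,} quantifier from the details list and the three sample strings from the question. The slash is left unescaped inside the character class, which Python's re module accepts; the name is_invoice is an assumption for this illustration.

```python
import re

INVOICE = re.compile(
    r"^(?!(?:[^A-Z]*[A-Z]){3})"  # fewer than three uppercase letters overall
    r"(?=\D*\d)"                 # at least one digit
    r"[0-9A-Z.\\/_-]{4,}$"       # four or more allowed characters
)

def is_invoice(s):
    """True if s looks like an invoice number under the rules above."""
    return INVOICE.search(s) is not None

for s in ["019S836/03717008", "DR094255", "DRF094255", "A12"]:
    print(s, is_invoice(s))
```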

Regular Expression containing letters and spaces in specified fashion

I am working on a text processing API in Java. I need to match the strings which are:
At least 8 characters in length.
Should only contain uppercase letters, lowercase letters or spaces.
Spaces should not be present between the letters. They can, however, be leading or trailing. The string can also consist only of spaces, as long as there are at least 8 of them.
Regular expression which I tried but failed:
^\s*[a-zA-Z]{8,}\s*$
Demo of my tries in here.
Any help will be welcomed.
You can use the below regex to achieve your result:
^(?=.{8,}) *[a-zA-Z]* *$
Explanation of the above regex:
^ - denotes start of the test String.
(?=) - Positive lookahead.
.{8,} - any character other than newline with length at least 8.
" *" - 0 or more spaces, to match the leading spaces (a literal space is used rather than \s, which would also match tabs and newlines).
[a-zA-Z]* - 0 or more letters (uppercase or lowercase). (You can use [a-z]* along with the i (case-insensitive) flag, although this has no effect on performance.)
" *" - 0 or more spaces, to match the trailing spaces (again a literal space rather than \s).
$ - denotes end of the test String.
Above regex demo.
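The pattern can be tried out quickly outside Java as well. The following is a minimal Python sketch (the pattern itself uses no Python-specific syntax, so it should carry over to java.util.regex unchanged); the name is_acceptable is hypothetical.

```python
import re

# Lookahead enforces the total length; the rest enforces "spaces only at the edges".
PATTERN = re.compile(r"^(?=.{8,}) *[a-zA-Z]* *$")

def is_acceptable(s):
    """True if s has 8+ characters, only letters/spaces, and no inner spaces."""
    return PATTERN.search(s) is not None

samples = ["abcdefgh", "  abcdef  ", " " * 8, "abcdefg", "abc defgh", "abcd1234", " " * 7]
for s in samples:
    print(repr(s), is_acceptable(s))
```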

Regular Expression that matches when any string has minimum three characters, and + signs are surrounded by minimum three characters

I want to create a regular expression that only matches when the string has three or more characters and, if the string contains any + sign, there are at least three characters before and after each + sign.
The regex I have created fulfills every requirement except one: there must be at least three characters before the first + sign, but it also matches when there are fewer.
This is my current regex: (\+[a-z0-9]{3}|[a-z0-9]{0,3})$
The string ab+abx should not match, but it matches with my regex.
Example:
Valid Strings:
sss
sdfsgdf
4534534
dfs34543
sdafds+3232+sfdsafd
qwe+sdf
234+567
cvb+243
Invalid Strings:
a
aa
a+
aa+
+aa
+a
a+a
aa+aa
aaa+a
You can use this regex,
^[^+\n]{3,}(?:\+[^+\n]{3,})*$
Explanation:
^ - Start of string
[^+\n]{3,} - Matches three or more characters other than + and newline. You can drop the \n if the input you're matching never contains newlines
(?:\+[^+\n]{3,})* - Matches a + followed by three or more characters other than + and newline; the whole group repeats zero or more times, which keeps the + optional
$ - End of input
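A minimal Python sketch that runs this pattern over the valid and invalid strings listed in the question (the names VALID, INVALID and matches are made up for this example):

```python
import re

PLUS_SEPARATED = re.compile(r"^[^+\n]{3,}(?:\+[^+\n]{3,})*$")

VALID = ["sss", "sdfsgdf", "4534534", "dfs34543",
         "sdafds+3232+sfdsafd", "qwe+sdf", "234+567", "cvb+243"]
INVALID = ["a", "aa", "a+", "aa+", "+aa", "+a", "a+a", "aa+aa", "aaa+a", "ab+abx"]

def matches(s):
    """True if every +-separated chunk of s has at least three characters."""
    return PLUS_SEPARATED.search(s) is not None

print(all(matches(s) for s in VALID))        # every valid sample matches
print(not any(matches(s) for s in INVALID))  # no invalid sample matches
```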
Edit: Updated solution for the case where spaces should not count toward the minimum of three characters on either side of a +.
You can use this regex to ignore spaces when counting:
^(?:[^+\n ] *){3,}(?:\+ *(?:[^+\n ] *){3,})*$
Also, in case you're dealing with only alphanumeric text, you can use this simpler and easier to maintain regex,
^(?:[a-z0-9] *){3,}(?:\+ *(?:[a-z0-9] *){3,})*$
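To see how the space-tolerant patterns count characters, here is a minimal Python sketch. The sample strings are invented for this illustration and do not come from the question.

```python
import re

# Each repetition consumes exactly one counted character plus any spaces after it.
ANY_CHARS = re.compile(r"^(?:[^+\n ] *){3,}(?:\+ *(?:[^+\n ] *){3,})*$")
ALNUM_ONLY = re.compile(r"^(?:[a-z0-9] *){3,}(?:\+ *(?:[a-z0-9] *){3,})*$")

def verdicts(s):
    """Return (general pattern matches, alphanumeric-only pattern matches)."""
    return (ANY_CHARS.search(s) is not None, ALNUM_ONLY.search(s) is not None)

for s in ["a b c+d e f", "abc + def", "a b+cde", "ab +cde", "abc+d e", "A B C+D E F"]:
    print(repr(s), verdicts(s))
```

The last sample shows the difference between the two: uppercase letters count as characters for the general pattern but are not allowed at all by the alphanumeric one.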
You could match 3 or more characters from the character class [a-z0-9], followed by 0 or more repetitions of a plus sign and another 3 or more characters from the same class:
^[a-z0-9]{3,}(?:\+[a-z0-9]{3,})*$
That will match:
^ Start of string
[a-z0-9]{3,} Match 3+ times what is listed in the character class
(?: Non capturing group
\+[a-z0-9]{3,} Match + sign followed by matching 3+ times what is listed in the character class
)* Close group and repeat 0+ times
$ End of string
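The character-class version is stricter than a negated class such as [^+\n], which accepts any character other than + and newline. A short Python sketch of the difference, with sample strings invented for this illustration:

```python
import re

ALNUM = re.compile(r"^[a-z0-9]{3,}(?:\+[a-z0-9]{3,})*$")
NEGATED = re.compile(r"^[^+\n]{3,}(?:\+[^+\n]{3,})*$")

def compare(s):
    """Return (matches [a-z0-9] version, matches negated-class version)."""
    return (ALNUM.search(s) is not None, NEGATED.search(s) is not None)

for s in ["qwe+sdf", "ab+abx", "ABC+DEF", "a-b+c_d"]:
    print(s, compare(s))
```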

extract substring with regular expression

I have a string that is actually a file path.
str='\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3'
I need to extract the target substring 'UA0001A' with MATLAB (though I would expect most tools to use the same syntax).
It does not necessarily have to be exactly 'UA0001A'; it is an arbitrary letter-number combination.
To make it more general, the substring (or word) should satisfy the following:
it is a word made up of letters and digits
it cannot be a purely alphabetic word or a purely numeric word
it cannot include 'midd' or 'midd3' or 'Midd3' or 'MIDD3', etc., so a case-insensitive method may be used to exclude words beginning with 'midd'
it cannot include 'y[0-9]{2,4}m[0-9]{1,2}d[0-9]{1,2}\w*'
How to write the regular expression to find the target substring?
Thanks in advance!
You can use
s = '\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3';
res = regexp(s, '(?i)\\(?![^\W_]*(midd|y\d+m\d+))(?=[^\W_]*\d)(?=[^\W_]*[a-zA-Z])([^\W_]+)','tokens');
disp(res{1}{1})
See the regex demo
Pattern explanation:
(?i) - the case-insensitive modifier
\\ - a literal backslash
(?![^\W_]*(midd|y\d+m\d+)) - a negative lookahead that will fail a match if there are midd or y+digits+m+digits after 0+ letters or digits
(?=[^\W_]*\d) - a positive lookahead that requires at least 1 digit after 0+ digits or letters ([^\W_]*)
(?=[^\W_]*[a-zA-Z]) - there must be at least 1 letter after 0+ letters or digits
([^\W_]+) - Group 1 (the value that will be extracted) matching 1+ letters or digits (that is, 1+ characters other than non-word chars and _).
The 'tokens' "mode" will let you extract the captured value rather than the whole match.
See the IDEONE demo
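For readers without MATLAB, the same pattern can be exercised in Python. This is a rough port, not the original answer's code: the inner alternation is made non-capturing so that findall returns only Group 1, and the names TOKEN and extract_tokens are made up for this sketch.

```python
import re

path = r"\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3"

TOKEN = re.compile(
    r"(?i)\\"                             # case-insensitive; literal backslash
    r"(?![^\W_]*(?:midd|y\d+m\d+))"       # no midd / y<digits>m<digits> in the word
    r"(?=[^\W_]*\d)(?=[^\W_]*[a-zA-Z])"   # at least one digit and one letter
    r"([^\W_]+)"                          # Group 1: the word itself
)

def extract_tokens(s):
    """Return every Group 1 value found in s."""
    return TOKEN.findall(s)

print(extract_tokens(path))
```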
This should get you started:
[\\](?i)(?![^\\]*midd)([0-9]+[a-z]+[a-z0-9]*|[a-z]+[0-9]+[a-z0-9]*)
[\\] : match a backslash
(?i) : the rest of the regex is case insensitive
(?! ... ) : negative lookahead; what follows must not match the enclosed pattern
(?![^\\]*midd) : the current path segment (everything up to the next backslash) must not contain midd. An unrestricted (?!.*midd.*) would reject every backslash in the sample string, because midd3 appears at its end
([0-9]+[a-z]+[a-z0-9]*|[a-z]+[0-9]+[a-z0-9]*) : match at least one digit followed by at least one letter, OR at least one letter followed by at least one digit, in both cases followed by any number of letters and digits (remember, the lookahead has already excluded any segment that contains midd)
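A sketch in Python comparing a lookahead that scans the rest of the string, (?!.*midd), with one limited to the current path segment, (?![^\\]*midd). The case-insensitive flag is passed as re.IGNORECASE because recent Python versions do not allow (?i) in the middle of a pattern; the names UNBOUNDED and BOUNDED are made up for this example.

```python
import re

path = r"\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3"

# digits-then-letters OR letters-then-digits, followed by any letters/digits
WORD = r"([0-9]+[a-z]+[a-z0-9]*|[a-z]+[0-9]+[a-z0-9]*)"

# Lookahead scans to the end of the string, so the trailing .midd3 blocks everything.
UNBOUNDED = re.compile(r"\\(?!.*midd)" + WORD, re.IGNORECASE)
# Lookahead stops at the next backslash, so only the current segment is checked.
BOUNDED = re.compile(r"\\(?![^\\]*midd)" + WORD, re.IGNORECASE)

print(UNBOUNDED.findall(path))
print(BOUNDED.findall(path))
```

Note that this pattern relies on the timestamp segment containing midd in its extension; it does not test for the y[0-9]{2,4}m[0-9]{1,2} form itself.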