REGEX Repeater "Or" Operator - regex

I am looking for a regex that matches either 2 [0-9] repeats (followed by some other pattern)
[0-9]{2}[A-z]{4}
OR 6 [0-9] repeats (and then some other pattern)
[0-9]{6}[A-z]{4}
The following is too inclusive:
[0-9]{2,6}[A-z]{4}
QUESTION
Is there a way that I can specify either 2 or 6 repeats?

You can use the alternation operator | within a non-capturing group, like this:
(?:[0-9]{2}|[0-9]{6})[A-z]{4}
Be aware that [A-z] includes not only upper- and lowercase letters but also [, \, ], ^, _ and `, which lie between Z and a in the ASCII code points. Use [A-Za-z] for letters, as pointed out by @AlanMoore in his comment.

This should work
(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}
Do you have some test cases I can verify it with?
12asdf - passes
123456asdf - passes
1234asdf - fails
However, if you don't anchor the start of the regex to a word boundary (\b) or the start of the line (^), 1234asdf will have 34asdf as a partial match.
So either
\b(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}
or
^(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}
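To see the partial-match problem concretely, here is a small JavaScript check (JavaScript is assumed only because it is easy to run; the helper name firstMatch is hypothetical) that runs the three test strings against the unanchored, \b-anchored and ^-anchored patterns:

```javascript
// The three patterns discussed above, as JavaScript regex literals (no g flag,
// so exec() always starts searching at index 0).
const unanchored = /(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}/;
const wordAnchored = /\b(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}/;
const lineAnchored = /^(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}/;

// Hypothetical helper: return the matched text, or null if nothing matched.
function firstMatch(re, str) {
  const m = re.exec(str);
  return m ? m[0] : null;
}

for (const s of ["12asdf", "123456asdf", "1234asdf"]) {
  console.log(s, firstMatch(unanchored, s), firstMatch(wordAnchored, s), firstMatch(lineAnchored, s));
}
```

For 1234asdf the unanchored pattern returns 34asdf, while both anchored versions return null.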
As a quick rundown of the regex changes:
(?: ) creates a non-capturing group
| selects between the alternatives [0-9]{2} and [0-9]{6}
^ matches the start of the string (or of a line, in multiline mode)
$ matches the end of the string (or of a line, in multiline mode)
\b matches a word boundary
[a-zA-Z] is used instead of [A-z] as it's likely what was intended (all alphabetic characters, regardless of case)
You can also replace your [0-9]s with \d, which is shorthand for any digit. The best way I can think of to write this without getting partial matches is as follows:
(?:\b|^)(?:\d{2}|\d{6})[a-zA-Z]{4}(?:\b|$)
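The leading anchor only prevents partial matches at the front. This JavaScript sketch (the sample strings and the name startOnly are made up for illustration) shows what the trailing (?:\b|$) adds: with it, 12asdfg (five letters) is rejected instead of yielding 12asdf.

```javascript
// The boundary-wrapped pattern from the answer, using \d for digits.
const bounded = /(?:\b|^)(?:\d{2}|\d{6})[a-zA-Z]{4}(?:\b|$)/;
// For comparison (hypothetical name): only the start is anchored.
const startOnly = /\b(?:\d{2}|\d{6})[a-zA-Z]{4}/;

const samples = ["12asdf", "123456asdf", "12asdfg", "1234asdf", "id 12abcd end"];
for (const s of samples) {
  // Print each sample with the result of both patterns side by side.
  console.log(JSON.stringify(s), bounded.test(s), startOnly.test(s));
}
```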

The classic way would be:
(?:[0-9]{2}|[0-9]{6})[A-z]{4}
[Literally as [0-9]{2} OR [0-9]{6}]
But you can also use this one, which should be a little more efficient than the above with less potential backtracking:
[0-9]{2}(?:[0-9]{4})?[A-z]{4}
[Here, [0-9]{2} is followed by an optional group of 4 more [0-9], which gives the required total of 6 digits when the group matches]
You might not be aware that [A-z] matches not only letters but also some other characters.
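Both formulations accept exactly the same digit counts. A quick JavaScript check, where the ^/$ anchors and the [A-Za-z] class are additions for this test only (so that whole strings are compared and the [A-z] quirk discussed next does not interfere):

```javascript
// Whole-string versions of the two patterns.
const alternation = /^(?:[0-9]{2}|[0-9]{6})[A-Za-z]{4}$/;
const optionalGroup = /^[0-9]{2}(?:[0-9]{4})?[A-Za-z]{4}$/;

// Collect the digit counts (0..8) that a pattern accepts in front of "abcd".
const acceptedBy = (re) => {
  const counts = [];
  for (let n = 0; n <= 8; n++) {
    if (re.test("1".repeat(n) + "abcd")) counts.push(n);
  }
  return counts;
};

console.log(acceptedBy(alternation));   // [ 2, 6 ]
console.log(acceptedBy(optionalGroup)); // [ 2, 6 ]
```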
The range [A-z] effectively is equivalent to:
[A-Z\[\\\]^_`a-z]
Notice that the additional characters that match are:
[ \ ] ^ _ `
[spaces are included only for separation; they are not part of the set]
This is because those characters lie between the uppercase and lowercase letters in the ASCII (and Unicode) table.
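You can confirm the extra characters by brute force. This JavaScript sketch (variable names are arbitrary) tests every ASCII code point against [A-z] and [A-Za-z] and keeps the ones only the first class accepts:

```javascript
// Walk the ASCII range and keep what [A-z] accepts but [A-Za-z] rejects.
const range = /[A-z]/;
const letters = /[A-Za-z]/;
let extras = "";
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (range.test(ch) && !letters.test(ch)) extras += ch;
}
console.log(extras); // [\]^_`
```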

Not obvious, but yes:
(?:\d{2}|\d{6})

Related

Match numbers after first character

I'd like to use Regex to determine whether the characters after the first are all numbers.
For example:
A123 would be valid as after A there are only numbers
A12B would be invalid as, after the first character, there is another letter
I essentially want to ignore the first character
So far I have this:
(?<=A)\w*(?=)
but this makes A12B or A1B2C valid; I only want numbers after A.
You could match a non-digit \D, followed by one or more digits. If that is the whole string, you could use the anchors ^ and $ to assert the start and the end of the string.
^\D\d+$
That will match:
^ Start of the string
\D Match not a digit
\d+ Match 1+ digits making sure there are digits
$ End of the string
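A quick JavaScript check of ^\D\d+$ on the strings from the question plus two added edge cases (A and 1123). For contrast it also runs ^.\d*$ from the next answer, which accepts both edge cases because . also matches a digit and \d* allows zero digits:

```javascript
const notDigitThenDigits = /^\D\d+$/;
const anyThenDigits = /^.\d*$/;

// Print each string with the result of both patterns.
for (const s of ["A123", "A12B", "A1B2C", "A", "1123"]) {
  console.log(s, notDigitThenDigits.test(s), anyThenDigits.test(s));
}
```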
The best solution I can think of is:
^.\d*$
^ - Start of the line
. - Any character (except line terminators)
\d*
\d - a digit
* - repeated any number of times, including 0 (if you want at least one digit, change it to +).
$ - End of the line
const regex = /^.\d*$/;
const testStrings = ['A123', 'A12B'];
testStrings.forEach(str => {
  console.log(`${str} is ${regex.test(str) ? 'valid' : 'invalid'}`);
});
Your attempt is very complicated, especially given how simple your goal is.
Succeeding at regexes is all about simplicity.
The first character can be anything, so just use a dot (.).
The next ones are all digits, so you want \d.
Add * to allow any number of repetitions, or use + if you want at least one.
Finally, you need to anchor your regex at the beginning and at the end, otherwise it would match strings like A123XXXXX or XXXXA123.
Note that some match functions already anchor the pattern at the beginning (for example Python's re.match), so there you can omit the caret.
Final regex:
^.\d*$
Maybe
(?<=.{1,1})([0-9]+)(?=\s)
(?<=.{1,1}) - preceded by one character (.{1,1} is equivalent to a single .)
([0-9]+) - at least one digit
(?=\s) - has a whitespace after
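To tie the match to the beginning of the line, put ^ inside the lookbehind: (?<=^.) (a ^ placed in front of the lookbehind can never match, because the lookbehind would then need a character before the start of the line)
Replace (?=\s) with $ to anchor at the end of the line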
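A JavaScript sketch of the lookbehind approach (lookbehind needs an engine that supports it, such as ES2018+ JavaScript). The helper name digitsAfterFirst is hypothetical; the second pattern is a variant that puts ^ inside the lookbehind, (?<=^.), and uses $ instead of (?=\s):

```javascript
// The answer's pattern: one character before, whitespace after.
const withSpace = /(?<=.)([0-9]+)(?=\s)/;
// Hypothetical variant: start-anchored lookbehind, end-of-string anchor.
const anchored = /(?<=^.)([0-9]+)$/;

// Hypothetical helper: return capture group 1, or null if there is no match.
function digitsAfterFirst(re, s) {
  const m = re.exec(s);
  return m ? m[1] : null;
}

console.log(digitsAfterFirst(withSpace, "A123 rest")); // 123
console.log(digitsAfterFirst(anchored, "A123"));       // 123
console.log(digitsAfterFirst(anchored, "A12B"));       // null
console.log(digitsAfterFirst(anchored, "AB123"));      // null
```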
^[a-zA-Z][0-9]{3}$
^ - "starting with" (here, starting with a letter). Read it together with the next token as ^[a-zA-Z]
[a-zA-Z] - any lowercase letter (a-z) or uppercase letter (A-Z) (change this if required)
[0-9] - any digit
{3} - how many digits you want to check. Read it together with the previous token as [0-9]{3}
$ - end of the string (so here the string must end with those 3 digits)
Here you can play around - https://regex101.com/r/mqUHvP/5

extract substring with regular expression

I have a string that is actually a file path.
str='\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3'
I need to extract the target substring 'UA0001A' with MATLAB (though I would think all tools use the same syntax).
It does not have to be exactly 'UA0001A'; it is an arbitrary combination of letters and digits.
To make it more general, I would like the substring (or word) to satisfy the following:
it is a word combining letters and digits
it cannot be a purely alphabetic or purely numeric word
it cannot include 'midd' or 'midd3' or 'Midd3' or 'MIDD3', etc., so a case-insensitive check could be used to exclude words beginning with 'midd'
it cannot include 'y[0-9]{2,4}m[0-9]{1,2}d[0-9]{1,2}\w*'
How to write the regular expression to find the target substring?
Thanks in advance!
You can use
s = '\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3';
res = regexp(s, '(?i)\\(?![^\W_]*(midd|y\d+m\d+))(?=[^\W_]*\d)(?=[^\W_]*[a-zA-Z])([^\W_]+)','tokens');
disp(res{1}{1})
Pattern explanation:
(?i) - the case-insensitive modifier
\\ - a literal backslash
(?![^\W_]*(midd|y\d+m\d+)) - a negative lookahead that will fail a match if there are midd or y+digits+m+digits after 0+ letters or digits
(?=[^\W_]*\d) - a positive lookahead that requires at least 1 digit after 0+ digits or letters ([^\W_]*)
(?=[^\W_]*[a-zA-Z]) - there must be at least 1 letter after 0+ letters or digits
([^\W_]+) - Group 1 (what will extract) matching 1+ letters or digits (or 1+ characters other than non-word chars and _).
The 'tokens' "mode" will let you extract the captured value rather than the whole match.
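For readers not using MATLAB, here is a rough JavaScript port of the same pattern (an assumption, not part of the answer). JavaScript has no inline (?i), so the i flag is used instead, and the (midd|...) group is made non-capturing so that the extracted word is capture group 1:

```javascript
// Build the path from the question; "\\" is one literal backslash, and the two
// leading empty strings produce the two backslashes at the start of the UNC path.
const parts = ["", "", "198.168.0.10", "share", "ccdfiles", "UA-midd3-files",
  "UA0001A_15_Jun_2014_08.17.49", "Midd3", "y12m05d25h03m16.midd3"];
const s = parts.join("\\");

// Same lookaheads as the MATLAB pattern; the i flag replaces (?i).
const re = /\\(?![^\W_]*(?:midd|y\d+m\d+))(?=[^\W_]*\d)(?=[^\W_]*[a-zA-Z])([^\W_]+)/i;

const m = re.exec(s);
console.log(m ? m[1] : null); // UA0001A
```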
this should get you started:
[\\](?i)(?![a-z0-9]*midd)([0-9]+[a-z]+[a-z0-9]*|[a-z]+[0-9]+[a-z0-9]*)
[\\] : match a backslash
(?i) : the rest of the regex is case-insensitive
(?!...) : negative lookahead; what follows cannot match this
(?![a-z0-9]*midd) : what follows cannot be a word that contains midd (the lookahead stays within letters and digits, so a midd later in the path does not block the match)
([0-9]+[a-z]+[a-z0-9]*|[a-z]+[0-9]+[a-z0-9]*) : match at least one digit followed by at least one letter, OR at least one letter followed by at least one digit, followed in either case by any number of letters and digits (remember, it cannot match the (?!...) group, so no word containing midd)

I need a unique regex that requires at least one letter and disallows + and any form of blank space

I broke it down into two, but I'm wondering if it's possible in one.
My two regexes:
/^[^\s+ ]+$/
/(.*[a-zA-Z].*)/
You can use
/^[^+\s]*[a-z][^+\s]*$/i
The pattern matches:
^ - start of string
[^+\s]* - zero or more characters other than + and whitespace
[a-z] - a letter (case insensitive - see /i modifier)
[^+\s]* - zero or more characters other than + and whitespace
$ - end of string
This expression requires only one letter, and allows any number of characters other than whitespace and + on both sides of that letter.
Try this. I'm not sure what you mean by "unique", though:
/^[^+\s]*[A-Za-z][^+\s]*$/
Why not both?
^(?=.*[a-zA-Z])[^\s+]+$
Uses lookahead.
^ start of string
(?=.*[a-zA-Z]) make sure there is at least one letter ahead
[^\s+]+ make sure every character is neither a plus nor whitespace
$ end of string
Notice how I changed your [^\s+ ] into [^\s+], because \s already includes the space (U+0020).
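Both single-regex answers accept the same strings. A small JavaScript comparison (the sample strings, including a tab and an empty string, are chosen only for illustration):

```javascript
const bothSides = /^[^+\s]*[a-z][^+\s]*$/i;
const lookahead = /^(?=.*[a-zA-Z])[^\s+]+$/;

const samples = ["abc", "a1_!", "123", "ab c", "a+b", "", "x\ty"];
for (const s of samples) {
  // JSON.stringify makes the empty string and the tab visible in the output.
  console.log(JSON.stringify(s), bothSides.test(s), lookahead.test(s));
}
```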

How do I check a whole existing regular expression for a digit?

I have written a regular expression as follows:
"^[\+]{0,1}([\#]|[\*]|[\d]){1,15}$"
In summary this matches an optional '+' sign followed by up to 15 characters which might be '#', '*' or a digit.
However, this means that '+#' will match and this is not a valid result as I always need at least one number.
Typical valid matches might be:
+1234
445678999
+#7897897
+345764756#775
So, given that I've crafted a valid RegEx for these matches, I guess the elegant solution is to use this regex and add some special criterion that globally checks for a digit in the result, OR somehow disallows anything that doesn't contain at least one digit.
How do I check for that digit?
This solution requires at least one digit in the string, using a lookahead (the (?=...) section):
^(?=.*\d)\+?[#*\d]{1,15}$
Legend
^ # Start of the string (or line with m/multiline flag)
(?=.*\d) # Lookahead that checks for at least one digit in the match
\+? # An optional literal plus '+'
[#*\d]{1,15} # one to fifteen of literal '#' or '*' or digit (\d is [0-9])
$ # End of the string (line with m/multiline flag)
NOTE: it also rejects combinations such as +*, + or #*.
Try this regex (my first idea):
^(?=.*[0-9])[+]?([#*\d]{1,15})$
You can replace [0-9] with \d.
DEMO:
https://regex101.com/r/bM9oE6/3
I'd use
^(?=.*\d)\+?[#*\d]{1,15}$
Explanation:
^ : beginning of line
(?= : lookahead
.*\d : at least one digit
)
\+? : optional +
[#*\d]{1,15} : 1 to 15 characters in class [#*\d]
$ : end of line
matched:
+1234
445678999
+#7897897
+345764756#775
###456
not matched:
+#*
+*
#*
+#
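The matched and not-matched lists above can be turned into a quick JavaScript check; the 16-digit string is an added example that exercises the {1,15} length limit:

```javascript
const dialString = /^(?=.*\d)\+?[#*\d]{1,15}$/;

const shouldMatch = ["+1234", "445678999", "+#7897897", "+345764756#775", "###456"];
const shouldFail = ["+#*", "+*", "#*", "+#", "1234567890123456"];

for (const s of shouldMatch) console.log(s, dialString.test(s)); // each prints true
for (const s of shouldFail) console.log(s, dialString.test(s));  // each prints false
```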
This should work in your case:
^(\+{0,1}[\d#]{1,15})$
Demo:
https://regex101.com/r/fU1eC2/1
Edit:
If you need a # after the + in the string, use ^[+#]?([\d#]{1,15})(?<!#)$
which matches "+#7897897"
If you don't, use ^[+#]*([\d#]{1,15})(?<!#)$
which also matches "+#7897897"

Deciphering a simple regex

The regular expression in question is
(\d{3,4}[.-]?)+
sample text
707-7019-789
My progress so far
( )+ a capturing group, repeated one or more times
\d{3,4} digit, in quantities 3 or 4
[.-]? dot (or something) or hyphen, in quantities zero or one <-- this is the part I'm interested in
From my understanding, this should match a 3- or 4-digit number, followed by a dot (or anything, since dot matches anything) or a hyphen, bundled in a group, one or more times. Why doesn't it match
707+123-4567
then?
. in a character group [] is just a literal ., it does not have the special meaning "anything". [.-]? means "a dot or a hyphen or nothing", because the entire group is made optional with the ?.
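A short JavaScript check makes the difference visible: + is neither a dot nor a hyphen, so on 707+123-4567 the unanchored pattern stops after 707, and a fully anchored variant (written with a non-capturing group for this test) rejects the string.

```javascript
const original = /(\d{3,4}[.-]?)+/;
const whole = /^(?:\d{3,4}[.-]?)+$/;

for (const s of ["707-7019-789", "707.7019-789", "707+123-4567"]) {
  // exec() returns the leftmost match; test() on the anchored variant checks the whole string.
  console.log(s, "first match:", original.exec(s)[0], "whole string:", whole.test(s));
}
```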
[.-]?
What this means literally:
character class [.-]
Match only one out of the following characters: . and - literally.
greedy quantifier ?
Repeat the last token between 0 and 1 times, as many times as possible.
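On its own, ? is greedy: it takes the dot or hyphen whenever one is present, and only the doubled ?? form is lazy. A minimal JavaScript comparison:

```javascript
const greedy = /\d{3}[.-]?/; // plain ?: prefers one occurrence
const lazy = /\d{3}[.-]??/;  // ??: prefers zero occurrences

console.log(greedy.exec("707-")[0]); // 707-
console.log(lazy.exec("707-")[0]);   // 707
```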
The brackets remove the special meaning of the dot.
Brackets mean "range"/"character class".
Thus you are saying: choose one character from the list/range/character class .-
You aren't saying: choose from the list "anything" or - ("anything" is the usual meaning of .)