Regular expression matching specific letter combos

I need to match the following example strings:
LA20517505
BN30116471
I tried this: [LA|BN].\d{8}
That does indeed match, but it also matches other letters as well. I specifically need to match "LA" or "BN" followed by 8 numbers.

Use parentheses here, not brackets: (LA|BN)\d{8}
Explanation:
(LA|BN) Match character sequences LA or BN
\d{8} followed by 8 digits
whereas the initial regex [LA|BN].\d{8} can be read as :
[LA|BN] Match either character L,A,|,B or N
. Match any character
\d{8} followed by 8 digits
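The difference between the two patterns can be checked with a short script. This is only a sketch: Python's re module is an assumption (the question does not name a language), and the last two sample strings are made up to show the false positives.

```python
import re

# Character class: exactly one of L, A, |, B, N, then any character, then 8 digits
bracket = re.compile(r"[LA|BN].\d{8}")
# Alternation: the two-letter sequence LA or BN, then 8 digits
group = re.compile(r"(LA|BN)\d{8}")

samples = ["LA20517505", "BN30116471", "LX20517505", "|Z12345678"]

# Map each sample to (bracket result, group result)
results = {
    s: (bool(bracket.fullmatch(s)), bool(group.fullmatch(s)))
    for s in samples
}

for s, (with_brackets, with_group) in results.items():
    print(f"{s:12} brackets={with_brackets} group={with_group}")
```

The bracket version accepts all four strings, while the grouped version accepts only the two that start with LA or BN.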

Related

Regex with wildcard search?

I created a Regex to check a string for the following situation:
the first 4 characters are digits
followed by a dot
followed by 3 digits
followed by a dot
followed by 4 to 8 digits or letters
e.g.: 1234.123.125B
My Regex: ^[0-9]{4}[.][0-9]{3}[.][0-9a-zA-Z]{4,8}$
But now I need a wildcard search: The Regex should also match if there is a '*' after the first 8 characters. For example:
1234.123.12* MATCH
1234.123* MATCH
1234.123.45B9* MATCH
1234.12* NO MATCH
1234.12345* NO MATCH
How can I add the wildcard search to my Regex?
Thank you
You may use this regex with alternation:
^\d{4}\.\d{3}(?:\*|\.[\da-zA-Z]{0,7}\*|\.[\da-zA-Z]{4,8})$
RegEx Details:
^: Start
\d{4}\.\d{3}: Match 4 digits + 1 dot + 3 digits
(?:\*|\.[\da-zA-Z]{0,7}\*|\.[\da-zA-Z]{4,8}): Match a single *, OR a dot followed by 0 to 7 digits/letters and then a *, OR a dot followed by 4 to 8 digits/letters
$: End
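As a quick check, the alternation pattern can be run against the examples from the question. The sketch below assumes Python's re module as the test harness; the question itself does not name a regex flavor.

```python
import re

pattern = re.compile(r"^\d{4}\.\d{3}(?:\*|\.[\da-zA-Z]{0,7}\*|\.[\da-zA-Z]{4,8})$")

# Expected outcomes, taken from the question
cases = {
    "1234.123.125B": True,
    "1234.123.12*": True,
    "1234.123*": True,
    "1234.123.45B9*": True,
    "1234.12*": False,
    "1234.12345*": False,
}

results = {s: bool(pattern.match(s)) for s in cases}

for s, matched in results.items():
    print(f"{s:16} {'MATCH' if matched else 'NO MATCH'}")
```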
My assumptions are that:
You don't allow wildcards to be mid-string
Nor do you want to allow wildcards after the full pattern (e.g.: 1234.123.12345678*).
So, alternatively, you could use something like:
^\d{4}\.\d{3}(?!.*\*.)(?![^*]{0,4}$)[.*][*\da-zA-Z]{0,8}$
^ - Start string anchor.
\d{4}\.\d{3} - Four digits, a dot and another three digits.
(?!.*\*.) - Negative lookahead for zero or more characters followed by an asterisk and another character other than newline.
(?![^*]{0,4}$) - Negative lookahead for zero to four characters other than an asterisk before the end string anchor.
[.*] - A literal dot or asterisk.
[*\da-zA-Z]{0,8} - Zero to eight characters from the character class.
$ - End string anchor.
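The lookahead-based pattern can be exercised the same way. This sketch again assumes Python's re module. The last two inputs are hypothetical strings added to illustrate the two assumptions stated above (no wildcard mid-string, no wildcard after the full pattern).

```python
import re

pattern = re.compile(
    r"^\d{4}\.\d{3}(?!.*\*.)(?![^*]{0,4}$)[.*][*\da-zA-Z]{0,8}$"
)

cases = {
    "1234.123.125B": True,
    "1234.123.12*": True,
    "1234.123*": True,
    "1234.123.45B9*": True,
    "1234.12*": False,
    "1234.12345*": False,
    "1234.123.1*2": False,        # wildcard in the middle of the string
    "1234.123.12345678*": False,  # wildcard after the full pattern
}

results = {s: bool(pattern.match(s)) for s in cases}

for s, matched in results.items():
    print(f"{s:20} {'MATCH' if matched else 'NO MATCH'}")
```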

Regex to match unique characters NOT in a set

I'd like to match unique characters that are NOT "ymd"
example 1 :
mm-dd-yyyy should match only 1 character -
example 2 :
d. m. y. should match only 1 . character and 1 whitespace character
I've tried negative lookahead using this pattern
/([^ymd]+\b)(?!.*\1\b)/
which works, but the match for the example 2 is ". "
Ideally, I'd like it to find 2 single character matches : "." and 1 whitespace character
First, simply match single characters. Be sure to put them in a group. This will make all non-ymd characters match individually:
([^ymd])
Then, add a negative lookahead with a backreference. This makes only the last occurrence of each distinct character match:
(?!.*\1)
Full solution:
([^ymd])(?!.*\1)
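A minimal sketch of the full solution applied to both examples from the question, assuming Python's re module (the question uses /.../ delimiters, so the original flavor is probably different, but the constructs used here behave the same way):

```python
import re

# One non-ymd character that does not occur again later in the string
pattern = re.compile(r"([^ymd])(?!.*\1)")

# findall returns the contents of group 1 for every match
example1 = pattern.findall("mm-dd-yyyy")
example2 = pattern.findall("d. m. y.")

print(example1)  # ['-']
print(example2)  # [' ', '.']
```

Each distinct character is reported once, at its last occurrence, which is why the space comes before the dot in the second result.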

Regular expression - starting with 3 alphanumeric characters which includes at least one letter and one number, and ending with a letter

I'm trying to make a regex that matches the following criteria:
4 characters.
The beginning 3 characters must be alphanumeric characters, including at least one letter and one digit.
The last character must be a letter.
So I expect the results would be:
case1: abcd -> no match
case2: 234d -> no match
case3: a23c -> match
case4: 3abc -> match
case5: xy23 -> no match
I tested the following regex which matches criteria 2, but still cannot find a solution to match criteria 1&3.
^(?!.*[^a-zA-Z0-9])(?=.*\d)(?=.*[a-zA-Z]).{3}$
I tried this one but it failed on case2.
^(?!.*[^a-zA-Z0-9])(?=.*\d)(?=.*[a-zA-Z]).{3}[a-zA-Z]$
How can I combine these criteria? Thanks!
You may use
^(?=.{0,2}[0-9])(?=.{0,2}[a-zA-Z])[0-9a-zA-Z]{3}[a-zA-Z]$
Details
^ - start of string
(?=.{0,2}[0-9]) - there must be an ASCII digit after 0 to 2 chars
(?=.{0,2}[a-zA-Z]) - there must be an ASCII letter after 0 to 2 chars
[0-9a-zA-Z]{3} - 3 ASCII alphanumerics
[a-zA-Z] - an ASCII letter
$ - end of string
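The five cases from the question can be checked against this pattern with a few lines of Python. The choice of Python's re module is an assumption made only for testing; the pattern itself is unchanged.

```python
import re

pattern = re.compile(
    r"^(?=.{0,2}[0-9])(?=.{0,2}[a-zA-Z])[0-9a-zA-Z]{3}[a-zA-Z]$"
)

# Expected outcomes, taken from the question
cases = {
    "abcd": False,  # case1: no digit in the first three characters
    "234d": False,  # case2: no letter in the first three characters
    "a23c": True,   # case3
    "3abc": True,   # case4
    "xy23": False,  # case5: last character is not a letter
}

results = {s: bool(pattern.match(s)) for s in cases}

for s, matched in results.items():
    print(f"{s} -> {'match' if matched else 'no match'}")
```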
No need to use complicated features for 3 or 4 characters:
/^(?:[a-z0-9](?:[0-9][a-z]|[a-z][0-9])|[0-9][a-z]{2}|[a-z][0-9]{2})[a-z]$/i
or
/^(?:[a-z](?:[0-9][a-z0-9]|[a-z][0-9])|[0-9](?:[a-z][a-z0-9]|[0-9][a-z]))[a-z]$/i
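One way to gain confidence in the two alternation-only patterns is to compare them with a plain statement of the criteria over every 4-character string built from a small alphabet. The sketch below assumes Python; the helper meets_criteria and the test alphabet are hypothetical names introduced for this check, and the /i flag is expressed as re.I.

```python
import re
from itertools import product

first = re.compile(
    r"^(?:[a-z0-9](?:[0-9][a-z]|[a-z][0-9])|[0-9][a-z]{2}|[a-z][0-9]{2})[a-z]$",
    re.I,
)
second = re.compile(
    r"^(?:[a-z](?:[0-9][a-z0-9]|[a-z][0-9])|[0-9](?:[a-z][a-z0-9]|[0-9][a-z]))[a-z]$",
    re.I,
)


def meets_criteria(s):
    """Plain-Python version of the three criteria from the question."""
    head, last = s[:3], s[3:]
    return (
        len(s) == 4
        and all(c.isascii() and c.isalnum() for c in head)
        and any(c.isdigit() for c in head)
        and any(c.isalpha() for c in head)
        and last.isascii()
        and last.isalpha()
    )


# Small alphabet: lower-case letter, upper-case letter, two digits, one symbol
alphabet = "aZ07-"

mismatches = []
for chars in product(alphabet, repeat=4):
    s = "".join(chars)
    expected = meets_criteria(s)
    if bool(first.match(s)) != expected or bool(second.match(s)) != expected:
        mismatches.append(s)

print(mismatches)
```

An empty list means both patterns agree with the criteria on all 625 generated strings. This is a spot check on a small alphabet, not a proof for every possible input.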

Regex for text (string and numbers) between Pipes

I have this scenario:
Ex1:
Valid:
12345678|abcdefghij|aaaaaaaa
Invalid:
12345678|abcdefghijk|aaaaaaaaa
This means that the maximum length between pipes is 8. How can I express this in a regex?
I tried this:
^(?:[^|]+{0,7}(?:\|[^|]+)?$ but it's not working
Try the following pattern:
^.{1,8}(?:\|.{1,8})*$
The basic idea is to match between one and eight characters, followed by zero or more repetitions of a | and another one to eight characters.
Sample data:
123
12345678
abcdefghi (no match)
12345678|abcdefgh|aaaaaaaa
12345678|abcdefghijk|aaaaaaaaa (no match)
When you want to match delimited data, you should refrain from using a plain, unrestricted dot (.). You need to match the parts between | characters, so consider the negated character class [^|], which matches any character except |.
Since you need to limit the number of the pattern occurrences of the negated character class, restrict it with a limiting quantifier {1,8} that matches 1 to 8 consecutive occurrences of the quantified subpattern.
Use
^[^|]{1,8}(?:\|[^|]{1,8})*$
Details
^ - start of a string
[^|]{1,8} - any 1 to 8 chars other than |
(?:\|[^|]{1,8})* - 0 or more consecutive sequences of:
\| - a literal pipe symbol
[^|]{1,8} - any 1 to 8 chars other than |
$ - end of string.
Then, the [^|] can be restricted further as per requirements. If you only need to validate a string that has ASCII letters, digits, (, ), +, ,, ., /, :, ?, whitespace and -, you need to use
^[A-Za-z0-9()+,.\/:?\s-]{1,8}(?:\|[A-Za-z0-9()+,.\/:?\s-]{1,8})*$
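To see why the negated character class matters, the sketch below compares the dot-based pattern with the [^|] pattern. It assumes Python's re module, and the last sample (a string with an empty field) is a made-up input added to show where the two differ.

```python
import re

dot_version = re.compile(r"^.{1,8}(?:\|.{1,8})*$")
negated_version = re.compile(r"^[^|]{1,8}(?:\|[^|]{1,8})*$")

samples = [
    "123",
    "12345678",
    "abcdefghi",
    "12345678|abcdefgh|aaaaaaaa",
    "12345678|abcdefghijk|aaaaaaaaa",
    "abc||def",  # hypothetical input with an empty field between pipes
]

# Map each sample to (dot result, negated-class result)
results = {
    s: (bool(dot_version.match(s)), bool(negated_version.match(s)))
    for s in samples
}

for s, (dot_ok, negated_ok) in results.items():
    print(f"{s:32} dot={dot_ok} negated={negated_ok}")
```

Both patterns agree on the sample data, but the dot version also accepts abc||def because the dot can consume the pipes themselves, while the negated class rejects the empty field.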

extract substring with regular expression

I have a string that is actually a file path.
str='\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3'
I need to extract the target substring 'UA0001A' with MATLAB (though I would think all tools should have the same syntax).
It does not have to be exactly 'UA0001A'; it is an arbitrary letter-digit combination.
To make it more general, the substring (or word) should satisfy the following:
it is a word made of letters and digits
it cannot be made of letters only or digits only
it cannot include 'midd' or 'midd3' or 'Midd3' or 'MIDD3', etc., so a case-insensitive method may be used to exclude words beginning with 'midd'
it cannot include 'y[0-9]{2,4}m[0-9]{1,2}d[0-9]{1,2}\w*'
How to write the regular expression to find the target substring?
Thanks in advance!
You can use
s = '\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3';
res = regexp(s, '(?i)\\(?![^\W_]*(midd|y\d+m\d+))(?=[^\W_]*\d)(?=[^\W_]*[a-zA-Z])([^\W_]+)','tokens');
disp(res{1}{1})
Pattern explanation:
(?i) - the case-insensitive modifier
\\ - a literal backslash
(?![^\W_]*(midd|y\d+m\d+)) - a negative lookahead that will fail a match if there are midd or y+digits+m+digits after 0+ letters or digits
(?=[^\W_]*\d) - a positive lookahead that requires at least 1 digit after 0+ digits or letters ([^\W_]*)
(?=[^\W_]*[a-zA-Z]) - there must be at least 1 letter after 0+ letters or digits
([^\W_]+) - Group 1 (the value that will be extracted) matching 1+ letters or digits (that is, 1+ characters other than non-word chars and _).
The 'tokens' "mode" will let you extract the captured value rather than the whole match.
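For readers outside MATLAB, the same pattern can be tried in Python. This is a sketch under assumptions: Python's re module stands in for MATLAB's regexp, the inline (?i) is replaced by the re.IGNORECASE flag, the inner group is made non-capturing so that group 1 is the extracted word, and the variable names are illustrative.

```python
import re

# Raw string so that the backslashes in the path are kept literally
path = r"\\198.168.0.10\share\ccdfiles\UA-midd3-files\UA0001A_15_Jun_2014_08.17.49\Midd3\y12m05d25h03m16.midd3"

pattern = re.compile(
    r"\\"                              # a literal backslash
    r"(?![^\W_]*(?:midd|y\d+m\d+))"    # word must not contain midd or y<digits>m<digits>
    r"(?=[^\W_]*\d)"                   # word must contain a digit
    r"(?=[^\W_]*[a-zA-Z])"             # word must contain a letter
    r"([^\W_]+)",                      # group 1: the word itself
    re.IGNORECASE,
)

match = pattern.search(path)
result = match.group(1) if match else None
print(result)  # UA0001A
```

Using search and group(1) plays the role of the 'tokens' option: it returns the captured word rather than the whole match, which would include the leading backslash.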
This should get you started:
[\\](?i)(?![a-z0-9]*midd)([0-9]+[a-z]+[a-z0-9]*|[a-z]+[0-9]+[a-z0-9]*)
[\\] : match a backslash
(?i) : the rest of the regex is case-insensitive
(?!...) : negative lookahead; the text that follows must not match what is inside it
(?![a-z0-9]*midd) : the following word cannot contain midd (the lookahead is limited to letters and digits so that a midd later in the path does not block the match)
([0-9]+[a-z]+[a-z0-9]*|[a-z]+[0-9]+[a-z0-9]*) : match at least one digit followed by at least one letter, OR at least one letter followed by at least one digit, in either case followed by any number of letters and digits (remember, the word cannot match the (?!...) group, so no word that contains midd)