RegEx match everything between 2 characters only if at least 3 characters - regex

I am trying to match all characters between "xAA" and "xFF", but only if there are at least 4 characters between them. Is there a simple way to do this with RegEx?
For example, xAA-12345-xFF should be matched but xAA-1-xFF should be ignored.
My RegEx currently looks like this:
"(?<=\xAA).*(?=\xFF)"
This does match everything between those characters, but I can't find a way to match only when there are at least 4 characters between them. Can someone help me?

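A quick and dirty hack could be:
"xAA....*xFF"
This requires three characters (...) followed by zero or more (.*), so it enforces a minimum of 3, as in the title. For the minimum of 4 stated in the question body, add one more dot or write xAA.{4,}xFF.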

Do this:
xAA\S{4,}xFF
\S{4,} matches at least 4 non-whitespace characters.

(?<=xAA-)[0-9]{4,}(?=\-xFF)
extracts 12345 from xAA-12345-xFF when only digits can appear between the delimiters
(?<=xAA-)[0-9a-zA-Z]{4,}(?=\-xFF) if letters or digits are possible
regex101.com
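For anyone who wants to try this outside regex101, here is a minimal Python sketch of the lookaround approach. It assumes the delimiters are the literal strings xAA and xFF, as in the sample input (not the hex escapes \xAA and \xFF), and the helper name extract_between is made up for illustration.

```python
import re

# Assumption: "xAA" and "xFF" are literal text, as in the sample strings.
# .{4,} demands at least 4 characters; the lookarounds keep the delimiters
# out of the returned match.
BETWEEN = re.compile(r"(?<=xAA).{4,}(?=xFF)")


def extract_between(text):
    """Return the text between xAA and xFF, or None if there are fewer than 4 characters."""
    match = BETWEEN.search(text)
    return match.group(0) if match else None


for sample in ("xAA-12345-xFF", "xAA-1-xFF"):
    print(sample, "->", extract_between(sample))
```

With the samples from the question, the first string yields -12345- and the second yields nothing. Changing {4,} to {3,} would let -1- through as well.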

Related

Regular expression to get everything between two characters/strings

I have been trying to use regular expression to extract data from the following strings
LTE_LTE_FSD9167__P_Airport1
I want to extract the 7-character site code (FSD9167) from the above string.
RUR1251__S_KhooNaiWala
I want to extract the 7-character site code (RUR1251) from the above string.
For the LTE_LTE case I wrote LTE_LTE_([^_;]+).* but it matches the whole string, not just the required text.
The pattern I see is three letters followed by four numbers, so:
\w{3}\d{4}
Use () to capture the pattern:
(\w{3}\d{4})
PHP:
$re = '/(\w{3}\d{4})/m';
JavaScript:
const regex = /(\w{3}\d{4})/gm;
Paste the pattern into https://regex101.com/ to see an explanation of each part.
You can use something like this:
/^(?:LTE_LTE_)?(\S{7})\S*$/gm
This captures the seven non-whitespace characters either at the beginning of the line (your second example) or just after LTE_LTE_ (your first example).
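A rough Python equivalent, in case it helps. It assumes the two sample strings arrive as separate lines of one text; the variable names are only for illustration.

```python
import re

# The two sample strings from the question, one per line.
text = "LTE_LTE_FSD9167__P_Airport1\nRUR1251__S_KhooNaiWala"

# re.MULTILINE makes ^ and $ work per line. findall scans the whole text,
# so Python needs no separate "g" flag.
SITE_CODE = re.compile(r"^(?:LTE_LTE_)?(\S{7})\S*$", re.MULTILINE)

# With one capture group, findall returns just the captured site codes.
site_codes = SITE_CODE.findall(text)
print(site_codes)
```

Because the pattern has exactly one capture group, findall returns only the seven captured characters of each line rather than the whole line.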
You did not provide any rule for what the code can look like. Both codes in your examples have 3 letters followed by 4 digits, so I made the rule more generic: at least 2 letters followed by at least 3 digits.
The regex is:
[a-zA-Z]{2,}\d{3,}
As you want to match only these 2 strings, use:
(?<![A-Z0-9])[A-Z0-9]{7}(?![A-Z0-9])
Explanation:
(?<![A-Z0-9]) # negative lookbehind: no uppercase letter or digit immediately before
[A-Z0-9]{7} # exactly 7 uppercase letters or digits
(?![A-Z0-9]) # negative lookahead: no uppercase letter or digit immediately after
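To see why the boundaries matter, here is a small Python comparison of the loose \w{3}\d{4} pattern with the lookaround version. The third input (AB_1234__X_Test) is a hypothetical string, not from the question, chosen because \w also matches the underscore.

```python
import re

LOOSE = re.compile(r"\w{3}\d{4}")
STRICT = re.compile(r"(?<![A-Z0-9])[A-Z0-9]{7}(?![A-Z0-9])")


def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


samples = ["LTE_LTE_FSD9167__P_Airport1", "RUR1251__S_KhooNaiWala"]
strict_codes = [first_match(STRICT, s) for s in samples]
loose_codes = [first_match(LOOSE, s) for s in samples]

# Hypothetical input: \w accepts "_", so the loose pattern picks up "AB_1234",
# while the strict pattern finds no run of exactly 7 uppercase alphanumerics.
tricky = "AB_1234__X_Test"
loose_tricky = first_match(LOOSE, tricky)
strict_tricky = first_match(STRICT, tricky)

print(strict_codes, loose_codes, loose_tricky, strict_tricky)
```

Both patterns agree on the two strings from the question; they only differ on inputs where an underscore sits inside the seven characters.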

Regex for String with first two characters fixed and rest digits

Is there a regular expression for the following?
String of length 8
First two characters fixed: 'UE' or 'ue'
Remaining 6 characters must be digits [0-9]
Eg: https://regex101.com/r/PufypE/1
The expression I tried:
\^(UE|ue){2}[0-9]{6}\
but it's not working (no match found).
You want:
\b(UE|ue)[0-9]{6}\b
You don't need the {2} after (UE|ue), since the group already matches exactly those two characters. The \b is a word boundary, so this will also match the items in a list like the one in your comment: UE123456,ue654321. A good site for experimenting with this kind of regex is http://regex101.com
Regex should be:
^[Uu][Ee][0-9]{6}$
(UE|ue){2} in your regex would match 2 occurrences of UE or ue (for example UEue), not two characters.
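The two answers differ slightly on mixed case, which a short Python check makes visible. This is only a sketch: the helper name is_valid is made up, and fullmatch is used in place of ^...$ because in Python $ also matches just before a trailing newline.

```python
import re

# (?:UE|ue) accepts only the two spellings from the question.
EXACT_CASE = re.compile(r"(?:UE|ue)[0-9]{6}")
# [Uu][Ee] additionally accepts the mixed spellings "Ue" and "uE".
ANY_CASE = re.compile(r"[Uu][Ee][0-9]{6}")


def is_valid(pattern, code):
    """True if the whole string matches the pattern."""
    return pattern.fullmatch(code) is not None


for code in ("UE123456", "ue654321", "Ue123456", "UE12345", "UEUE123456"):
    print(code, is_valid(EXACT_CASE, code), is_valid(ANY_CASE, code))
```

If Ue123456 should be rejected, the alternation form is the one to use.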

Reg Exp: match specific number of characters or digits

My RegExp is very rusty! I have two questions, related to the following RegExp
Question Part 1
I'm trying to get the following RegExp to work
^.*\d{1}\.{1}\d{1}[A-Z]{5}.*$
What I'm trying to pass is x1.1SMITHx or x1.1JONESx,
where x can be anything of any length, but the SMITH or JONES part of the input string must be exactly 5 uppercase characters.
So:
some preamble 1.1SMITH some more characters 123
xyz1.1JONES some more characters 123
both pass
But
another bit of string1.1SMITHABC some more characters 123
xyz1.1ME some more characters 123
should not pass, because SMITH is now followed by 3 additional characters, ABC, and ME is only 2 characters.
The string should pass only if exactly 5 uppercase characters follow 1.1.
Question Part 2
How do I match a specific number of digits?
I'm not bothered what they are; it's the number of them that I can't get working.
If I use ^\d{1}$ I would have expected it to pass only if exactly one digit is present.
It passes 5, but it also passes 67.
It should fail 67, as that is two digits long.
The RegExp should pass only if exactly 1 digit is present.
For the first one, check out this regex:
^.*\d\.\d[A-Z]{5}[^A-Z]*$
Before solving the problem, I made it easier to read by removing all of the {1}. This is an unnecessary quantifier, since regex defaults to looking for one occurrence (/abc/ matches abc, not aaabbbccc).
To fix the issue, we just need to replace your final .*, which matches zero or more of any character. If we make it more specific (i.e. [^A-Z]*), you won't match SMITHABC. Note that this also rejects an uppercase letter anywhere later in the string, not just directly after the five letters.
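Here is a quick Python check of that pattern against the four strings from the question. The second pattern, with a negative lookahead, is my own variation and not part of the answer above; the last input is a hypothetical string added to show where the two differ.

```python
import re

# Pattern from the answer: nothing uppercase may appear after the 5 letters.
TRAILING_CLASS = re.compile(r"^.*\d\.\d[A-Z]{5}[^A-Z]*$")
# Variation (an assumption, not from the answer): only the character right
# after the 5 letters is checked, so later uppercase letters are allowed.
LOOKAHEAD = re.compile(r"^.*\d\.\d[A-Z]{5}(?![A-Z]).*$")

samples = [
    "some preamble 1.1SMITH some more characters 123",
    "xyz1.1JONES some more characters 123",
    "another bit of string1.1SMITHABC some more characters 123",
    "xyz1.1ME some more characters 123",
]

trailing_results = [TRAILING_CLASS.search(s) is not None for s in samples]
lookahead_results = [LOOKAHEAD.search(s) is not None for s in samples]

# Hypothetical input with uppercase letters later in the string.
later_upper = "xyz1.1SMITH and Some More"
trailing_later = TRAILING_CLASS.search(later_upper) is not None
lookahead_later = LOOKAHEAD.search(later_upper) is not None

print(trailing_results, lookahead_results, trailing_later, lookahead_later)
```

Both patterns pass the first two samples and reject the last two; they only disagree once an uppercase letter shows up further along in the string.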
I came up with a number of solutions, but I like these the most. If your RegEx engine supports negative look-ahead and negative look-behind, you can use this:
Part 1: (?<![A-Z])[A-Z]{5}(?![A-Z])
Part 2: (?<!\d)\d(?!\d)
Both have a pattern of (?<!expr)expr(?!expr).
(?<!...) is a negative look-behind, meaning the match isn't preceded by the expression in the bracket.
(?!...) is a negative look-ahead, meaning the match isn't followed by the expression in the bracket.
So: for the first pattern, it means "find 5 uppercase characters that are neither preceded nor followed by another uppercase character". In other words, match exactly 5 uppercase characters.
The second pattern works the same way: find a digit that is not preceded or followed by another digit.
You can try it on Regex 101.
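A small Python sketch of both lookaround patterns, assuming an engine like Python's re that supports fixed-width look-behind. The helper has_match is a made-up name. Note that these patterns search inside a longer string; to test that the whole input is a single digit, a full-string match on \d is enough.

```python
import re

FIVE_UPPER = re.compile(r"(?<![A-Z])[A-Z]{5}(?![A-Z])")
ONE_DIGIT = re.compile(r"(?<!\d)\d(?!\d)")


def has_match(pattern, text):
    """True if the pattern occurs anywhere in text."""
    return pattern.search(text) is not None


# Part 1: exactly 5 uppercase letters in a row.
print(has_match(FIVE_UPPER, "xyz1.1JONES some more characters 123"))
print(has_match(FIVE_UPPER, "string1.1SMITHABC some more characters 123"))

# Part 2: a digit with no digit neighbours.
print(has_match(ONE_DIGIT, "5"), has_match(ONE_DIGIT, "67"), has_match(ONE_DIGIT, "a5b"))

# Whole-string check for "exactly one digit and nothing else".
whole_is_one_digit = re.fullmatch(r"\d", "67") is not None
```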

Can this Regex be improved?

I have a regex to match a user-entered id which has the basic format [a-zA-Z]{2}[\d]{8}, but the kicker is that a space can be placed between any of the letters or digits in the id, so my regex looks like this:
[A-Za-z]+[\s]*[A-Za-z]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*
Which is obviously an abomination and should be killed with fire, can this be improved upon?
All of the following are valid inputs
a b 1 2 2 3 4 5 5 6
ab12345678
ab 12345678
Your regex does not comply with your specification. Can there be 2 or more letters before the digits? Exactly 8 digits, or 8 digits or more?
Try
([a-zA-Z]\s*){2}(\d\s*){8}
If there can only be one space between each character:
([a-zA-Z]\s?){2}(\d\s?){8}
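A minimal Python sketch of the first pattern, tried on the three valid inputs from the question. The names is_valid_id and normalize_id are hypothetical, and the re.ASCII flag is my own addition so that \d means 0-9 only.

```python
import re

# re.ASCII restricts \d to 0-9 and \s to ASCII whitespace (an assumption
# about what the id should allow; drop the flag to accept Unicode digits).
USER_ID = re.compile(r"(?:[a-zA-Z]\s*){2}(?:\d\s*){8}", re.ASCII)


def is_valid_id(text):
    """True if text is 2 letters then 8 digits, with optional whitespace between them."""
    return USER_ID.fullmatch(text) is not None


def normalize_id(text):
    """Drop all whitespace so the stored id is always 2 letters + 8 digits."""
    return re.sub(r"\s+", "", text)


for candidate in ("a b 1 2 2 3 4 5 5 6", "ab12345678", "ab 12345678", "abc12345678", "ab1234567"):
    print(repr(candidate), is_valid_id(candidate))
```

Another option along the same lines is to normalize first and then validate the result against the plain [a-zA-Z]{2}[0-9]{8}.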
Don't ever use \d and \s unless you know EXACTLY where you are going...
\d also matches non-ASCII digits such as U+09E6 ০ BENGALI DIGIT ZERO (the ০ is your digit :-) ). See, for example, http://msdn.microsoft.com/en-us/library/w1c0s6bb.aspx
\s will match more types of strange spaces (and the tab character) than you can count, and I'm not kidding. http://msdn.microsoft.com/en-us/library/t809ektx.aspx
Paradoxically, by using [a-zA-Z] you are limiting your users quite a lot... No àèéìòù, nor the Turkish ı and İ (the first one is a lowercase i without the dot, the second one is the uppercase version of the dotted i) http://en.wikipedia.org/wiki/Dotted_and_dotless_I .
Perhaps you could use (\p{L}\p{M}*) (with the parentheses) instead of [A-Za-z] (all the letters plus the combining marks). You have to add a * or a + AFTER the closing parenthesis. The expression as written matches a single letter PLUS its combining marks.
Oh... and you can use one of the other suggestions as a basis for the regex :-)
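The links above describe .NET, but the same surprise exists in Python for str patterns, so here is a short sketch in Python rather than C#. It relies on Python's re module treating \d and \s as Unicode-aware by default, with re.ASCII as the opt-out.

```python
import re

bengali_zero = "\u09e6"  # ০ BENGALI DIGIT ZERO
no_break_space = "\u00a0"

# By default \d and \s are Unicode-aware for str patterns.
unicode_digit = re.fullmatch(r"\d", bengali_zero) is not None
nbsp_is_space = re.fullmatch(r"\s", no_break_space) is not None

# re.ASCII, or an explicit [0-9], restricts the match to ASCII digits.
ascii_digit = re.fullmatch(r"\d", bengali_zero, re.ASCII) is not None
explicit_class = re.fullmatch(r"[0-9]", bengali_zero) is not None

print(unicode_digit, nbsp_is_space, ascii_digit, explicit_class)
```

So if the id must be plain 0-9, writing [0-9] (or using the ASCII flag) is the safer choice.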
[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*[\d]+[\s]*
can be replaced with...
\s*(?:\d+\s*){8}
(Also, you can just write \s, rather than [\s], and \d rather than [\d] - the brackets are redundant if you're only specifying a single backslash character class.)
Edit: Since there seems to be some confusion about what part of the original regex is being replaced, here's the entire expression after replacement:
[A-Za-z]+\s*[A-Za-z]+\s*(?:\d+\s*){8}
(?:[A-Za-z]+\s*){2}(?:\d+\s*){8}
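Keep in mind that the shortened expression preserves the + quantifiers of the original, so it is just as lenient. A quick Python sketch (the candidate strings and variable names are mine) comparing it with a one-character-per-group version:

```python
import re

# Same behaviour as the original: "+" lets each group swallow extra characters.
LENIENT = re.compile(r"(?:[A-Za-z]+\s*){2}(?:\d+\s*){8}")
# One character per repetition: exactly 2 letters and exactly 8 digits.
STRICT = re.compile(r"(?:[A-Za-z]\s*){2}(?:[0-9]\s*){8}")

candidates = ["ab12345678", "abc12345678", "ab123456789"]
lenient_ok = [LENIENT.fullmatch(c) is not None for c in candidates]
strict_ok = [STRICT.fullmatch(c) is not None for c in candidates]

print(lenient_ok, strict_ok)
```

The lenient form accepts three letters or nine digits; the strict form does not.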

Limit number of alpha characters in regular expression

I've been struggling to figure out how to best do this regular expression.
Here are my requirements:
Up to 8 characters
Can only be alphanumeric
Can only contain up to three alpha characters [a-z] (zero alpha characters are valid too)
Any ideas would be appreciated.
This is what I've got so far, but it only looks for contiguous letter characters:
^(\d|([A-Za-z])(?!([A-Za-z]{3,}))){0,8}$
I'd write it like this:
^(?=[a-z0-9]{0,8}$)(?:\d*[a-z]){0,3}\d*$
It has two parts:
(?=[a-z0-9]{0,8}$)
Looks ahead and matches up to 8 alphanumerics to the end of the string
(?:\d*[a-z]){0,3}\d*$
Essentially allows up to 3 [a-z] to be interleaved among runs of digits (\d*)
On rubular.com
12345678 // matches
123456789
#(#*#$
12345 // matches
abc12345 // matches
abcd1234
12a34b5c // matches
12ab34cd
123a456 // matches
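The same test cases can be run locally; here is a minimal Python sketch, assuming lowercase-only input as in the pattern above.

```python
import re

PATTERN = re.compile(r"^(?=[a-z0-9]{0,8}$)(?:\d*[a-z]){0,3}\d*$")

cases = [
    "12345678", "123456789", "#(#*#$", "12345", "abc12345",
    "abcd1234", "12a34b5c", "12ab34cd", "123a456",
]

# Keep only the inputs the pattern accepts.
matched = [c for c in cases if PATTERN.search(c)]
print(matched)
```

Strings with up to 3 letters and up to 8 characters in total are accepted; abcd1234 and 12ab34cd are rejected because they contain 4 letters.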
Alternatives
I do think regex is the best solution for this, but since the string is short, it would be a lot more readable to do this in two steps as follows:
It must match [a-z0-9]{0,8}
Then, delete all \d
The length must now be <= 3
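The steps above can be sketched in Python like this (the function name is_valid is just for illustration, and the input is assumed to be lowercase as in the pattern):

```python
import re


def is_valid(text):
    """Up to 8 lowercase alphanumerics, of which at most 3 are letters."""
    # Step 1: at most 8 characters, lowercase letters and digits only.
    if re.fullmatch(r"[a-z0-9]{0,8}", text) is None:
        return False
    # Step 2: delete the digits; whatever remains must be letters.
    letters_only = re.sub(r"[0-9]", "", text)
    # Step 3: no more than 3 letters may remain.
    return len(letters_only) <= 3


for candidate in ("12a34b5c", "12ab34cd", "abc12345", "123456789", ""):
    print(repr(candidate), is_valid(candidate))
```

The empty string passes here because "up to 8 characters" includes zero; add a length check if that is not wanted.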
Do you have to do this in exactly one regular expression? It is possible to do that with standard regular expressions, but the regular expression will be rather long and complicated. You can do better with some of the Perl extensions, but depending on what language you're using, they may or may not be supported. The cleanest solution is probably to check whether the string matches:
^[A-Za-z0-9]{0,8}$
but doesn't match:
([A-Za-z].*){4}
i.e. it's an alphanumeric string of up to 8 characters (first regular expression), but doesn't contain 4 or more alpha characters, possibly separated by other characters (second regular expression).
/^(?!(?:\d*[a-z]){4})[a-z0-9]{0,8}$/i
Explanation:
[a-z0-9]{0,8} matches up to 8 alphanumerics.
The negative lookahead is placed at the start, so it is checked before any characters are consumed.
The (?:\d*[a-z]) matches one letter, optionally preceded by digits. The {4} repeats that 4 times. So the lookahead prevents the regex from matching when 4 letters can be found (i.e. it limits the count to ≤3).
It's better not to exploit regex like this. Suppose you use this solution, are you sure you will know what the code is doing when you revisit it 1 year later? A clearer way is just check rule-by-rule, e.g.
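# Assumes theText is a str; note that str.isalnum() is False for "" and True for non-ASCII letters and digits.
if len(theText) <= 8 and theText.isalnum():
    if sum(1 for c in theText if c.isalpha()) <= 3:
        pass  # valid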
The easiest way to do this would be in multiple steps:
Test the string against /^[a-z0-9]{0,8}$/i -- the string is up to 8 characters and only alphanumeric
Make a copy of the string, delete all non-alphabetic characters
See if the resulting string has a length of 3 or less.
If you want to do it in one regular expression, you can use something like:
/^(?=\d*(?:[a-z]?\d*){0,3}$)[a-z0-9]{0,8}$/i
This looks for an alphanumeric string of length 0 to 8 (^[a-z0-9]{0,8}$), but first uses a lookahead ((?=\d*(?:[a-z]?\d*){0,3}$)) to make sure that the string has at most 3 alphabetic characters.
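Since several one-regex answers appear in this thread, a brute-force cross-check can give some confidence that they agree. The sketch below is my own test harness, not part of any answer: it compares three of the patterns against a plain rule-based check on every string of up to 9 characters over the small alphabet a, 1 and #.

```python
import re
from itertools import product

# The three single-regex answers, with the flags they were posted with.
PATTERNS = {
    "interleave": re.compile(r"^(?=[a-z0-9]{0,8}$)(?:\d*[a-z]){0,3}\d*$"),
    "negative lookahead": re.compile(r"^(?!(?:\d*[a-z]){4})[a-z0-9]{0,8}$", re.I),
    "positive lookahead": re.compile(r"^(?=\d*(?:[a-z]?\d*){0,3}$)[a-z0-9]{0,8}$", re.I),
}


def rule_based(text):
    """Reference check: up to 8 ASCII alphanumerics, at most 3 of them letters."""
    if len(text) > 8:
        return False
    if not all(c.isascii() and c.isalnum() for c in text):
        return False
    return sum(1 for c in text if c.isalpha()) <= 3


# Exhaustively test every string of length 0..9 over a tiny alphabet:
# "a" stands for any letter, "1" for any digit, "#" for any invalid character.
disagreements = []
for length in range(10):
    for chars in product("a1#", repeat=length):
        text = "".join(chars)
        expected = rule_based(text)
        for name, pattern in PATTERNS.items():
            if (pattern.search(text) is not None) != expected:
                disagreements.append((name, text))

print(len(disagreements), "disagreements")
```

This only exercises lowercase input from a three-symbol alphabet, so it is a sanity check rather than a proof.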