Match specific length x or y


I'd like a regex that matches a string that is either X or Y characters long, for example either 8 or 11 characters. I have currently implemented this like so: ^([0-9]{8}|[0-9]{11})$.
I could also implement it as: ^[0-9]{8}([0-9]{3})?$
My question is: Can I have this regex without duplicating the [0-9] part (which is more complex than this simple \d example)?

There is one way:
^(?=[0-9]*$)(?:.{8}|.{11})$
or alternatively, if you want to do the length check first,
^(?=(?:.{8}|.{11})$)[0-9]*$
That way, you have the complicated part only once and a generic . for the length check.
Explanation:
^ # Start of string
(?= # Assert that the following regex can be matched here:
[0-9]* # any number of digits (and nothing but digits)
$ # until end of string
) # (End of lookahead)
(?: # Match either
.{8} # 8 characters
| # or
.{11} # 11 characters
) # (End of alternation)
$ # End of string
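A minimal Python sketch of the same idea, in case it helps. The helper name build_length_regex and the use of [0-9] as a stand-in for the more complex sub-pattern are assumptions made for illustration; the point is only that the sub-pattern appears once while generic dots carry the length check.

```python
import re

def build_length_regex(unit, lengths):
    """Compile a pattern in which `unit` appears only once.

    The allowed total lengths are enforced by a lookahead built from
    generic dots, mirroring ^(?=(?:.{8}|.{11})$)[0-9]*$ above.
    """
    alternatives = "|".join(".{%d}" % n for n in lengths)
    return re.compile(r"^(?=(?:%s)$)(?:%s)*$" % (alternatives, unit))

# [0-9] stands in for the real, more complicated sub-pattern.
pattern = build_length_regex("[0-9]", (8, 11))

def is_valid(s):
    # Note: in Python, $ also matches just before a trailing newline;
    # use \Z instead if that matters for your input.
    return pattern.match(s) is not None
```

For example, is_valid("12345678") and is_valid("12345678901") hold, while a 9-digit string or "1234567a" is rejected.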

With Perl, you could do:
my $re = qr/here_is_your_regex_part/;
my $full_regex = qr/^(?:$re){8}(?:(?:$re){3})?$/;

If you want to match any length that is a multiple of a given number, try this:
^(?:[0-9]{32})+$
where 32 is the base length, so this matches strings of length 32, 64, 96, ...
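As a rough Python sketch of the multiple-of-n pattern (the helper name multiple_of is hypothetical, and digits are assumed as the repeated unit):

```python
import re

def multiple_of(n):
    """Pattern for digit strings whose length is a positive multiple of n."""
    return re.compile(r"(?:[0-9]{%d})+" % n)

p = multiple_of(32)

# fullmatch anchors at both ends, so no explicit ^ and $ are needed here.
results = [bool(p.fullmatch("7" * k)) for k in (32, 64, 33, 0)]
```

Lengths 32 and 64 are accepted; 33 and the empty string are not.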

Related

How does this regex for FQDNs (excluding .arpa) work?

I am trying to understand how regex works. I understand it little by little. However, I don't understand this one completely. It's basically a regex for fully qualified domain names but a requirement is that the ending can't be .arpa.
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}[^.arpa]$)
https://regex101.com/r/hU6tP0/3
This doesn't match google.uk. If I change it to:
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{1,63}[^.arpa]$)
It works again.
But this works as well
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}$)
Here is my thought process for
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}[^.arpa]$)
I see it as this
(?=
Is a positive look ahead (Can someone explain to me what this actually means?) As I understand it now, it just means that the string needs to match the regex.
^.{4,253}$)
Match all characters but it needs to be between 4 and 253 characters long.
(^([a-zA-Z0-9]{1,63}\.)
Start a capture group and make another capture group within. This capture group says that every non special character can be written 1 to 63 times or till the . is written.
+
The previous capture group can be repeated indefinitely, but it should always end with a .. This way the next capture group is started.
[a-zA-Z]{2,63}
Then as many times as you want you can write a to z with upper, but it needs to be between 2 and 63.
[^.arpa]$)
The last characters can't be .arpa.
Can someone tell me where I am going wrong?

This doesn't do what you think it does:
[^.arpa]
All that says is 'ends with a single character that isn't one of ., a, r or p'. It's a negated character class.
You might be thinking of a negative lookahead assertion, placed where the end of the string is still ahead of it:
^(?!.*\.arpa$)
But if you're trying to compound multiple criteria in a regex, I'd suggest you're probably using the wrong tool for the job. It ends up complicated and hard to debug, thanks to greedy/non-greedy matching, etc.
Lookaheads assert that something does (positive) or does not (negative) follow the current position, without consuming any of it. That can have some unexpected outcomes if you're matching variable widths, because the regex engine will backtrack until it finds something that matches.
A simpler example:
([\w.]+)(?!arpa)$
Applied to:
www.test.arpa
Will it match? What's in the group?
... it will match, because [\w.]+ will consume all of it, and then the lookahead won't "see" anything.
If you use:
([\w]+)\.(?!arpa)
instead, you'll capture www, but you won't match test (e.g. with the g flag), because the dot after www isn't followed by arpa, but the dot after test is.
https://regex101.com/r/hU6tP0/5
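The two cases above can be reproduced with a short Python sketch (Python is only used for illustration here; the patterns are the ones from this answer):

```python
import re

text = "www.test.arpa"

# Greedy [\w.]+ swallows the whole string, so the lookahead at the very end
# has nothing left to inspect and cannot reject anything.
greedy = re.search(r"([\w.]+)(?!arpa)$", text)
greedy_group = greedy.group(1)

# Here the lookahead runs right after each dot, where "arpa" is still ahead,
# so only the label whose dot is not followed by "arpa" is captured.
labels = re.findall(r"(\w+)\.(?!arpa)", text)
```

Here greedy_group is the full string, and labels contains only "www".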
It really does get complicated using negative assertions in a pattern as a result. I'd suggest simply not doing so, and applying two separate tests. It's hard for you to figure out, and it's hard for a future maintenance programmer too!
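A minimal sketch of the "two separate tests" approach in Python. The function name is_allowed_domain is made up for this example, and the label and TLD rules are simply copied from the question's regex rather than from any DNS specification:

```python
import re

# Shape only: labels of 1-63 alphanumerics, each followed by a dot,
# then a TLD of 2-63 letters (rules taken from the question's regex).
FQDN_SHAPE = re.compile(r"(?:[a-z0-9]{1,63}\.)+[a-z]{2,63}", re.IGNORECASE)

def is_allowed_domain(name):
    """Check length, shape and the .arpa exclusion as separate steps."""
    if not 4 <= len(name) <= 253:
        return False
    if FQDN_SHAPE.fullmatch(name) is None:
        return False
    # The exclusion is a plain string test, with no lookahead involved.
    return not name.lower().endswith(".arpa")
```

Each rule can then be read, tested and changed on its own, which is the maintenance benefit argued for above.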

This is an analysis of your regex:
(?=^.{4,253}$) # force min length: 4 chars, max length: 253 chars
( # Capturing Group 1 (CG1) - not needed
^ # Match start of the string
( # CG2 (can be a non capturing group '(?:...)')
[a-zA-Z0-9]{1,63} # any sequence of letters and numbers with length between 1 and 63
\. # a literal dot
)+ # CLOSE CG2
[a-zA-Z]{1,63} # any letter sequence with length between 1 to 63
[^.arpa] # a negated char class: any char that is not a "literal" '.','a','r','p' (last 'a' is redundant)
$ # end of the string
) # CLOSE CG1
To prevent the string from ending in .arpa, you need to use a negative lookahead (?!...), so modify it like this:
(?=^.{4,253}$)(?!.*\.arpa$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}$)
Update:
I've reworked the regex to simplify it (I've also incorporated Sobrique's suggestion, with one important addition):
/^(?=.{4,253}$)(?:[a-z0-9]{1,63}[.])+(?!arpa$)[a-z]{2,63}$/i
Legend
/ # js regex delimiter
^ # start of the string
(?=.{4,253}$) # force min length: 4 chars, max length: 253 chars
(?: # Non capturing group 1 (NCG1)
[a-z0-9]{1,63} # any letter or digit in a sequence with length from 1 to 63 chars
[.] # a literal dot '.' (more readable than \.)
)+ # CLOSE NCG1 - repeat its content one or more time
(?!arpa$) # require that the part after the last literal dot '.' is not exactly 'arpa' (I've added '$' to Sobrique's suggestion; without it, '.arpanet' would be rejected too)
[a-z]{2,63} # a sequence of letters with length from 2 to 63
$ # end of the string
/i # close the regex delimiter and add the case-insensitive flag, so [a-z] also matches [A-Z]
var re = /^(?=.{4,253}$)(?:[a-z0-9]{1,63}[.])+(?!arpa$)[a-z]{2,63}$/i;
var tests = ['google.uk','domain.arpa','domain.arpa2','another.domain.arpa.net','domain.arpanet'];
var out = document.getElementById("r");
// Test each candidate and report whether the regex accepts it
tests.forEach(function (t) {
out.innerHTML += '"' + t + '"<br/>';
out.innerHTML += 'Valid domain? ' + (re.test(t) ? '<span style="color:green">YES</span>' : '<span style="color:red">NO</span>') + '<br/><br/>';
});
<div id="r"></div>

How do I insert a regex quantifier into a range

I have this regex range
/([a-zA-Z0-9\.\-#_~!$&'()*+,;=:]+)/
for the allowable characters for a user's username. I would like to add the # char to the range but limit its quantity to 0 or 1 and have its location be irrelevant.
I know #{0,1} is the quantifier syntax, but how do I combine it with my range to meet my specifications?
Requirements:
A - Z alphabet
0 - 9 numerals
Only 1 # allowed
Special Characters: # - _ ~ ! $ & ' ( ) * + , ; = :
Thanks

You can use a lookahead regex like this:
/^(?!(?:[^#]*#){2})[-a-zA-Z0-9.#_~!$&'()*+,;=:]+$/
The negative lookahead (?!(?:[^#]*#){2}) rejects any input containing two # characters, so only 0 or 1 # is allowed.
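As a rough cross-check in Python, the lookahead version can be compared with a plain two-step test (character class plus a count). The names USERNAME_CHARS, LOOKAHEAD and is_valid_username are assumptions made for this sketch:

```python
import re

# The allowed characters from the question, with # listed once.
USERNAME_CHARS = re.compile(r"[-a-zA-Z0-9.#_~!$&'()*+,;=:]+")

# The single-regex version from this answer.
LOOKAHEAD = re.compile(r"^(?!(?:[^#]*#){2})[-a-zA-Z0-9.#_~!$&'()*+,;=:]+$")

def is_valid_username(name):
    """Allowed characters only, and at most one '#' anywhere."""
    return USERNAME_CHARS.fullmatch(name) is not None and name.count("#") <= 1

samples = ["user#1", "#", "a#b#c", "plain", "", "bad name"]
two_step = [is_valid_username(s) for s in samples]
one_regex = [LOOKAHEAD.match(s) is not None for s in samples]
```

Both approaches agree on these samples; the two-step form trades a slightly longer function for a pattern with no lookahead.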

You can shorten your regex by using \w.
\w matches any word character (alphanumeric and underscore). In engines where \w is ASCII-only (JavaScript, for example) it is equivalent to [A-Za-z0-9_]; in Unicode-aware engines it can also match accented and non-Latin letters.
Combined with the lookahead, this reduces the length of your regex:
^(?!(?:[^#\n]*#){2})[-\w.#~!$&'()*+,;=:]+$

Just copy your regex, remove the # from the character class, and add an optional # in the middle:
/^([-a-zA-Z0-9._~!$&'*+,;=:()]+#?[-a-zA-Z0-9._~!$&'*+,;=:()]+)$/
or, more precisely, to also allow the # at the start, at the end, or on its own:
/^(#|#?[-a-zA-Z0-9._~!$&'*+,;=:()]+|[-a-zA-Z0-9._~!$&'*+,;=:()]+#|[-a-zA-Z0-9._~!$&'*+,;=:()]+#[-a-zA-Z0-9._~!$&'*+,;=:()]+)$/

Regex that only matches on odd/even indices

Is there a regex that matches a string only when it starts on an odd or an even index? My use case is a hex string in which I want to replace certain "bytes".
Now, when trying to match 20 (space), 20 in "7209" would be matched as well even though it consists of the bytes 72 and 09. I am restricted to the regex implementation of Notepad++ in this case, so I'm not able to check the match index as e.g. in Java.
My sample input looks like:
324F8D8A20561205231920
The regex should only match the first and the last occurrence of 20, since the one in the middle starts at an odd index.

You can use the following regex to match 20 at even positions inside a hex string:
20(?=(?:[\da-fA-F]{2})*$)
I assume the string has no spaces in this case.
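A small Python sketch of this lookahead applied to the sample input. Notepad++ uses a different engine, so this only illustrates the pattern's logic; the function name replace_space_bytes and the XX replacement are assumptions:

```python
import re

# '20' followed only by whole bytes (pairs of hex digits) up to the end.
ALIGNED_20 = re.compile(r"20(?=(?:[0-9a-fA-F]{2})*$)")

def replace_space_bytes(hex_string, replacement="XX"):
    """Replace byte-aligned '20' pairs.

    Assumes the input contains only hex digits and has an even length,
    so "an even number of digits remain" means "starts at an even index".
    """
    return ALIGNED_20.sub(replacement, hex_string)

sample = "324F8D8A20561205231920"
replaced = replace_space_bytes(sample)
```

On the sample input the first and last 20 are replaced, while the 20 straddling the bytes 12 and 05 is left alone.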
In case you have spaces between the values (or any other symbols), this could be an alternative (with a replacement string like $1XX):
((?:.{2})*?)20

This seems to work for evens:
rx <- "^(.{2})*(20)"
strings <- c("7209","2079","9720")
grepl(rx,strings) # [1] FALSE TRUE TRUE

Not sure what Notepad++ uses for its regex engine; it's been a while since I used it. This works in JavaScript:
/^(?:..)*?(20)/
...
/^ # start regex
(?: # non capturing group
.. # any character (two times)
)*? # close group, and repeat zero or more times, un-greedily
(20) # capture `20` in group
/ # end regex

How to write a RegEx pattern that accepts a string with at most one of each letter, but unordered?

I have tried this:
[a]?[b]?[c]?[d]?[e]?[f]?[g]?[h]?[i]?[j]?[k]?[l]?[m]?[n]?[o]?[p]?[q]?[r]?[s]?[t]?[u]?[v]?[w]?[x]?[y]?[z]?
But this RegEx rejects strings where the order is not alphabetical, like these:
"zabc"
"azb"
I want patterns like these two to be accepted too. How could I do that?
EDIT 1
I don't want letter repetitions, i.e., I want the following strings to be rejected:
aazb
ozob
Thanks.

You can use a negative lookahead assertion to make sure no two characters are the same:
^(?!.*(.).*\1)[a-z]*$
Explanation:
^ # Start of string
(?! # Assert that it's impossible to match the following:
.* # any number of characters
(.) # followed by one character (capture that in group 1)
.* # followed by any number of characters
\1 # followed by the same character as the one captured before
) # End of lookahead
[a-z]* # Match any number of ASCII lowercase letters
$ # End of string
Test it live on regex101.com.
Note: This regex needs to brute-force check all possible character pairs, so performance may be a problem with larger strings. If you can use anything besides regex, you're going to be happier. For example, in Python:
import re
mystring = "zabc"  # example input
if re.fullmatch(r"[a-z]*", mystring) and len(mystring) == len(set(mystring)):
    print("valid string")

Match letter followed by specific numeric range

I am writing a regular expression in which the string can be 2-3 characters long.
The first character has to be a capital letter between A and H. It has to be followed by a number between 1 and 12.
I wrote
[A-H]{1}[1-12]{1,2}
This is fine when I enter A12 but not when I enter A6.
Please suggest.

You can't specify a range of digits like that because it is implemented as a range between characters, so [1-12] is equivalent to [12], which would only match either a 1 or a 2. Instead, try the following:
[A-H](?:1[012]|[1-9])
Here is an explanation:
[A-H] # one letter from A to H
(?: # start non-capturing group
1[012] # 1 followed by 0, 1, or 2 (10, 11, 12)
| # OR
[1-9] # one digit from 1 to 9
) # end non-capturing group
Note that the {1} after [A-H] in your original regex is unnecessary, [A-H]{1} and [A-H] are equivalent.
You may want to consider adding anchors to the regex, otherwise you would also get a partial match on a string like A20. If you are trying to match an entire string then you should use the following:
\A[A-H](?:1[012]|[1-9])\z
If it is within a larger text you could use word boundaries instead:
\b[A-H](?:1[012]|[1-9])\b
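For a quick check of the whole-string variant, here is a short Python sketch. The names CELL and is_valid_cell are made up for this example; re.fullmatch plays the role of the \A and \z anchors:

```python
import re

# One letter A-H followed by 10, 11, 12 or a single digit 1-9.
CELL = re.compile(r"[A-H](?:1[012]|[1-9])")

def is_valid_cell(code):
    """True only if the entire string is a letter A-H plus a number 1-12."""
    return CELL.fullmatch(code) is not None

accepted = [c for c in ("A12", "A6", "H1", "A0", "A13", "A20", "A60", "I5", "a6")
            if is_valid_cell(c)]
```

Only A12, A6 and H1 survive; A20 and A60 are rejected because the full string must match, not just a prefix.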

Here you go:
^[A-H]([1-9]|1[0-2])$
No need for the {1} in your question.
The regex is anchored with ^ and $, meaning it must be the only thing on your line.
It will not match A60, for example.
