limit expression length - regex

I am using the following in a script of mine to verify minutes entered. It allows numbers and a comma for thousands, in the correct format only. However, I would like to add a length restriction as well, and I can't seem to do it, or I'm just putting it in the wrong spot. Here is the code as is, with no limit:
(!preg_match("#^(\d{1,3}(\,\d{3})*|(\d+))$#",$values['minutes']))
I would like to make this at least one character with a max of five. The entry is for minutes online per day, and there are only 1440 minutes in a day. If you entered 1,440, which is currently valid, that is 5 characters, and I want to limit the expression to that.
Anyone?

Two suggestions:
preg_match("#^(?:\d{1,3}|1,?\d{3})$#", $values['minutes'])
Explanation:
^ # Start of string
(?: # Either match...
\d{1,3} # a number of one to three digits
| # or
1 # a four digit number that starts with a 1
,? # and may have a thousands separator
\d{3} # (and three more digits)
)
$ # End of string
The problem is of course that this also allows 1,999, so you'd still need an extra sanity check. This probably is the better solution.
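The "regex plus sanity check" idea could look roughly like this. It is a minimal Python sketch rather than the PHP from the question; the names MINUTES_SHAPE, MAX_MINUTES_PER_DAY and is_valid_minutes are made up for illustration.

```python
import re

# Shape check from the answer above: 1-3 digits, or a 4-digit number that
# starts with 1 and may contain a thousands separator.
MINUTES_SHAPE = re.compile(r"(?:\d{1,3}|1,?\d{3})")

MAX_MINUTES_PER_DAY = 1440  # 24 hours * 60 minutes


def is_valid_minutes(text):
    """Return True if text has the right shape and is at most 1440."""
    if MINUTES_SHAPE.fullmatch(text) is None:
        return False
    # The extra sanity check: compare the numeric value with the daily maximum.
    return int(text.replace(",", "")) <= MAX_MINUTES_PER_DAY


for candidate in ["1,440", "1440", "1,999", "12345"]:
    print(candidate, is_valid_minutes(candidate))
```

The regex only has to reject malformed input; the range rule stays in ordinary code where it is easy to read and change.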
You can also do the range limitation in the regex itself, but that's cumbersome:
preg_match("#^(?:1,?440|1,?4[0-3]\d|1,?[0-3]\d{2}|[1-9]\d{1,2}|\d)$#", $values['minutes'])
Explanation:
^ # Start of string
(?: # Either match...
1,?440 # 1440
| # or
1,?4[0-3]\d # 1400-1439
| # or
1,?[0-3]\d{2} # 1000-1399
| # or
[1-9]\d{1,2} # 10-999
| # or
\d # 0-9
)
$ # End of string
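One way to gain confidence in a range regex like this is to run it over every candidate value. The sketch below does that in Python (not PHP); RANGE_RE and accepted_values are illustrative names only.

```python
import re

# The range-limiting pattern from the answer above, unchanged.
RANGE_RE = re.compile(r"^(?:1,?440|1,?4[0-3]\d|1,?[0-3]\d{2}|[1-9]\d{1,2}|\d)$")


def accepted_values(limit=10000):
    """Return every integer below limit whose plain decimal form matches."""
    return [n for n in range(limit) if RANGE_RE.match(str(n))]


values = accepted_values()
# Smallest accepted value, largest accepted value, and how many were accepted.
print(values[0], values[-1], len(values))
```

If the pattern is right, the accepted values are exactly 0 through 1440 with no gaps.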

You're probably better off just testing the string's length or even the integer value. But just to show that it's possible:
preg_match("#^(\d,\d{3}|\d{1,4})$#", $values['minutes'])
Yes, it's very simple, since a number of at most four digits can only take one of two forms:
one digit, comma, three digits
one to four digits

regex to check if string doesn't contain non consecutive numbers on partial string

I am trying to create regex for below case:
Input string consisting of all numbers, max length is 30.
Check that, within the first 10 digits, no digit appears 3 or more times consecutively.
eg.
1234567 --> is good (no consecutive number)
1234456 --> is good (4 appears consecutive but length is less than 3)
1234445 --> is bad (4 appears consecutive and length is equal or greater than 3)
12345678904444 --> is good (4 appears consecutive and length is greater than 3 however it is accepted since it is appearing after cut off of 10 digit)
The regex I came up with is below. Pardon any mistakes in it; I am still learning regexes:
https://regex101.com/r/rv5e6a/1
Currently it is applied across the whole string, and I am not sure how to limit it so that it applies to the first 10 digits only.
You can use
^(?!\d{0,7}(\d)\1{2})\d{1,30}$
See the regex demo. Note that \d{0,7} in the lookahead will allow checking for repeated digits only within the first ten. More details:
^ - start of string
(?!\d{0,7}(\d)\1{2}) - a negative lookahead that fails the match if there are three identical digits after zero to seven digits immediately to the right of the current location
\d{1,30} - one to thirty digits
$ - end of string.
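To see the lookahead at work on the four samples from the question, here is a small Python sketch (Python's re module accepts this pattern as written; FIRST_TEN_RE is just an illustrative name).

```python
import re

# Pattern from the answer above: reject three identical digits in a row if the
# run lies entirely within the first ten digits.
FIRST_TEN_RE = re.compile(r"^(?!\d{0,7}(\d)\1{2})\d{1,30}$")

samples = ["1234567", "1234456", "1234445", "12345678904444"]
results = {s: bool(FIRST_TEN_RE.match(s)) for s in samples}

for sample, ok in results.items():
    print(sample, "good" if ok else "bad")
```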
^(?:(\d)(?!\1{2})){1,9}$|^(?:(\d)(?!\2{2})){10}\d*$
regex101 link
Explanation:
^ # beginning of the line
(?: # start of a non-capturing group
(\d) # a single digit in a group that we can refer to with \1 later on
(?!\1{2}) # not followed by the digit in the \1 group repeated twice
){1,9} # repeat the non-capturing group 1-9 times
$ # end of the line
| # OR
^ # beginning of the line
(?: # start of a second non-capturing group
(\d) # a single digit in a group that we can refer to with \2 later on
(?!\2{2}) # not followed by the digit in the \2 group repeated twice
){10} # repeat the non-capturing group 10 times
\d* # the rest of the string can be more digits
$ # end of the line
The important part of the regex above, (?:(\d)(?!\1{2})), makes sure that a given digit is not followed by the same digit two more times. But because we only care about the first 10 digits, we need to handle this in two cases.
In the first case, we have a string of digits that is fewer than 10 characters long, so we want our pattern to repeat 1-9 times and then hit the end of the line.
In the second case, we have a string of digits that is 10 or more characters long, and there might be even more characters after the first 10 that we don't care about.
We need to keep these two cases separate because we don't want to exclude the cases where there are fewer than 10 characters total in the string.
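The two-case pattern can be tried out the same way. This Python sketch also compares it with the \d{0,7} lookahead pattern shown earlier in the thread, because the two treat a run that starts inside the first ten digits but ends after them differently. Which behaviour is wanted depends on how the requirement is read; the names TWO_CASE_RE, LOOKAHEAD_RE and verdicts are illustrative.

```python
import re

# Two-case pattern from this answer.
TWO_CASE_RE = re.compile(r"^(?:(\d)(?!\1{2})){1,9}$|^(?:(\d)(?!\2{2})){10}\d*$")
# Lookahead pattern from the earlier answer, for comparison.
LOOKAHEAD_RE = re.compile(r"^(?!\d{0,7}(\d)\1{2})\d{1,30}$")


def verdicts(text):
    """Return (two-case result, lookahead result) for one input string."""
    return bool(TWO_CASE_RE.match(text)), bool(LOOKAHEAD_RE.match(text))


# The last entry has "444" starting at the 9th digit and ending at the 11th.
for text in ["1234567", "1234445", "12345678904444", "12345678444"]:
    print(text, verdicts(text))
```

On the question's own samples both patterns agree. For 12345678444 the two-case pattern rejects the input, because each of the first ten digits is checked against the two digits after it, while the lookahead pattern accepts it, because the run is not entirely inside the first ten digits.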

Regex expression Giving all letters

I need all groups of 4 capital letters in a string.
So I am using REGEXP_REPLACE([Description],'\b(?![A-Z]{4}\b)\w+\b',' ')
in Tableau to replace all lowercase letters and extra characters. I want to get only the runs of capital letters that are exactly 4 characters long.
From searching I learned that I cannot use REGEXP_EXTRACT (since /g is not supported).
My String:
"The following trials have no study data-available, in the RBM mart. It appears as is this because they were . In y HIWEThe trials currently missing data are:
JADA, JPBD, JVCS, JADQ, JVDI, JVDO, JVTZ"
I have written [^A-Z]{4}/g.
I want:
HIWE JADA JPBD JVCS JADQ JVDI JVDO JVTZ
But this also gives me single capital letters, with spaces included.
Thanks
You can use this regex:
((?<=[A-Z]{4})|^).*?(?=[A-Z]{4}|$)
Explanation:
( # one of:
^ # the starting position
| # or
(?<=[A-Z]{4}) # any position preceded by four uppercase letters
) #
.*? # match as little as possible, up to the first:
(?= # position that is followed by
[A-Z]{4} # four uppercase letters
| # or
$ # the end of the string
) #
Any doubt feel free to ask :)
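For checking the expected result outside Tableau, here is a Python sketch that collects the four-letter runs directly with re.findall. This is a different technique from the replace-based regex above, and it says nothing about how Tableau's REGEXP_REPLACE handles that pattern; the sample text is the string from the question.

```python
import re

text = (
    "The following trials have no study data-available, in the RBM mart. "
    "It appears as is this because they were . In y HIWEThe trials currently "
    "missing data are:\n"
    "JADA, JPBD, JVCS, JADQ, JVDI, JVDO, JVTZ"
)

# Each match is exactly four consecutive capital letters. In "HIWEThe" the
# match is HIWE; the following T starts a new scan and finds no run of four.
codes = re.findall(r"[A-Z]{4}", text)
print(" ".join(codes))
```

Note that a run of eight capitals would yield two matches with this approach, so it only fits data where the codes are separated the way they are here.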

How does this regex for FQDNs (excluding.arpa) work?

I am trying to understand how regex works. I understand it little by little. However, I don't understand this one completely. It's basically a regex for fully qualified domain names but a requirement is that the ending can't be .arpa.
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}[^.arpa]$)
https://regex101.com/r/hU6tP0/3
This doesn't match google.uk. If I change it to:
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{1,63}[^.arpa]$)
It works again.
But this works as well
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}$)
Here is my thought process for
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}[^.arpa]$)
I see it as this
(?=
Is a positive look ahead (Can someone explain to me what this actually means?) As I understand it now, it just means that the string needs to match the regex.
^.{4,253}$)
Match all characters but it needs to be between 4 and 253 characters long.
(^([a-zA-Z0-9]{1,63}\.)
Start a capture group and make another capture group within. This capture group says that every non special character can be written 1 to 63 times or till the . is written.
+
The previous capture group can be repeated indefinitely, but it should always end with a .. This way the next capture group is started.
[a-zA-Z]{2,63}
Then as many times as you want you can write a to z with upper, but it needs to be between 2 and 63.
[^.arpa]$)
The last characters can't be .arpa.
Can someone tell me where I am going wrong?
This doesn't do what you think it does:
[^.arpa]
All that says is 'ends with one character that isn't a dot or one of the letters a, r, p' - it's a negated character class.
You might be thinking of a negative lookahead assertion:
(?!\.arpa)$
But if you're trying to combine multiple criteria in one regex, I'd suggest you're probably using the wrong tool for the job. It ends up complicated and hard to debug, thanks to greedy/non-greedy matching, etc.
Lookaheads, positive or negative, check what follows the current position without consuming it. That can have some unexpected outcomes if you're matching variable widths, because the regex engine will backtrack until it finds something that matches.
A simpler example:
([\w.]+)(?!arpa)$
Applied to:
www.test.arpa
Will it match? What's in the group?
... it will match, because [\w.]+ will consume all of it, and then the lookahead won't "see" anything.
If you use:
([\w]+)\.(?!arpa)
instead, though, you'll capture www, but you won't match test (with e.g. the g flag), because the dot after www isn't followed by arpa, but the dot after test is.
https://regex101.com/r/hU6tP0/5
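The two cases above are easy to reproduce. This is a Python sketch of the same experiment, with re.findall standing in for the g flag; the variable names are illustrative.

```python
import re

host = "www.test.arpa"

# Lookahead at the very end: [\w.]+ has already consumed "arpa", so there is
# nothing left for (?!arpa) to reject and the whole string matches.
greedy = re.search(r"([\w.]+)(?!arpa)$", host)
print(greedy.group(1))

# Lookahead right after a dot: it inspects the label that follows the dot.
labels = re.findall(r"(\w+)\.(?!arpa)", host)
print(labels)
```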
It really does get complicated using negative assertions in a pattern as a result. I'd suggest simply not doing so, and applying two separate tests. It's hard for you to figure out, and it's hard for a future maintenance programmer too!
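The "two separate tests" suggestion might be sketched like this in Python. The shape pattern is borrowed from the question, and the function name is_acceptable_fqdn is made up for the example; treat it as one possible arrangement, not a complete FQDN validator.

```python
import re

# Test 1 pattern: dot-separated alphanumeric labels plus a letters-only TLD.
FQDN_SHAPE = re.compile(r"(?:[a-z0-9]{1,63}\.)+[a-z]{2,63}", re.IGNORECASE)


def is_acceptable_fqdn(name):
    """Check length and shape first, then reject the .arpa TLD separately."""
    if not 4 <= len(name) <= 253:
        return False
    if FQDN_SHAPE.fullmatch(name) is None:
        return False
    # Test 2: a plain string comparison, no lookahead needed.
    return not name.lower().endswith(".arpa")


for name in ["google.uk", "domain.arpa", "domain.arpanet"]:
    print(name, is_acceptable_fqdn(name))
```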
This is an analysis of your regex:
(?=^.{4,253}$) # force min length: 4 chars, max length: 253 chars
( # Capturing Group 1 (CG1) - not needed
^ # Match start of the string
( # CG2 (can be a non capturing group '(?:...)')
[a-zA-Z0-9]{1,63} # any sequence of letters and numbers with length between 1 and 63
\. # a literal dot
)+ # CLOSE CG2
[a-zA-Z]{1,63} # any letter sequence with length between 1 to 63
[^.arpa] # a negated char class: any char that is not a "literal" '.','a','r','p' (last 'a' is redundant)
$ # end of the string
) # CLOSE CG1
To prevent the string from ending in .arpa you need to use a negative lookahead (?!...), so modify it like this:
(?=^.{4,253}$)(?!.*\.arpa$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}$)
An online demo
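To make the difference concrete, this Python sketch runs the question's original pattern and the lookahead version side by side. The host names are arbitrary examples chosen for illustration, and ORIGINAL_RE, FIXED_RE and compare are illustrative names.

```python
import re

# Pattern from the question: the trailing [^.arpa] is a negated character class.
ORIGINAL_RE = re.compile(
    r"(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}[^.arpa]$)"
)
# Pattern from this answer: a negative lookahead for a trailing ".arpa".
FIXED_RE = re.compile(
    r"(?=^.{4,253}$)(?!.*\.arpa$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}$)"
)


def compare(name):
    """Return (original result, fixed result) for one host name."""
    return bool(ORIGINAL_RE.search(name)), bool(FIXED_RE.search(name))


for name in ["google.com", "google.uk", "example.app", "domain.arpa"]:
    print(name, compare(name))
```

The original pattern needs at least three characters after the last dot and refuses any name whose final character is a, r or p, which is why google.uk and example.app fail there while only domain.arpa fails with the lookahead.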
Update:
I've reworked the regex to simplify it (I've also incorporated Sobrique's suggestion, adding an important detail):
/^(?=.{4,253}$)([a-z0-9]{1,63}[.])+(?!arpa$)[a-z]{2,63}$/i
Compact version online demo
Legend
/ # js regex delimiter
^ # start of the string
(?=.{4,253}$) # force min length: 4 chars, max length: 253 chars
( # Group 1 (G1); it can be a non-capturing group '(?:...)'
[a-z0-9]{1,63} # any letter or digit in a sequence with length from 1 to 63 chars
[.] # a literal dot '.' (more readable than \.)
)+ # CLOSE G1 - repeat its content one or more times
(?!arpa$) # require that the text after the last literal dot '.' is not exactly 'arpa' (I've added '$' to Sobrique's suggestion; without it, '.arpanet' would be rejected too)
[a-z]{2,63} # a sequence of letters with length from 2 to 63
$ # end of the string
/i # close the regex delimiter and add the case-insensitive flag, so [a-z] also matches [A-Z]
var re = /^(?=.{4,253}$)([a-z0-9]{1,63}[.])+(?!arpa$)[a-z]{2,63}$/i;
var tests = ['google.uk','domain.arpa','domain.arpa2','another.domain.arpa.net','domain.arpanet'];
var out = document.getElementById("r");
var t;
// pop() walks the list from the last entry to the first
while ((t = tests.pop())) {
out.innerHTML += '"' + t + '"<br/>';
out.innerHTML += 'Valid domain? ' + (re.test(t) ? '<span style="color:green">YES</span>' : '<span style="color:red">NO</span>') + '<br/><br/>';
}
<div id="r"></div>

Regex Social Security number validation with dummy characters

I am modifying existing code that displays a SS#. I am trying to figure out the existing validation although I know next to nothing about regular expressions. What I need to do is refactor the existing validation to ALSO accept dummy characters (probably upper-case "X") for the first 5 places, displaying only the last 4 effectively. All this w/o messing up the existing validation. What I pass into the control will depend on roles within the application, either the full number, 000000000 or XXXXX0000. Any suggestions would be greatly appreciated.
<dx:ASPxTextBox ID="SSN" runat="server" CssClass="ContractTextEntry"
MaxLength="9" Width="145px" AutoPostBack="True"
ValidationSettings-RegularExpression-ValidationExpression="^(?!000)(?!666)(?!9)\d{3}([- ]?)(?!00)\d{2}\1(?!0000)\d{4}$">
<MaskSettings Mask="000-00-0000" PromptChar=" " />
<ValidationSettings SetFocusOnError="True">
<RegularExpression ErrorText="Please enter a valid SSN" />
</ValidationSettings>
</dx:ASPxTextBox>
If you just want to accept X as well as a digit in your first 5 numerals then it's a fairly straightforward modification:
^(?!000)(?!666)(?!9)[X0-9]{3}([- ]?)(?!00)[X0-9]{2}\1(?!0000)\d{4}$
All I've done is replace a couple of instances of \d (meaning any digit) with [X0-9] (meaning X or a character in the range 0-9).
FYI - the {3} following the first [X0-9] means repeated 3 times (and the {2} on the 2nd instance means repeated 2 times).
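A quick way to see what this modified expression accepts is to run it over a few inputs. The sketch below uses Python's re module rather than the .NET validator, and the sample values are invented for the test.

```python
import re

# Modified validation expression from the answer above, unchanged.
MASKED_SSN_RE = re.compile(
    r"^(?!000)(?!666)(?!9)[X0-9]{3}([- ]?)(?!00)[X0-9]{2}\1(?!0000)\d{4}$"
)

samples = ["123456789", "XXXXX6789", "XXX-XX-6789", "000456789", "1X3456789"]
outcome = {s: bool(MASKED_SSN_RE.match(s)) for s in samples}

for sample, ok in outcome.items():
    print(sample, ok)
```

Because each of the first five positions independently allows X or a digit, a mixed value such as 1X3456789 is accepted as well.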
You require one of two things: either the first 5 are all X's or they're all digits.
I think .NET supports conditionals, but I'm not sure whether they can test a group number.
I know it supports conditionals on a group name.
# ^(?!000)(?!666)(?!9)(?:(XXX)|\d{3})([- ]?)(?!00)(?(1)XX|\d{2})\2(?!0000)\d{4}$
^
(?! 000 )
(?! 666 )
(?! 9 )
(?:
( XXX ) # (1), XXX
| \d{3} # Or digits
)
( [- ]? ) # (2), Separator
(?! 00 )
(?(1) # Conditional, did group 1 match ?
XX # yes, get XX
| \d{2} # no, get digits
)
\2 # Backref to separator
(?! 0000 )
\d{4}
$
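The conditional can be tried out in Python, whose re module supports the (?(1)yes|no) form with group numbers. This only demonstrates the pattern's logic; it does not confirm how the .NET validator treats it. STRICT_SSN_RE and the sample values are illustrative.

```python
import re

# Single-line form of the pattern above: group 1 captures "XXX" when the first
# block is masked, and (?(1)XX|\d{2}) then requires the matching second block.
STRICT_SSN_RE = re.compile(
    r"^(?!000)(?!666)(?!9)(?:(XXX)|\d{3})([- ]?)(?!00)(?(1)XX|\d{2})\2(?!0000)\d{4}$"
)

samples = ["123-45-6789", "XXXXX6789", "XXX-XX-6789", "1X3456789", "XXX456789"]
checked = {s: bool(STRICT_SSN_RE.match(s)) for s in samples}

for sample, ok in checked.items():
    print(sample, ok)
```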

Simple regex validation

I want to implement the following validation: match at least 5 digits, with some other characters allowed in between (for example letters and slashes). For example 12345, 1A/2345, B22226, 21113C are all valid combinations, but 1234 and AA1234 are not. I know that {5,} gives a minimum number of occurrences, but I don't know how to cope with the other characters. I mean, [0-9A-Z/]{5,} won't work :(. I just don't know where to put the other characters in the regex.
Thanks in advance!
Best regards,
Petar
Using the simplest regex features since you haven't specified which engine you're using, you can try:
.*([0-9].*){5}
|/|\   /|/| |
| | \ / | | +--> exactly five occurrences of the group
| |  |  | +----> end group
| |  |  +------> zero or more of any character
| |  +---------> any digit
| +------------> begin group
+--------------> zero or more of any character
This gives you any number (including zero) of characters, followed by a group consisting of a single digit and any number of characters again. That group is repeated exactly five times.
That'll match any string with five or more digits in it, along with anything else.
If you want to limit what the other characters can be, use something other than the dot (.). For example, alphas only would be:
[A-Za-z]*([0-9][A-Za-z]*){5}
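Here is the first pattern run against the examples from the question, as a Python sketch (the engine was not specified in the question, so Python is only one possible choice; AT_LEAST_FIVE_DIGITS is an illustrative name).

```python
import re

# "Any characters, then a digit followed by any characters, five times."
AT_LEAST_FIVE_DIGITS = re.compile(r".*([0-9].*){5}")

valid = ["12345", "1A/2345", "B22226", "21113C"]
invalid = ["1234", "AA1234"]

for text in valid + invalid:
    print(text, bool(AT_LEAST_FIVE_DIGITS.match(text)))
```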
EDIT: I'm picking up your suggestion from a comment to paxdiablo's answer: This regex now implements an upper bound of five for the number of "other" characters:
^(?=(?:[A-Z/]*\d){5})(?!(?:\d*[A-Z/]){6})[\dA-Z/]*$
will match and return a string that has at least five digits and zero or more of the "other" allowed characters A-Z or /. No other characters are allowed.
Explanation:
^ # Start of string
(?= # Assert that it's possible to match the following:
(?: # Match this group:
[A-Z/]* # zero or more non-digits, but allowed characters
\d # exactly one digit
){5} # five times
) # End of lookahead assertion.
(?! # Now assert that it's impossible to match the following:
(?: # Match this group:
\d* # zero or more digits
[A-Z/] # exactly one "other" character
){6} # six times (change this number to "upper bound + 1")
) # End of assertion.
[\dA-Z/]* # Now match the actual string, allowing only these characters.
$ # Anchor the match at the end of the string.
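The two lookaheads can be exercised with a short Python sketch. The inputs beyond the question's own examples are made up to hit the upper bound of five "other" characters, and BOUNDED_RE and accepts are illustrative names.

```python
import re

# At least five digits, at most five "other" characters (A-Z or /),
# and nothing else allowed.
BOUNDED_RE = re.compile(r"^(?=(?:[A-Z/]*\d){5})(?!(?:\d*[A-Z/]){6})[\dA-Z/]*$")


def accepts(text):
    """Return True if text satisfies both lookaheads and the character set."""
    return BOUNDED_RE.match(text) is not None


for text in ["1A/2345", "AA1234", "ABCDE12345", "ABCDEF12345", "12345a"]:
    print(text, accepts(text))
```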
You may want to try counting the digits instead. I feel it's much cleaner than writing a complex regex.
>> "ABC12345".gsub(/[^0-9]/,"").size >= 5
=> true
The above removes everything that is not a digit and then checks the length of what remains. You can do the same thing in your own choice of language. The most fundamental way would be to iterate over the string, counting each character that is a digit until the count reaches 5 (or the string ends), and acting accordingly.
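The same digit-counting idea, written as a Python sketch (the function name has_enough_digits and its minimum parameter are made up for the example):

```python
import re


def has_enough_digits(text, minimum=5):
    """Strip everything that is not 0-9 and compare the remaining length."""
    digits_only = re.sub(r"[^0-9]", "", text)
    return len(digits_only) >= minimum


print(has_enough_digits("ABC12345"))
print(has_enough_digits("AA1234"))
```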