Regex - matching while ignoring some characters - regex

I am trying to write a regex to match a sequence of numbers that is 5 digits long or over, but I ignore any spaces, dashes, parens, or hashes when doing that analysis. Here's what I have so far.
(\d|\(|\)|\s|#|-){5,}
The problem with this is that this will match any sequence of 5 characters including those characters I want to ignore, so something like "#123 " would match. While I do want to ignore the # and space character, I still need the number itself to be 5 digits or more in order to qualify as a match.
To be clear, these would match:
1-2-3-4-5
123 45
2(134) 5
Bonus points if the matching begins and ends with a number rather than with one of those "special characters" I am excluding.
Any tips for doing this kind of matching?

If I understood requirements right you can use:
^\d(?:[()\s#-]*\d){4,}$
RegEx Demo
It always matches a digit at start. Then it is followed by 4 or more of a non-capturing group i.e. (?:[()\s#-]*\d) which means 0 or more of any listed special character followed by a digit.
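To see how this behaves on the examples from the question, here is a minimal sketch. It assumes Python's re module (the question does not name a language), and the helper name has_five_digits is made up for illustration:

```python
import re

# Pattern from the answer: one digit, then at least four more digits,
# each optionally preceded by ignorable characters.
PATTERN = re.compile(r"^\d(?:[()\s#-]*\d){4,}$")

def has_five_digits(text):
    """True if text is 5+ digits separated only by ( ) # - or whitespace."""
    return PATTERN.match(text) is not None

for sample in ["1-2-3-4-5", "123 45", "2(134) 5", "#123 ", "12-34"]:
    print(repr(sample), has_five_digits(sample))
```

The first three samples are accepted; "#123 " and "12-34" are rejected because they contain fewer than five digits.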

So just repeat a digit, followed by any other sequence of allowed characters 5 or more times:
^(\d[()\s#-]*){5,}$
You can ensure it ends on a digit if you subtract one of the repetitions and add an explicit digit at the end:
^(\d[()\s#-]*){4,}\d$
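The difference between the two variants shows up on input that ends with one of the special characters. A small sketch, assuming Python's re module (the names LOOSE, STRICT and matches are illustrative only):

```python
import re

LOOSE = re.compile(r"^(\d[()\s#-]*){5,}$")     # may end on a special character
STRICT = re.compile(r"^(\d[()\s#-]*){4,}\d$")  # must end on a digit

def matches(pattern, text):
    """True if the compiled pattern matches the whole text."""
    return pattern.match(text) is not None

for sample in ["1-2-3-4-5", "12345-", "#12345"]:
    print(repr(sample), matches(LOOSE, sample), matches(STRICT, sample))
```

Both variants reject "#12345" because they require a leading digit; only the second one rejects "12345-".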

You can match non-digits with \D, so it would be something like:
(\d\D*){5,}
Here is a guide.
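One thing to keep in mind with \D is that it accepts any non-digit, not just the characters listed in the question. A quick sketch, assuming Python's re module and an unanchored search (the helper name is hypothetical):

```python
import re

# \D matches ANY non-digit, so letters between the digits are accepted too.
PATTERN = re.compile(r"(\d\D*){5,}")

def contains_five_digits(text):
    """True if text contains 5 digits with only non-digits between them."""
    return PATTERN.search(text) is not None

for sample in ["1-2-3-4-5", "#123 ", "1a2b3c4d5"]:
    print(repr(sample), contains_five_digits(sample))
```

Here "1a2b3c4d5" is accepted as well, which may or may not be what you want.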

Related

Using regex to match numbers which have 5 increasing consecutive digits somewhere in them

First off, this has sort of been asked before. However I haven't been able to modify this to fit my requirement.
In short: I want a regex that matches an expression if and only if it only contains digits, and there are 5 (or more) increasing consecutive digits somewhere in the expression.
I understand the logic of
^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$
however, this limits the expression to 5 digits. I want there to be able to be digits before and after the expression. So 1111345671111 should match, while 11111 shouldn't.
I thought this might work:
^[0-9]*(?=\d{5}0*1*2*3*4*5*6*7*8*9*)[0-9]*$
which I interpret as:
^$: The entire expression must only contain what's between these 2 symbols
[0-9]*: Any digits between 0-9, 0 or more times followed by:
(?=\d{5}0*1*2*3*4*5*6*7*8*9*): A part where at least 5 increasing digits are found followed by:
[0-9]*: Any digits between 0-9, 0 or more times.
However this regex is incorrect, as for example 11111 matches. How can I solve this problem using a regex? So examples of expressions to match:
00001459000
12345
This shouldn't match:
abc12345
9871234444
While this problem can be solved using pure regular expressions (the set of strictly ascending five-digit strings is finite, so you could just enumerate all of them), it's not a good fit for regexes.
That said, here's how I'd do it if I had to:
^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$
Core idea: 0?1?2?3?4?5?6?7?8?9? matches an ascending numeric substring, but it doesn't restrict its length. Every single part is optional, so it can match anything from "" (empty string) to the full "0123456789".
We can force it to match exactly 5 characters by combining a look-ahead of five digits and an arbitrary suffix (which we capture) and a backreference \1 (which must match exactly the suffix captured by the look-ahead, ensuring we've now walked ahead 5 characters in the string).
Live demo: https://regex101.com/r/03rJET/3
(By the way, your explanation of (?=\d{5}0*1*2*3*4*5*6*7*8*9*) is incorrect: It looks ahead to match exactly 5 digits, followed by 0 or more occurrences of 0, followed by 0 or more occurrences of 1, etc.)
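The look-ahead plus backreference idea can be tried on the strings from the question. This is a sketch assuming Python's re module, which keeps groups captured inside a look-ahead; the function name is made up for illustration:

```python
import re

# Look-ahead captures the suffix after a 5-digit window; \1 then forces the
# ascending chain 0?1?...9? to consume exactly those 5 digits.
ASCENDING_5 = re.compile(r"^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$")

def has_ascending_run(text):
    """True if text is all digits and contains 5 strictly increasing digits in a row."""
    return ASCENDING_5.match(text) is not None

for sample in ["00001459000", "12345", "1111345671111",
               "11111", "abc12345", "9871234444"]:
    print(sample, has_ascending_run(sample))
```

The first three samples match; the last three do not.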
Because the starting position of the increasing digits isn't known in advance, and the consecutive increasing digits don't end at the end of the string, the linked answer's concise pattern won't work here. I don't think this is possible without being repetitive; alternate between all possibilities of increasing digits. A 0 must be followed by [1-9]. (0(?=[1-9])) A 1 must be followed by [2-9]. A 2 must be followed by [3-9], and so on. Alternate between these possibilities in a group, and repeat that group four times, and then match any digit after that (the lookahead in the last repeated digit in the previous group will ensure that this 5th digit is in sequence as well).
First lookahead for digits followed by the end of the string, then match the alternations described above, followed by one or more digits:
^(?=\d+$)\d*?(?:0(?=[1-9])|1(?=[2-9])|2(?=[3-9])|3(?=[4-9])|4(?=[5-9])|5(?=[6-9])|6(?=[7-9])|7(?=[89])|8(?=9)){4}\d+
Separated out for better readability:
^(?=\d+$)\d*?
(?:
0(?=[1-9])|
1(?=[2-9])|
2(?=[3-9])|
3(?=[4-9])|
4(?=[5-9])|
5(?=[6-9])|
6(?=[7-9])|
7(?=[89])|
8(?=9)
){4}
\d+
The lazy quantifier in the first line there \d*? isn't necessary, but it makes the pattern a bit more efficient (otherwise it initially greedily matches the whole string, requiring lots of failing alternations and backtracking until at least 5 characters before the end of the string)
https://regex101.com/r/03rJET/2
It's ugly, but it works.
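As a sanity check, the alternation pattern can be compared against a plain loop that looks for a window of five strictly increasing digits. The sketch below assumes Python's re module with re.VERBOSE; regex_check and loop_check are hypothetical helper names, and the loop is only a reference implementation, not part of the answer:

```python
import re

PATTERN = re.compile(
    r"""
    ^(?=\d+$)\d*?          # digits only; lazily skip a prefix
    (?:
        0(?=[1-9])|1(?=[2-9])|2(?=[3-9])|3(?=[4-9])|4(?=[5-9])|
        5(?=[6-9])|6(?=[7-9])|7(?=[89])|8(?=9)
    ){4}                   # four digits, each followed by a larger one
    \d+                    # the fifth digit plus any trailing digits
    """,
    re.VERBOSE,
)

def regex_check(text):
    return PATTERN.match(text) is not None

def loop_check(text):
    """Reference check: any window of 5 strictly increasing ASCII digits."""
    if not (text.isascii() and text.isdigit()):
        return False
    return any(
        all(text[i + k] < text[i + k + 1] for k in range(4))
        for i in range(len(text) - 4)
    )

for sample in ["00001459000", "12345", "11111", "abc12345", "9871234444"]:
    print(sample, regex_check(sample), loop_check(sample))
```

Both checks agree on these samples, which also illustrates the point that a short loop is easier to read than the regex.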

Why does my regular expression that should ignore the order of the characters not work?

I want to make a string pattern that is:
at least 7 characters long
has at least 1 digit, max 5
has at least 3 uppercase letters, max 5
has at least 1 lowercase letter, max 5
has at least 1 special character, max 5
How to express this in a regular expression?
I can do something like
^((?=.*[A-Z]{3,5})(?=.*[a-z]{1,5})(?=.*[0-9]{1,5})(?=.*[.~!##$%^_&-]{1,5}))(?=.{7,20}).*$
I don't want to require this kind of order. In fact, any mixed order should be accepted, only require the number of characters.
This Match:
PASSW120P45ccb^&#%#
But this one does not
PA12S1SW2045ccb^&#%#
How can I fix this? These should match as well:
P&#Ass120W45ccb^%#
P&#Ass20W45cb^%#
Please have a look at https://regex101.com/r/vF2yO7/51
You need to operate with the contrary character classes, put these into non-capturing groups and repeat these:
^
(?=(?:\D*\d){1,5})
(?=(?:[^A-Z]*[A-Z]){3,5})
(?=(?:[^a-z]*[a-z]){1,5})
(?=(?:[^.\~!##$%^_&-]*[.\~!##$%^_&-]){1,5})
.{7,20}
$
See a demo on regex101.com.
The structure here is always the same, e.g. with the numbers: require anything not a number zero or more times, followed by a number and repeat the whole pattern 1-5 times. In general:
(?=(?:not_what_you_want*what_you_want){min_times, max_times})
In the expression above, all pos. lookaheads follow this scheme, [^...] negates the characters to be matched in the class and \D* is essentially the same as [^\d]*.
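The scheme can be sketched in Python by generating each lookahead from a character class. This assumes Python's re module; count_lookahead and PASSWORD are made-up names, and the special-character class is written without the duplicate # and the backslash, which does not change what it matches:

```python
import re

def count_lookahead(char_class, low, high):
    """Build (?=(?:[^X]*[X]){low,high}) for a class body X such as 'A-Z'."""
    return rf"(?=(?:[^{char_class}]*[{char_class}]){{{low},{high}}})"

PASSWORD = re.compile(
    "^"
    + count_lookahead("0-9", 1, 5)            # digits
    + count_lookahead("A-Z", 3, 5)            # uppercase letters
    + count_lookahead("a-z", 1, 5)            # lowercase letters
    + count_lookahead(r".~!#$%^_&-", 1, 5)    # special characters
    + ".{7,20}$"                              # overall length
)

def is_valid(text):
    return PASSWORD.match(text) is not None

for sample in ["PASSW120P45ccb^&#%#", "PA12S1SW2045ccb^&#%#",
               "P&#Ass120W45ccb^%#", "Pa1#xyzq", "ABCa1#"]:
    print(sample, is_valid(sample))
```

The first three samples are accepted regardless of character order; "Pa1#xyzq" fails the uppercase minimum and "ABCa1#" is too short. Note that these lookaheads are not anchored to the end of the string, so in this sketch they enforce the minimum counts only: "PA12S1SW2045ccb^&#%#" contains seven digits and still passes.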

regex few blocks exact length

I want to match 5 to 20 characters with a regex.
I tried the regular expression below for my check.
/^[a-zA-Z][\w]{5,20}$/
It works, but the length is off: it matches 6 to 21 characters.
(^[a-zA-Z][\w]){4,20}$
I also tried this, but it doesn't work.
Can anyone please help me match the exact length with a regex?
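It's because your capturing group (in the second attempt) is expecting TWO characters:
[a-zA-Z] and [\w], that's two characters.
So that attempt actually did this:
match [a-zA-Z] once
match [\w] once
repeat that pair 4 - 20 times
In the first attempt there is no group, so {5,20} applies only to [\w], and the leading [a-zA-Z] adds one more character. Inevitably, you always had more characters than expected.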
Capture only one character, and iterate it 5-20 times.
Have you tried:
^([a-zA-Z]{5,20})$ ?
OR
^(\w{5,20})$ ?
You're almost there, you just need to make a single range of characters (in square brackets) not two.
/^[a-zA-Z][\w]{5,20}$/ means:
a character from a to z in lower or upper case
5 to 20 word characters
That sums up to 6 to 21 characters in total.
I suppose you want /^[a-zA-Z][\w]{4,19}$/:
a character from a to z in lower or upper case
4 to 19 word characters
That sums up to 5 to 20 characters in total.
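The off-by-one is easy to confirm by feeding strings of increasing length to both patterns. A small sketch, assuming Python's re module (accepted_lengths is a hypothetical helper):

```python
import re

ORIGINAL = re.compile(r"^[a-zA-Z][\w]{5,20}$")  # 1 + (5..20) = 6..21 characters
ADJUSTED = re.compile(r"^[a-zA-Z][\w]{4,19}$")  # 1 + (4..19) = 5..20 characters

def accepted_lengths(pattern, max_len=25):
    """Lengths n for which the string 'a' * n is accepted by pattern."""
    return [n for n in range(1, max_len + 1) if pattern.match("a" * n)]

print(accepted_lengths(ORIGINAL))  # lengths 6 through 21
print(accepted_lengths(ADJUSTED))  # lengths 5 through 20
```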
The quantifier is only applied to the [\w]. So this expects exactly one letter and then 5-20 word characters (letters, digits, or underscores).
I assume you want 5-20 characters that can be either a letter a-z or a whitespace. You need to group these together in square brackets and then apply the quantifier (note that whitespace is \s; \W would mean any non-word character):
^[a-zA-Z\s]{5,20}$
So, I understand, you want a string that has 5-20 characters, starts with a letter and then only has letters and digits. You would write it like that:
^[a-zA-Z][a-zA-Z0-9]{4,19}$
This expects first a letter and then 4-19 letters or digits.
BTW: https://regex101.com/ is a great site to test regular expressions and get an explanation what they are doing.

Decoding a regex... I know what its function is, but I want to understand exactly what is happening

I have a regular expression that I'm going to be using to verify that an inputted number is in standard U.S. telephone format (i.e (###) ###-####). I am new to regex and still having some trouble figuring out the exact function of each character. If someone would go through this piece by piece/verify that I am understanding I would really appreciate it. Also if the regex is wrong I would obviously like to know that.
\D*?(\d\D*?){10}
What I think is happening:
\D*?( indicates an escape sequence for the parenthesis metacharacter... not sure why the \D*? is necessary
\d indicating digits
\D*? indicating there is a non-digit character (-) followed by the closing parenthesis.
{10} for the 10 digits
I feel very unsure explaining this, like my understanding is very vague in terms of why the regex is in the order that it is etc. Thanks in advance for help/explanations.
EDIT
It seems like this is not the best regex for what I want. Another possibility was [(][0-9]{3}[)] [0-9]{3}-[0-9]{4}, but I was told this would fail. I suppose I'll have to do a little more work with regular expressions to figure this out.
\D matches any non-digit character.
* means that the previous character (or group) is repeated 0 or more times.
*? also means 0 or more repetitions, but lazily: it matches as few characters as possible and only takes more when the rest of the regex cannot match otherwise. It is perhaps a bit difficult at first, but in your regex the next token is \d, meaning \D*? will match the smallest number of non-digit characters up to the next digit.
( ... ) is a capture group, and is also used to group things. For instance {10} means that the previous character or group is repeated 10 times exactly.
Now, \D*?(\d\D*?){10} will match 10 digits, whether or not they are preceded by non-digit characters, and with any non-digit characters between the digits if they are present.
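The difference between the greedy * and the lazy *? can be seen on a tiny input. This is a sketch assuming Python's re module; the sample string and variable names are made up for illustration:

```python
import re

text = "ab12"

# Greedy: takes as many non-digits as it can.
greedy = re.match(r"\D*", text).group()
# Lazy on its own: takes as few as it can, which is none at all.
lazy = re.match(r"\D*?", text).group()
# Lazy followed by \d: has to grow until a digit can follow.
lazy_then_digit = re.match(r"\D*?\d", text).group()

print(repr(greedy), repr(lazy), repr(lazy_then_digit))
```

The three results are 'ab', '' and 'ab1': the lazy quantifier only consumes characters when the rest of the pattern needs it to.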
[(][0-9]{3}[)] [0-9]{3}-[0-9]{4}
This regex is a bit better since it doesn't just accept anything (like the first regex does) and will match the format (###) ###-#### (notice the space is a character in regex!).
The new things introduced here are the square brackets. These represent character classes. [0-9] means any character between 0 to 9 inclusive, which means it will match 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9. Adding {3} after it repeats the character class 3 times, and since this character class contains only digits, it will match exactly 3 digits.
A character class can be used to escape certain characters, such as ( or ) (note I mentioned earlier they are for capturing groups, or grouping) and thus, [(] and [)] are literal ( and ) instead of being used for capturing/grouping.
You can also use backslashes (\) to escape characters. Thus:
\([0-9]{3}\) [0-9]{3}-[0-9]{4}
Will also work. I would also recommend the use of line anchors ^ and $ if you're only trying to see if a phone number matches the above format. This ensures that the string has only the phone number, and nothing else. ^ matches the beginning of a line and $ matches the end of a line. Thus, the regex will become:
^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$
However, I don't know all the combinations of the different formats of phone numbers in the US, so this regex might need some tweaking if you have different phone number formats.
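A short check of the anchored pattern, assuming Python's re module (is_us_phone is a hypothetical helper name, and the sample numbers are made up):

```python
import re

US_PHONE = re.compile(r"^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$")

def is_us_phone(text):
    """True only for the exact format (###) ###-####."""
    return US_PHONE.match(text) is not None

for sample in ["(555) 123-4567", "555-123-4567",
               "(555)123-4567", "call (555) 123-4567"]:
    print(repr(sample), is_us_phone(sample))
```

Only the first sample is accepted; a missing space, missing parentheses, or extra text around the number all cause a rejection.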
\D is "not a digit"; \d is "digit". With that in mind:
This matches zero or more non-digits, then it matches a digit and any number of non-digit characters 10 times. This won't actually verify that the number is formatted properly, just that it contains 10 digits. I suspect that the regex isn't what you want in the first place.
For example, the following will match your regex:
this is some bad text 1 and some more 2 and more 34567890
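This is easy to confirm with an unanchored search. A sketch assuming Python's re module (the names LOOSE and accepts are illustrative only):

```python
import re

LOOSE = re.compile(r"\D*?(\d\D*?){10}")

def accepts(text):
    """True if the loose pattern finds 10 digits anywhere in text."""
    return LOOSE.search(text) is not None

bad = "this is some bad text 1 and some more 2 and more 34567890"
for sample in [bad, "(555) 123-4567", "12345678901", "12345"]:
    print(repr(sample), accepts(sample))
```

Everything with at least ten digits is accepted, including the bad text and an eleven-digit string; only "12345" is rejected.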
\D matches a character that is not a digit
* repeats the previous item 0 or more times
? after * makes the repetition lazy (match as few characters as possible)
\d matches a digit
so your group matches a digit followed by optional non-digits, repeated 10 times

Regex to check for 4 consecutive numbers

Can I use
\d\d\d\d[^\d]
to check for four consecutive numbers?
For example,
411112 OK
455553 OK
1200003 OK
f44443 OK
g55553 OK
3333 OK
f4442 No
45553 No
f4444g4444 No
f44444444 No
If you want to find any series of 4 digits in a string /\d\d\d\d/ or /\d{4}/ will do. If you want to find a series of exactly 4 digits, use /[^\d]\d{4}[^\d]/ (this requires a non-digit character on both sides). If the string should consist of nothing but 4 consecutive digits, use /^\d{4}$/.
Edit: I think you want to find 4 of the same digits, you need a backreference for that. /(\d)\1{3}/ is probably what you're looking for.
Edit 2: /(^|(.)(?!\2))(\d)\3{3}(?!\3)/ will only match strings with exactly 4 of the same consecutive digits.
The first group matches either the start of the string or any single character (captured as group 2). In the latter case, a negative look-ahead with the backreference \2 ensures that the next character differs from that character. The third group matches any digit, which is then repeated 3 times with a backreference to group 3. Finally there's a negative look-ahead that ensures that the following character doesn't continue the series of consecutive digits.
This sort of stuff is difficult to do in javascript because you don't have things like forward references and look-behind.
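The pattern from Edit 2 can be run against the examples in the question. This is a sketch assuming Python's re module and an unanchored search; has_run_of_four is a made-up helper name:

```python
import re

EXACTLY_FOUR = re.compile(r"(^|(.)(?!\2))(\d)\3{3}(?!\3)")

def has_run_of_four(text):
    """True if text contains a run of exactly four identical digits."""
    return EXACTLY_FOUR.search(text) is not None

for sample in ["411112", "455553", "1200003", "f44443", "g55553", "3333",
               "f4442", "45553", "f44444444", "f4444g4444"]:
    print(sample, has_run_of_four(sample))
```

The six "OK" examples are found, and "f4442", "45553" and "f44444444" are not. One difference from the question's table: "f4444g4444" is reported as a match here, because it does contain a run of exactly four identical digits; rejecting strings with two such runs would need an additional check.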
Should the numbers be part of a string, or do you want only the four numbers? In the latter case, the regexp should be ^\d{4}$. The ^ marks the beginning of the string, $ the end. That makes sure that only four numbers are valid, and nothing before or after that.
That should match four digits (\d\d\d\d) followed by a non-digit character ([^\d]). If you just want to match any four digits, you should use \d\d\d\d or \d{4}. If you want to make sure that the string contains just four consecutive digits, use ^\d{4}$. The ^ will instruct the regex engine to start matching at the beginning of the string while the $ will instruct the regex engine to stop matching at the end of the string.
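The three variants behave differently on the same inputs. A minimal comparison, assuming Python's re module (the variable names are illustrative only):

```python
import re

followed_by_non_digit = re.compile(r"\d\d\d\d[^\d]")  # pattern from the question
any_four = re.compile(r"\d{4}")                       # four digits anywhere
only_four = re.compile(r"^\d{4}$")                    # the whole string is four digits

for sample in ["3333", "1234x", "12345", "f44443", "abc"]:
    print(
        repr(sample),
        followed_by_non_digit.search(sample) is not None,
        any_four.search(sample) is not None,
        only_four.search(sample) is not None,
    )
```

Note that the question's pattern does not match "3333", because it insists on a non-digit character after the four digits.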