Regular Expressions (Regex) 2 expressions: remove spaces before verifying 2nd one - regex

TLDR:
I have the option to add only one regex.
How can I make these two expressions:
\s
(\d{10})(19|20)(\d{2})$:$1$3
work together (one after the other) instead of separately?
This is not enough: \s|(\d{10})(19|20)(\d{2})$:$1$3
Long description:
I have an expression: '(\d{10})(19|20)(\d{2})$:$1$3'
What it does:
the user's password should have 12 digits, ending with the last 2 digits of the year
if the phrase has 14 digits (someone entered the full year), ignore the 11th and 12th digits
Thanks to that, we can accept both codes: 308814310175 and 30881431011975.
Now I'm looking for a way to ignore spaces in case the user adds them anywhere by mistake (not my requirement).
Theoretically, I can just add '|\s' to get '\s|(\d{10})(19|20)(\d{2})$:$1$3'.
Both alternatives work separately:
when someone adds the full year, it removes the 11th and 12th digits
when someone adds a space, it removes it
but if someone adds a space AND the full year, only the space removal works (because the phrase is then longer than 14 characters).
So this works:
308814310175
30881431011975
3088143 10119
But this is not working:
3088143101 1975
because it removes the space OR the 11th/12th digits, rather than doing both one after the other.
How to make both expressions work at the same time?
Thank you in advance for your help.
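The behaviour described above can be reproduced with a short sketch. Python's `re` is used here only as an assumption, since the question does not say which engine the form uses. In Python 3.5+, a group that did not take part in the match is substituted as an empty string, so `\1\3` simply deletes a matched space.

```python
import re

# The single combined pattern from the question, with $1$3 written as \1\3.
COMBINED = r"\s|(\d{10})(19|20)(\d{2})$"

def apply_combined(code: str) -> str:
    # re.sub makes one pass over the ORIGINAL string: a space is matched by
    # the \s branch, while the year branch needs 14 contiguous digits at the
    # end, which is no longer the case once a space sits among them.
    return re.sub(COMBINED, r"\1\3", code)

for sample in ["308814310175", "30881431011975", "3088143 10119", "3088143101 1975"]:
    print(repr(sample), "->", repr(apply_combined(sample)))
```

The last sample keeps all 14 digits: the space is removed, but the year is not shortened in the same pass.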

A somewhat long solution would be to capture each digit separately and skip spaces and a possible 11th and 12th digit in case of 14 digits total:
^\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(?:1\s*9|2\s*0)?\s*(\d)\s*(\d)\s*$
You would then replace the match with $1$2$3$4$5$6$7$8$9$10$11$12.
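A minimal sketch of this single-pattern approach in Python (the host language is an assumption; the pattern is the one above). Instead of spelling out $1 through $12, the replacement function joins all twelve captured digits, which gives the same result and avoids the ambiguity of two-digit backreferences.

```python
import re

# Ten single-digit groups, an optional "19"/"20" (spaces allowed inside it),
# then two more single-digit groups; \s* between every piece skips spaces.
PATTERN = re.compile(
    r"^\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)\s*(\d)"
    r"\s*(?:1\s*9|2\s*0)?\s*(\d)\s*(\d)\s*$"
)

def normalize(code: str) -> str:
    # Joining the 12 groups is equivalent to the replacement $1$2...$12.
    # Input that does not match is returned unchanged, as re.sub does.
    return PATTERN.sub(lambda m: "".join(m.groups()), code)

print(normalize("3088143101 1975"))
```

For a 12-digit code that happens to end in 19 or 20, the optional group first consumes those digits, fails to find two more, and backtracks, so the code is kept intact.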
Another possibility (if supported) could be to replace:
(?:[^\S\n]|(?<=^\s*(?:\d\s*?){10})\s*(?:1\s*9|2\s*0)(?=\s*\d\s*\d\s*$))
With nothing. However, this requires an engine that supports variable-length lookbehind (for example .NET, or JavaScript since ES2018).

You are trying to solve a simple problem in a complicated way. Instead of using a complicated regex, just use two simple steps:
Remove unwanted spaces.
Apply the regex to validate the string and remove other unwanted characters.
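The two steps can be sketched in Python as follows (an illustration only; the form in the question may run a different engine). The leading ^ is an addition to the question's expression, so that only a complete 14-digit code gets its 11th and 12th digits removed.

```python
import re

def normalize_two_step(code: str) -> str:
    # Step 1: remove every whitespace character the user typed.
    compact = re.sub(r"\s+", "", code)
    # Step 2: the question's expression, $1$3 written as \1\3.
    # The leading ^ is an assumption added here to anchor the whole code.
    return re.sub(r"^(\d{10})(19|20)(\d{2})$", r"\1\3", compact)

print(normalize_two_step("3088143101 1975"))
```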

Related

Regex to check if string has at least 2 numbers and 1 capital letter

I need to create a regular expression to check if a password has at least 1 uppercase letter, at least 2 numbers, and ends with a $ (dollar sign).
I've been trying to figure it out, but I can only get as far as checking if there's at least 1 uppercase and one number, rather than two.
These should be valid:
4hg5Fjkjk$
fh##Y5fFF5$
hgH5Hu6$
These should not be valid:
45tyghisu$ (No capital)
5THygfhy$ (Only one number)
Gh%hF45$h (No dollar sign at the end)
Here's what I have so far (it checks for at least one number, one capital letter, and a dollar sign at the end):
/(?=.*[A-Z])(?=.*\d).*\$/
Any help would be greatly appreciated!
P.S. I've looked on SO and can't find anything about requiring more than one of a given character.
In your pattern, you have to assert a digit twice instead of once, for example with (?=(?:[^\d\r\n]*\d){2}), using the principle of contrast (match any non-digits, then a digit).
If you don't want to allow spaces in the password, you could use \S+ to match one or more non-whitespace characters.
You could use:
^(?=[^A-Z\r\n]*[A-Z])(?=(?:[^\d\r\n]*\d){2})\S+\$$
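To check the pattern against the question's own examples, here is a small Python sketch (Python's `re` is an assumption; the pattern itself is unchanged):

```python
import re

# One uppercase letter and two digits somewhere (checked by lookaheads),
# then non-whitespace characters ending in a literal "$".
PASSWORD = re.compile(r"^(?=[^A-Z\r\n]*[A-Z])(?=(?:[^\d\r\n]*\d){2})\S+\$$")

valid = ["4hg5Fjkjk$", "fh##Y5fFF5$", "hgH5Hu6$"]
invalid = ["45tyghisu$", "5THygfhy$", "Gh%hF45$h"]

for pw in valid + invalid:
    print(pw, bool(PASSWORD.search(pw)))
```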
According to the given answer by the OP, the number of characters should be 9-15:
^(?=[^A-Z\r\n]*[A-Z])(?=(?:[^\d\r\n]*\d){2})\S{9,15}\$$
This RegEx is simple and makes no other assumptions about what characters may be in a password other than what was specified by the OP.
^(?=.*?[A-Z])(?=.*?\d.*?\d).*\$$
Thanks for all the answers. I looked through them and figured out a pretty simple way to do it using the regular expression below. I edited it to allow setting a length on the password; just change the 9 and 15 to your desired lengths.
/^(?=.*[A-Z])(?=.*\d.*\d)[^\s]{9,15}\$$/

Multiple {n} quantifiers regex

Is it possible to have multiple quantifiers in a regex?
Say I have the following regex:
[A-Z0-9]{44}|[A-Z0-9]{36}|[A-Z0-9]{30}
I want to match any string which is either 30, 36 or 44 chars long. Is it possible to write this shorter in any way? Something like the following:
[A-Z0-9<]{30|36|44}?
Edit: Seeing the answers, I assume there is no real way to write the above more concisely. The best solution would be to solve it programmatically, I guess. Thanks for the input.
Brief
Note that your regex performs much better than any other answers you'll get on your question, but since your question is actually about simplifying/shortening your regex, you can use this.
Your original regex (38 characters):
[A-Z0-9]{44}|[A-Z0-9]{36}|[A-Z0-9]{30}
Your original regex with modifications so that we can use it to test against multiline input (44 characters):
^(?:[A-Z0-9]{44}|[A-Z0-9]{36}|[A-Z0-9]{30})$
Code
My original regex (32 characters):
([A-Z0-9]){44}|(?1){36}|(?1){30}
My original regex with modifications so that we can use it to test against multiline input (38 characters):
^(?:([A-Z0-9]){44}|(?1){36}|(?1){30})$
Explanation
([A-Z0-9]){44}|(?1){36}|(?1){30} Match either of the following
([A-Z0-9]){44} Match any character in the set (A-Z or 0-9) exactly 44 times. This also captures a single character in the set into capture group 1. We will later use this capture group through recursion.
(?1){36} Recurse the first subpattern exactly 36 times
(?1){30} Recurse the first subpattern exactly 30 times
Looks like you want
[A-Z0-9]{30}([A-Z0-9]{6}([A-Z0-9]{8})?)?
This isn't actually simpler, mind you.
You don't need to check that your input contains only uppercase letters [A-Z] and digits [0-9] just to test its length, so you can drop the [A-Z0-9] part. Then you can specify multiple quantifiers as follows:
^(?:.{30}|.{36}|.{44})$
If you do need that check, you can use this regex without typing [A-Z0-9] three times:
^(?=[A-Z0-9]*$)(?:.{30}|.{36}|.{44})$
You have the [A-Z0-9] part only once and a generic . to check the length of string.
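A quick Python sketch (Python is an assumption here) that compares three of the spellings above by trying every length from 0 to 50. The nested-optional version is given ^ and $ anchors, which the answer's version lacks, so that it tests the whole string. The (?1) recursion variant is left out because Python's built-in `re` does not support subpattern recursion.

```python
import re
import string

ALPHABET = string.ascii_uppercase + string.digits

PATTERNS = {
    "alternation": r"^(?:[A-Z0-9]{44}|[A-Z0-9]{36}|[A-Z0-9]{30})$",
    # Anchors added here; the answer's version is unanchored.
    "nested optional": r"^[A-Z0-9]{30}([A-Z0-9]{6}([A-Z0-9]{8})?)?$",
    "lookahead": r"^(?=[A-Z0-9]*$)(?:.{30}|.{36}|.{44})$",
}

def accepted_lengths(pattern: str) -> list:
    # Build strings of length 0..50 from allowed characters and keep the
    # lengths the pattern accepts.
    regex = re.compile(pattern)
    return [n for n in range(51) if regex.search((ALPHABET * 2)[:n])]

for name, pattern in PATTERNS.items():
    print(name, accepted_lengths(pattern))
```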

Regex how to get a full match of nth word (without using non-capturing groups)

I am trying to use regex to return the nth word in a string. This would be simple enough using other answers to similar questions; however, I do not have access to any of the code. I can only access a regex input field, and the server only returns the 'full match'; it cannot be made to return any captured groups such as 'group 1'.
EDIT:
From the developers, explaining the version of regex used:
"...its javascript regex so should mostly be compatible with perl i believe but not as advanced, its fairly low level so wasn't really intended for use by end users when originally implemented - i added the dropdown with the intention of having some presets going forwards."
/EDIT
Sample String:
One Two Three Four Five
Attempted solution (which is meant to get just the 2nd word):
^(?:\w+ ){1}(\S+)$
The result is:
One Two
I have also tried other variations of the regex:
(?:\w+ ){1}(\S+)$
^(?:\w+ ){1}(\S+)
But these just return the entire string.
I have tried replicating the behaviour that I see using regex101 but the results seem to be different, particularly when changing around the ^ and $.
For example, I get the same output on regex101 if I use the altered regex:
^(?:\w+ ){1}(\S+)
In any case, none of the comparing has helped me actually achieve my stated aim.
I am hoping that I have just missed something basic!
===EDIT===
Thanks to all of you who have contributed so far; however, I am still running into issues. I'm afraid I do not know the language or the restrictions on the regex beyond what I can work out through trial and error, so here is a list of attempts and results, all of which try to return "Two" from a sample of:
One Two Three Four Five
\w+(?=( \w+){1}$)
returns all words
^(\w+ ){1}\K(\w+)
returns no words at all (so I assume that \K does not work)
(\w+? ){1}\K(\w+?)(?= )
returns no words at all
\w+(?=\s\w+\s\w+\s\w+$)
returns all words
^(?:\w+\s){1}\K\w+
returns all words
====
With all of the above not working, I thought I would test some others to see the limitations of the system.
Attempting to return the last word:
\w+$
returns all words
This leads me to believe that something strange is going on with the start ^ and end $ characters, perhaps the server puts these in automatically if they are omitted? Any more ideas greatly appreciated.
I don't know if your language supports positive lookbehind, so, using your example,
One Two Three Four Five
here is a solution that should work in every language:
\w+ match the first word
\w+$ match the last word
\w+(?=\s\w+$) match the 4th word
\w+(?=\s\w+\s\w+$) match the 3rd word
\w+(?=\s\w+\s\w+\s\w+$) match the 2nd word
So if a string contains 10 words:
The first and last words are easy to find. To find the word at a given position, you simply use this rule:
\w+(?= followed by \s\w+ (10 - position) times followed by $)
Example
In this string :
One Two Three Four Five Six Seven Eight Nine Ten
I want to find the 6th word.
10 - 6 = 4
\w+(?= followed by \s\w+ 4 times followed by $)
Our final regex is
\w+(?=\s\w+\s\w+\s\w+\s\w+$)
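The counting rule can be turned into a small generator for the pattern. This is a Python sketch with hypothetical helper names (`nth_word_pattern`, `nth_word`); the questioner's engine is JavaScript, but plain lookahead behaves the same way there. It assumes words are separated by single whitespace characters and consist only of \w characters.

```python
import re

def nth_word_pattern(total_words: int, n: int) -> str:
    # Counting from the right: the nth word is followed by exactly
    # (total_words - n) further "\s\w+" pieces before the end of the string.
    return r"\w+(?=" + r"\s\w+" * (total_words - n) + r"$)"

def nth_word(text: str, n: int):
    # Hypothetical helper: counts the words, builds the pattern, and
    # returns the full match (or None if n is out of range).
    total = len(text.split())
    if not 1 <= n <= total:
        return None
    match = re.search(nth_word_pattern(total, n), text)
    return match.group(0) if match else None

print(nth_word_pattern(10, 6))
print(nth_word("One Two Three Four Five", 2))
```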
It's possible to use reset match (\K) to reset the position of the match and obtain the third word of a string as follows:
(\w+? ){2}\K(\w+?)(?= )
I'm not sure what language you're working in, so you may or may not have access to this feature.
I'm not sure whether your language supports \K, but I'm sharing this anyway in case it does:
^(?:\w+\s){3}\K\w+
to get the 4th word.
^ represents starting anchor
(?:\w+\s){3} is a non-capturing group that matches three words (ending with spaces)
\K is a match reset, so it resets the match and the previously matched characters aren't included
\w+ helps consume the nth word
And similarly,
^(?:\w+\s){1}\K\w+ for the 2nd word
^(?:\w+\s){2}\K\w+ for the 3rd word
^(?:\w+\s){3}\K\w+ for the 4th word
and so on...
So, on the downside, you can't use lookbehind, because in many engines it has to be a fixed-width pattern; but the "full match" is just the last thing that "full matches", so you just need something whose last match is your word.
With Positive look-ahead, you can get the nth word from the right
\w+(?=( \w+){n}$)
If your server's engine supports extended syntax, \K can "clear matched items", but many regex engines (JavaScript's among them) don't support it.
^(\w+ ){n}\K(\w+)
Unfortunately, regex has no standard way to "match only the nth occurrence", so counting from the right is the best you can do. (Also, Regex101 has a searchable quick reference in the bottom right corner for looking up special characters; just remember that most of those characters are not supported by all regex engines.)

RegEx to find credit card numbers with embedded spaces

We currently have a content compliance rule in place whereby we monitor anything that contains a credit card number with no spaces (e.g. 5100080000000000).
What we need is a regex that picks up credit card numbers entered with spaces every 4 digits (e.g. 5100 0800 0000 0000).
We've been looking at alternative regexes but have not yet found one that works for both scenarios mentioned above.
The current regex we use is below:
^((4\d{3})|(5[1-5]\d{2})|(6011)|(34\d{1})|(37\d{1}))-?\d{4}-?\d{4}-?\d{4}|3[4,7][\d\s-]{15}$
Just add an optional \s? wherever you already have the optional -?
So your regex becomes
^((4\d{3})|(5[1-5]\d{2})|(6011)|(34\d{1})|(37\d{1}))-?\s?\d{4}-?\s?\d{4}-?\s?\d{4}|3[4,7][\d\s-]{15}$
It seems that you already accept a dash every four characters. Thus you can simply replace -? with [- ]? everywhere.
If you require the dashes or spaces to be consistent (that is, allow no grouping at all, a dash every four characters, or a space every four characters), you can use a backreference to force the repetitions to be identical to the first match:
^(?:4\d{3}|5[1-5]\d{2}|6011|3[47]\d{2})([- ]?)\d{4}\1\d{4}\1\d{4}$
You will notice I removed the final 3[4,7]..., which looked like an erroneous addition, apparently made when attempting to solve this problem partially. I also changed the parentheses to non-capturing ones (?:...), or simply removed them where no grouping seemed necessary or useful, mainly because this makes it easier to see what the backreference \1 refers to. Finally, the 34.. and 37.. patterns had \d{1} where apparently \d{2} was intended (or, if those particular series have only three digits before the first dash, the repetition {1} was just superfluous, but then the 3[4,7]... part would have been even more wrong!).
Won't all these ideas blow up on you as soon as someone uses an AMEX card and enters 3 or 5 digits instead of 4 in any one 'block'?
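To see how the backreference enforces a consistent separator, here is a Python sketch of the pattern above (Python's `re` is an assumption; the first number is the question's example, the others are variants of it):

```python
import re

# Group 1 captures the separator after the first block ("", "-" or " "),
# and each \1 requires that same separator between the later blocks.
CARD = re.compile(r"^(?:4\d{3}|5[1-5]\d{2}|6011|3[47]\d{2})([- ]?)\d{4}\1\d{4}\1\d{4}$")

SAMPLES = [
    "5100080000000000",     # no separators
    "5100 0800 0000 0000",  # spaces every four digits
    "5100-0800-0000-0000",  # dashes every four digits
    "5100-0800 0000 0000",  # mixed separators
    "5100 08000000 0000",   # inconsistent grouping
]

for number in SAMPLES:
    print(repr(number), bool(CARD.search(number)))
```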
((\d+) *(\d+) *(\d+) *(\d+))
That would be the general idea (and it even works!); you can polish it if you want. There is a great page to test your regexp live: http://rubular.com/
Try this:
(\d{4} *\d{4} *\d{4} *\d{4})

How to optimise this regex to match string (1234-12345-1)

I've got this RegEx example: http://regexr.com?34hihsvn
I'm wondering if there's a more elegant way of writing it, or perhaps a more optimised way?
Here are the rules:
Digits and dashes only.
Must not contain more than 10 digits.
Must have two hyphens.
Must have at least one digit between each hyphen.
Last number must only be one digit.
I'm new to this so would appreciate any hints or tips.
In case the link expires, the text to search is:
----------
22-22-1
22-22-22
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
666666-7777777-1
88888888-88888888-1
1-1-1
88888888-88888888-22
22-333-
333-22
----------
My regex is: \b((\d{1,4}-\d{1,5})|(\d{1,5}-\d{1,4}))-\d{1}\b
I'm using this site for testing: http://gskinner.com/RegExr/
Thanks for any help,
Nick
Here is a regex I came up with:
(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b
This uses a lookahead to validate the input before attempting the match: it looks for 3 to 10 characters from the class [\d-] followed by a dash and a digit. After that, the actual match confirms that the string really has the format digits(dash)digits(dash)digit.
From your sample strings this regex matches:
22-22-1
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
1-1-1
It also matches the following strings:
22-7777777-1
1-88888888-1
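Running the lookahead pattern over the question's sample lines reproduces the list above. This is a Python sketch; Python's `re` is an assumption, and the "----------" delimiter lines are left out.

```python
import re

# Sample lines from the question.
SAMPLE = """\
22-22-1
22-22-22
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
666666-7777777-1
88888888-88888888-1
1-1-1
88888888-88888888-22
22-333-
333-22"""

# The lookahead limits the run of digits and dashes to 5..12 characters
# ending in "-<digit>"; the main part then checks digits-dash-digits-dash-digit.
LOOKAHEAD_PATTERN = r"(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b"

print(re.findall(LOOKAHEAD_PATTERN, SAMPLE))
```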
Your regexp only allows a first and second group of digits with a maximum length of 5. Therefore, valid strings like 1-12345678-1 or 123456-1-1 won't be matched.
This regexp works for the given requirements:
\b(?:\d\-\d{1,8}|\d{2}\-\d{1,7}|\d{3}\-\d{1,6}|\d{4}\-\d{1,5}|\d{5}\-\d{1,4}|\d{6}\-\d{1,3}|\d{7}\-\d{1,2}|\d{8}\-\d)\-\d\b
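The difference from the question's regex shows up on the two strings mentioned above. Here is a small Python sketch (Python's `re` is an assumption; both patterns are copied unchanged):

```python
import re

ORIGINAL = r"\b((\d{1,4}-\d{1,5})|(\d{1,5}-\d{1,4}))-\d{1}\b"  # from the question
ALTERNATION = (r"\b(?:\d\-\d{1,8}|\d{2}\-\d{1,7}|\d{3}\-\d{1,6}|\d{4}\-\d{1,5}"
               r"|\d{5}\-\d{1,4}|\d{6}\-\d{1,3}|\d{7}\-\d{1,2}|\d{8}\-\d)\-\d\b")

# Columns: input, matched by ORIGINAL, matched by ALTERNATION.
for s in ["1-12345678-1", "123456-1-1", "4444-55555-1"]:
    print(s, bool(re.search(ORIGINAL, s)), bool(re.search(ALTERNATION, s)))
```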
You can use this with the m modifier (switch the multiline mode on):
^\d(?!.{12})\d*-\d+-\d$
or this one without the m modifier:
\b\d(?!.{12})\d*-\d+-\d\b
By design, these two patterns match at least three digits separated by hyphens (so there is no need to put a {5,n} quantifier anywhere; it would be useless).
The patterns are also built to fail fast:
I chose to start them with a digit \d; this way, every line start or word boundary not followed by a digit is immediately discarded. Also, since only one digit has been consumed at that point, I know exactly how long the rest of the string may be.
Then I test the upper limit of the string length with a negative lookahead that checks whether there is one character more than the maximum remaining length (if there are 12 characters at this position, there are at least 13 characters in the string). There is no need for anything more descriptive than the dot metacharacter here; the goal is to test the length quickly.
Finally, I describe the rest of the string without doing anything special. That is probably the slowest part of the pattern, but it doesn't matter, since the overwhelming majority of unneeded positions have already been discarded.
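Both variants can be checked on the question's sample lines with a short Python sketch (Python's `re` is an assumption). In Python, as in most engines, `.` does not match a newline by default, so the `.{12}` lookahead only looks along the current line.

```python
import re

# Sample lines from the question (without the "----------" delimiters).
SAMPLE = """\
22-22-1
22-22-22
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
666666-7777777-1
88888888-88888888-1
1-1-1
88888888-88888888-22
22-333-
333-22"""

# After the first digit, (?!.{12}) rejects positions followed by 12 or more
# characters, i.e. candidates longer than 12 characters in total.
MULTILINE_PATTERN = re.compile(r"^\d(?!.{12})\d*-\d+-\d$", re.MULTILINE)
WORD_BOUNDARY_PATTERN = re.compile(r"\b\d(?!.{12})\d*-\d+-\d\b")

print(MULTILINE_PATTERN.findall(SAMPLE))
print(WORD_BOUNDARY_PATTERN.findall(SAMPLE))
```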