I need some assistance constructing a regular expression in a ColdFusion application. I apologize if this has been asked. I have searched, but I may not be asking for the correct thing.
I am using the following to search an email subject line for an issue number:
reMatchNoCase("[0-9]{5}", mailCheck.subject)
The issue number contains only numeric values, and should be exactly 5 digits. This is working except in cases where I have a longer number that appears in the string, such as 34512345. It takes the first 5 digits of that string as a valid issue number as well.
What I want is to retrieve only 5 digit numbers, nothing shorter or longer. I am then placing these into a list to be looped over and processed. Do I perhaps need to include spaces before and after in the regex to get the desired result?
Thank you.
The general way to exclude content from occurring before/after a match is to use negative lookbehind before the match and a negative lookahead afterwards. To do this for numeric digits would be:
(?<!\d)\d{5}(?!\d)
(Where \d is the shorthand for [0-9])
CF's regex supports lookaheads, but unfortunately not lookbehinds, so that wouldn't work directly in rematch - however that probably doesn't matter in this case because it's likely that you don't want, for example, abc12345 to match either - so what you more likely want is:
\b\d{5}\b
Where \b is a "word boundary" - roughly, it checks for a change between a "word character" and a non-word character (or vice versa) - so in this case the first \b will check that there is NOT one of [a-zA-Z0-9_] before the first digit, and the second \b will check that there isn't one after the fifth digit. A \b does not add any characters to the match (i.e. it is a zero-width assertion).
Since you're not dealing with case, you don't need the nocase variant and can simply write:
rematch( '\b\d{5}\b' , mailCheck.subject )
The benefit of this over simply checking for spaces is that the result is five digits (no need to trim), but the downside is that it would match values such as [12345] or 3.14159^2 which are probably not what you want?
To check for spaces, or the start/end of the string, you can do:
rematch( '(?:^| )\d{5}(?= |$)' , mailCheck.subject )
Then use trim on each result to remove spaces.
If that's not what you're after, go ahead and provide more details.
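For comparison, here is a minimal sketch of the three patterns above in Python rather than ColdFusion (Python's re module does support lookbehind, so the first pattern can be tried directly). The function name and the sample subject line are made up for illustration.

```python
import re

# The three patterns discussed above, unchanged.
NOT_INSIDE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")   # lookbehind + lookahead
WHOLE_WORD = re.compile(r"\b\d{5}\b")                    # word boundaries
SPACE_DELIMITED = re.compile(r"(?:^| )\d{5}(?= |$)")     # spaces or string ends


def find_issue_numbers(subject, pattern=WHOLE_WORD):
    """Return every five-digit run that the pattern accepts, trimmed of spaces."""
    return [match.strip() for match in pattern.findall(subject)]


# Hypothetical subject line, not taken from the question.
subject = "RE: 12345 [54321] abc67890 34512345 99999"

print(find_issue_numbers(subject, NOT_INSIDE_DIGITS))
print(find_issue_numbers(subject))
print(find_issue_numbers(subject, SPACE_DELIMITED))
```

The lookaround version accepts abc67890, the word-boundary version accepts [54321] but not abc67890, and the space-delimited version accepts neither. None of them accept any part of 34512345.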
I'm having trouble writing a regex that matches a pattern like "%n%m%p" or "%n:%m%p". Only specific letters are allowed, and each letter must have a percent sign in front of it. No numbers are allowed.
This regex /%(n|m|p)$/ works but allows numbers in between. For example, "%n3%p%m" matches. How do I disallow any numbers?
The regex %(n|m|p) itself matches either %n or %m or %p. That the numbers are allowed between each of the parts is most likely because of your other code.
You can match the whole with this regex
/^(%(n|m|p):{0,1}){0,}$/
Just need to be clear about the exact requirements.
The allowed letters are [nmp]
Each letter has to be preceded by a %
There can be an optional : before %
+ One or more tokens from ^ start to $ end
These requirements won't allow any digit.
^(?::?%[nmp])+$
You can test it at regex101
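If you want to check it locally, a quick sketch in Python (the function name is just for illustration) could look like this:

```python
import re

# One or more tokens, each an optional ":" followed by "%" and one of n, m, p.
TOKENS = re.compile(r"^(?::?%[nmp])+$")


def is_valid_format(text):
    """True when the whole string consists only of %n / %m / %p tokens."""
    return TOKENS.match(text) is not None


for sample in ("%n%m%p", "%n:%m%p", "%n3%p%m"):
    print(sample, is_valid_format(sample))
```

Note that, as written, the pattern also accepts a leading colon (":%n"), since the : is optional before every token including the first.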
I can't leave a comment but I can answer, so...
It would help to know what exactly you need from this. Do you need those letters in that order? Do you need exactly 3? Or are you looking for any number of any length with any valid characters in between?
That said, one option if you're matching the entire string is
/^(%[nmp][^\d]*)+$/
which should match any %[nmp] with any character between them that isn't a number. Note though that this will match a single %n for example. If you want to match a specific number i or more than a certain number j, change the + to {i} or {j,} respectively.
As long as it has one of the letters and a percent sign it should match. Just no numbers
Use the following regex pattern:
%[nmp](?!\d)\b
https://regex101.com/r/CrSnFp/2
(?!\d) - negative lookahead assertion: %[nmp] matches only if it is not followed by a digit
I'm trying to build a regex pattern that requires the string to contain uppercase letters, lowercase letters and digits together, but without success.
Here's what I have, but it doesn't work:
(?=[A-Z]+)(?=[a-z]+)(?=[0-9]+)
In other words, the string should match only if it contains uppercase letters, lowercase letters and digits, in any order, like this:
MyPass777 <-- match
Mypass777 <-- match
MyPass <-- no match
mypass777 <-- no match
So, how can I make this work?
Your positive lookaheads must also use .* before each condition to allow for an arbitrary number of characters before the letter or digit:
\b(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9]+\b
RegEx Demo
Also note use of \b (word boundary) on either side of your regex to make sure to match complete words only.
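A small sketch in Python to run the four samples from the question against this pattern (the function name is only illustrative):

```python
import re

# Three lookaheads (uppercase, lowercase, digit) followed by the word itself.
MIXED_WORD = re.compile(r"\b(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9]+\b")


def has_mixed_word(text):
    """True when the pattern finds a match somewhere in the text."""
    return MIXED_WORD.search(text) is not None


for sample in ("MyPass777", "Mypass777", "MyPass", "mypass777"):
    print(sample, has_mixed_word(sample))
```

Because each .* scans to the end of the line rather than the end of the word, this is safest when the input is a single token such as one password.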
If you want a yes/no test, then use alternation.
Require something that has an uppercase letter and eventually a lowercase one, OR something that has a lowercase letter and eventually an uppercase one.
With spaces added for clarity
(?: [a-z].*[A-Z] | [A-Z].*[a-z] )
With a third requirement, numbers, it gets combinatorially more expensive.
You're better off testing in three phases. Does this have a uppercase? If not, fail. Does it have a lowercase? If not, fail. Does it have a number? If not, fail. Else, it's okay.
Use separate regexes instead of a single regex to gain additional benefits.
With this approach, you do not limit the user to uppercase+lowercase+digits; if they use, for example, uppercase+lowercase+punctuation, the password will be considered equally good.
Test 4 cases:
[A-Z]
[a-z]
[0-9]
[\!+\-*@#$%\^&*[\]{}:";'<>?,./] (or refer to the Unicode character class P (punctuation) instead)
Now count matching cases.
1-2 cases: weak password.
3 cases: good password.
4 cases: strong password.
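Roughly, the counting approach could be sketched like this in Python. The function names are made up, and the punctuation class is built from Python's string.punctuation (ASCII only) as a stand-in for the list above, which is an assumption on my part.

```python
import re
import string

# One regex per character category. The punctuation class is an assumption:
# it uses ASCII punctuation from the standard library, escaped for safe use
# inside a character class.
CHECKS = (
    re.compile(r"[A-Z]"),
    re.compile(r"[a-z]"),
    re.compile(r"[0-9]"),
    re.compile("[" + re.escape(string.punctuation) + "]"),
)


def count_classes(password):
    """Count how many of the four categories appear in the password."""
    return sum(1 for check in CHECKS if check.search(password))


def rate_password(password):
    """Map the category count to the weak / good / strong labels."""
    score = count_classes(password)
    if score >= 4:
        return "strong"
    if score == 3:
        return "good"
    return "weak"


for sample in ("mypass", "MyPass777", "MyPass777!"):
    print(sample, rate_password(sample))
```

A length check would normally sit next to this; it is left out here to keep the sketch focused on the category count.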
Your pattern uses lookaheads that all apply at the same position, so it requires that the next character be an uppercase letter, a lowercase letter, and a digit at the same time. It never matches.
You want something like
(?=\w*[A-Z])(?=\w*[a-z])(?=\w*[0-9])(\w+\b)
At least, that's my best understanding of your problem: You want a string of alphanumeric characters that contains at least one uppercase letter, at least one lowercase letter, and at least one digit.
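To see the difference, here is a short sketch in Python comparing the pattern from the question with the one above (variable names are mine):

```python
import re

# Pattern from the question: all three lookaheads test the same next character.
ORIGINAL = re.compile(r"(?=[A-Z]+)(?=[a-z]+)(?=[0-9]+)")

# Pattern suggested above: each lookahead may skip ahead over word characters.
FIXED = re.compile(r"(?=\w*[A-Z])(?=\w*[a-z])(?=\w*[0-9])(\w+\b)")

for sample in ("MyPass777", "MyPass", "mypass777"):
    original_hit = ORIGINAL.search(sample) is not None
    fixed_match = FIXED.search(sample)
    fixed_word = fixed_match.group(1) if fixed_match else None
    print(sample, original_hit, fixed_word)
```

The original never finds anything, even for MyPass777, while the fixed pattern captures the whole word only when all three kinds of character are present.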
I'm trying to write a regex which can match a word in a string with these conditions:
the word must be 8 characters long.
the word must have 1 alphabetic character at any position of the word.
the word must have 7 digits at any position of the word.
\b(?=\w{8}\z)(?=[^a-zA-Z]*[a-zA-Z]{1})(?=(?:[\D]*[\d]){7}).*\b
This can find "123r1234" and "foo 123r1234" but it doesn't find "foo bar 123r1234 foo".
I tried to add word boundaries but it didn't work.
What is wrong with my regex and how can I fix it?
Thanks.
You can use the following regex:
\b(?=[^a-zA-Z]*[a-zA-Z])(?=(?:\D*\d){7})\w{8}\b
See demo
There are several things to note here:
It is not necessary to enclose single shorthand classes (like \d) into character classes (pattern becomes too awkward and less readable). Thus, use \D instead of [\D].
As a rule, the number of look-aheads should equal the number of conditions minus 1 (see Fine-Tuning: Removing One Condition at rexegg.com). Most often, a length-restriction look-ahead over a single character or character class is a good candidate for moving into the base pattern. Here, the \w{8} from (?=\w{8}) can simply replace the .* at the end.
The (?=\w{8}\z) look-ahead contains an end-of-string \z anchor that forces a match at the end of the string, while you need (as now I know) the end of a word.
[a-zA-Z]{1} is equal to [a-zA-Z], since {1} means exactly one repetition and is therefore redundant (again, regex patterns should be as clean and concise as they can be).
UPDATE (+1 goes to @Jonny5)
There is another way of approaching the current problem: by having the word contain 8 word characters, but matching only 1 letter enclosed with any number of digits. This can be achieved with
(?i)\b(?=\w{8}\b)\d*[a-z]\d*\b
See another demo (Note i modifier is used here)
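Here is a small sketch in Python that runs both patterns from this answer against a few inputs (the variable names and the extra test string "12345678 foo" are mine). One thing it surfaces: the look-aheads in the first pattern are not confined to the current word, so they can be satisfied by characters later in the string, whereas the second pattern checks only the word itself.

```python
import re

# First pattern: two look-aheads plus \w{8}.
LOOKAHEAD_VERSION = re.compile(
    r"\b(?=[^a-zA-Z]*[a-zA-Z])(?=(?:\D*\d){7})\w{8}\b"
)

# Pattern from the UPDATE: exactly one letter surrounded by digits.
# re.ASCII keeps \d and \w limited to ASCII characters.
ONE_LETTER_VERSION = re.compile(r"(?i)\b(?=\w{8}\b)\d*[a-z]\d*\b", re.ASCII)

for text in ("foo bar 123r1234 foo", "12345678 foo"):
    print(text)
    print("  look-ahead version:", LOOKAHEAD_VERSION.findall(text))
    print("  one-letter version:", ONE_LETTER_VERSION.findall(text))
```

Both find 123r1234 in the first string. In the second string the look-ahead version also accepts 12345678, because the letter requirement is met by the f in "foo"; the one-letter version rejects it.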
You can remove the trailing .* and replace it with the \w{8} counter:
\b(?=[^a-zA-Z]*[a-zA-Z])(?=(?:[\D]*[\d]){7})\w{8}\b
You can view it running here:
https://regex101.com/r/bX6rK8/1
I have a regex that I thought was working correctly until now. I need to match on an optional character. It may be there or it may not.
Here are two strings. The top string is matched while the lower is not. The absence of a single letter in the lower string is what is making it fail.
I'd like to get the single letter after the starting 5 digits if it's there and if not, continue getting the rest of the string. This letter can be A-Z.
If I remove ([A-Z]{1}) +.*? + from the regex, it will match everything I need except the letter but it's kind of important.
20000 K Q511195DREWBT E00078748521
30000 K601220PLOPOH Z00054878524
Here is the regex I'm using.
/^([0-9]{5})+.*? ([A-Z]{1}) +.*? +([A-Z]{1})([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/
Use
[A-Z]?
to make the letter optional. {1} is redundant. (Of course you could also write [A-Z]{0,1} which would mean the same, but that's what the ? is there for.)
You could improve your regex to
^([0-9]{5})+\s+([A-Z]?)\s+([A-Z])([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})
And, since in most regex dialects, \d is the same as [0-9]:
^(\d{5})+\s+([A-Z]?)\s+([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})
But: do you really need 11 separate capturing groups? And if so, why don't you capture the fourth-to-last group of digits?
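As a sketch of the optional-letter idea in Python: this is a variant, not the exact regex above. It assumes the sample lines are separated by single spaces as they appear in the question here, so the optional letter is grouped together with the whitespace that follows it, and the + after the first group is dropped. The function name is made up.

```python
import re

# Variant: the optional letter and its trailing whitespace form one optional group,
# so a line without the letter needs only one run of whitespace after the digits.
RECORD = re.compile(
    r"^(\d{5})\s+(?:([A-Z])\s+)?"
    r"([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+"
    r"([A-Z])\d{3}(\d{4})(\d{2})(\d{2})"
)


def parse_record(line):
    """Return the captured groups, or None when the line does not match."""
    match = RECORD.match(line)
    return match.groups() if match else None


print(parse_record("20000 K Q511195DREWBT E00078748521"))
print(parse_record("30000 K601220PLOPOH Z00054878524"))
```

When the letter is absent, the second group comes back as None instead of an empty string.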
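You can make the single letter optional by adding a ? after the character class:
([A-Z]?)
The quantifier {1} is redundant, so it is dropped here. (Writing ([A-Z]{1}?) would not help: a ? placed directly after {1} only makes that quantifier lazy, so one letter is still required.)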
You have to mark the single letter as optional too:
([A-Z]{1})? +.*? +
or make the whole part optional
(([A-Z]{1}) +.*? +)?
You could also use a simpler regex designed for your case, like (.*)\/(([^\?\n\r])*), where $2 matches what you want.
Here is a regex for a password that requires a minimum of 8 characters (and at most 25), including a number, a lowercase letter and an uppercase letter, with optional special characters:
/((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?![~@#\$%\^&\*_\-\+=`|{}:;!\.\?\"()\[\]]).{8,25})/
I've got the following URL route and I want to make sure that a segment of the route will only accept numbers. As such, I can provide a regex which checks the word.
/page/{currentPage}
So, can someone give me a regex which matches when the word is a number (any int) greater than 0 (i.e. 1 <-> int.max)?
/^[1-9][0-9]*$/
Problems with other answers:
/([1-9][0-9]*)/ // Will match -1 and foo1bar
#[1-9]+# // Will not match 10, same problems as the first
[1-9] // Will only match one digit, same problems as first
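A quick sketch in Python showing the difference the anchors make (the names are mine, and the route segment is assumed to be already isolated as a string):

```python
import re

ANCHORED = re.compile(r"^[1-9][0-9]*$")    # the pattern recommended above
UNANCHORED = re.compile(r"([1-9][0-9]*)")  # the first of the other answers


def is_positive_int(segment):
    """True when the whole segment is a positive integer without leading zeros."""
    return ANCHORED.match(segment) is not None


for segment in ("1", "10", "0", "03", "-1", "foo1bar"):
    unanchored_hit = UNANCHORED.search(segment) is not None
    print(segment, is_positive_int(segment), unanchored_hit)
```

The unanchored pattern reports a hit for -1 and foo1bar because it only needs to find a digit run somewhere inside the string.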
If you want it greater than 0, use this regex:
/([1-9][0-9]*)/
This'll work as long as the number doesn't have leading zeros (like '03').
However, I recommend just using a simple [0-9]+ regex, and validating the number in your actual site code.
This one would address your specific problem. This expression
/\/page\/(0*[1-9][0-9]*)/ or "Perl-compatible" /\/page\/(0*[1-9]\d*)/
should capture any non-zero number, even 0-filled. And because it doesn't even look for a sign, - after the slash will not fit the pattern.
The problem that I have with eyelidlessness' expression is that, likely you do not already have the number isolated so that ^ and $ would work. You're going to have to do some work to isolate it. But a general solution would not be to assume that the number is all that a string contains, as below.
/(^|[^0-9-])(0*[1-9][0-9]*)([^0-9]|$)/
And the two tail-end groups, you could replace with word boundary marks (\b), if the RE language had those. Failing that you would put them into non-capturing groups, if the language had them, or even lookarounds if it had those--but it would more likely have word boundaries before lookarounds.
Full Perl-compatible version:
/(?<![\d-])(0*[1-9]\d*)\b/
I chose a negative lookbehind instead of a word boundary, because '-' is not a word-character, and so -1 will have a "word boundary" between the '-' and the '1'. And a negative lookbehind will match the beginning of the string--there just can't be a digit character or '-' in front.
You could say that the zero-width assertion ^ is just one of the cases that satisfies the zero-width assertion (?<![\d-]).
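The lookbehind version also works unchanged in Python's re module, so here is a small sketch for trying it out (the function name is made up):

```python
import re

# No digit or "-" may come directly before the number; leading zeros are allowed.
PAGE_NUMBER = re.compile(r"(?<![\d-])(0*[1-9]\d*)\b")


def first_positive_number(text):
    """Return the first non-zero number found in the text, or None."""
    match = PAGE_NUMBER.search(text)
    return match.group(1) if match else None


for text in ("/page/100", "/page/007", "/page/0", "/page/-5", "-1", "7"):
    print(text, first_positive_number(text))
```

The last sample shows the point about the start of the string: with nothing in front of the 7, the lookbehind is satisfied.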
string testString = @"/page/100";
string pageNumber = Regex.Match(testString, "/page/([1-9][0-9]*)").Groups[1].Value;
If not matched, pageNumber will be ""
While Jeremy's regex isn't perfect (should be tested in context, against leading characters and such), his advice is good: go for a generic, simple regex (eg. if you must use it in Apache's mod_rewrite) but by any means, handle the final redirect in server's code (if you can) and do a real check of parameter's validity there.
Otherwise, I would improve Jeremy's expression with bounds: /\b([1-9][0-9]*)$/
Of course, a regex cannot provide a check against any max int, at best you can control the number of digits: /\b([1-9][0-9]{0,2})$/ for example.
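For the "real check" in server code, a minimal sketch in Python could look like the following. The function name is hypothetical and the upper bound is an assumption (a 32-bit signed int); substitute whatever int.max means on your platform.

```python
import re

MAX_INT = 2**31 - 1  # assumed upper bound: 32-bit signed integer
DIGITS = re.compile(r"[1-9][0-9]*")


def parse_page(segment, max_value=MAX_INT):
    """Return the page number as an int, or None when it is not in 1..max_value."""
    # The regex only checks the shape: digits, no sign, no leading zero.
    if not DIGITS.fullmatch(segment):
        return None
    # The range check is done numerically, which a regex cannot do cleanly.
    value = int(segment)
    return value if value <= max_value else None


for segment in ("1", "2147483647", "2147483648", "0", "03", "abc"):
    print(segment, parse_page(segment))
```

Keeping the regex simple and doing the range comparison on the parsed integer avoids having to encode the maximum value digit by digit in the pattern.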
This will match any string such that, if it contains /page/, it must be followed by a number, not consisting of only zeros.
^(?!.*?/page/([0-9]*[^0-9/]|0*/))
(?! ) is a negative look-ahead. It will match an empty string only if its contained pattern does not match from the current position.