RegEx expression to handle multiple conditions of breaking sentences

RegEx expression to handle multiple conditions of breaking sentences - regex

I am trying to make a regex that is used in an exception.
Therefore it must return false for these sentences (the leading digits are included in the strings):
3.{17} this is italics and should break.{18} 
4. this is another sentence and should break. 
5. This is another sentence and should break. 
And it must return true for these:
There are 2 reasons for this 1. you are here and 2. you are communicating. 
Is it 2? they wanted to know. 
1 digit at the beginning but with 1. with a period should return true.
In other words, if the beginning of the string is a number followed by a period, it should return false (even if "\{\d+\}" follows it optionally) and the character following the space does not matter. And it must return true if the number and period (or ! or ?) is embedded in the sentence followed by a lower case character, in other cases it must be false.
As a further note: this goes into a java properties file, and the value is then passed to a perl5 regex engine to return broken text.
I try to express it in one expression, but somehow I cannot get it right.
This is what have come up with until now:
^([^0-9\.]+[\.]|
[^\.!\?]*[\?!]+[\?!\.]+|
[0-9]+[^\?!\.]+[\?!\.]+|
[^0-9]*[0-9]+[^\?!\.]+[\?!\.]+)
(\{\d+\}[\u0020\u00A0]|
[\u0020\u00A0]*)[a-z]
I seem to arrived at an impasse and can't see what is I have wrong.
Thanks for any advice.
Update:
A simpler format with look-ahead: ^(?!\d+\.)[^.!?]*[.!?]+(\{\d+\}\s|\s*)\p{Ll} based on the comments.

You may use
^(?!\d+\.)[^.!?]*[.!?]+(\{\d+\}\s|\s*)\p{Ll}
See the regex demo.
The pattern matches:
^ - start of string anchor
(?!\d+\.) - a negative lookahead that will fail the match if its pattern is matched at the start of the string: 1+ digits followed with a dot
[^.!?]* - 0+ chars other than ., ! and ?
[.!?]+ - 1 or more ., ! or ? symbols
(\{\d+\}\s|\s*) - either a { + 1 or more digits + } or 0+ whitespaces (if you are not interested in the value captured with this capturing group, you may turn it into a non-capturing one by adding ?: after the first ().
\p{Ll} - a lowercase letter (if a u modifier is used, it will also match all Unicode lowercase letters).

Related

Regex Help required for User-Agent Matching

Have used an online regex learning site (regexr) and created something that works but with my very limited experience with regex creation, I could do with some help/advice.
In IIS10 logs, there is a list for time, date... but I am only interested in the cs(User-Agent) field.
My Regex:
(scan\-\d+)(?:\w)+\.shadowserver\.org
which matches these:
scan-02.shadowserver.org
scan-15n.shadowserver.org
scan-42o.shadowserver.org
scan-42j.shadowserver.org
scan-42b.shadowserver.org
scan-47m.shadowserver.org
scan-47a.shadowserver.org
scan-47c.shadowserver.org
scan-42a.shadowserver.org
scan-42n.shadowserver.org
scan-42o.shadowserver.org
but what I would like it to do is:
Match a single number with the option of capturing more than one: scan-2 or scan-02 with an optional letter: scan-2j or scan-02f
Append the rest of the User Agent: .shadowserver.org to the regex.
I will then add it to an existing URL Rewrite rule (as a condition) to abort the request.
Any advice/help would be very much appreciated
Tried:
To write a regex for IIS10 to block requests from a certain user-agent
Expected:
It to work on single numbers as well as double/triple numbers with or without a letter.
(scan\-\d+)(?:\w)+\.shadowserver\.org
Input Text:
scan-2.shadowserver.org
scan-02.shadowserver.org
scan-2j.shadowserver.org
scan-02j.shadowserver.org
scan-17w.shadowserver.org
scan-101p.shadowserver.org
UPDATE:
I eventually came up with this:
scan\-[0-9]+[a-z]{0,1}\.shadowserver\.org

This is explanation of your regex pattern if you only want the solution, then go directly to the end.
(scan\-\d+)(?:\w)+
(scan\-\d+) Group1: match the word scan followed by a literal -, you escaped the hyphen with a \, but if you keep it without escaping it also means a literal - in this case, so you don't have to escape it here, the - followed by \d+ which means one more digit from 0-9 there must be at least one digit, then the value inside the group will be saved inside the first capturing group.
(?:\w)+ non-capturing group, \w one character which is equal to [A-Za-z0-9_], but the the plus + sign after the non-capturing group (?:\w)+, means match the whole group one or more times, the group contains only \w which means it will match one or more word character, note the non-capturing group here is redundant and we can use \w+ directly in this case.
Taking two examples:
The first example: scan-02.shadowserver.org
(scan\-\d+)(?:\w)+
scan will match the word scan in scan-02 and the \- will match the hyphen after scan scan-, the \d+ which means match one or more digit at first it will match the 02 after scan- and the value would be scan-02, then the (?:\w)+ part, the plus + means match one or more word character, at least match one, it will try to match the period . but it will fail, because the period . is not a word character, at this point, do you think it is over ? No , the regex engine will return back to the previous \d+, and this time it will only match the 0 in scan-02, and the value scan-0 will be saved inside the first capturing group, then the (?:\w)+ part will match the 2 in scan-02, but why the engine returns back to \d+ ? this is because you used the + sign after \d+, (?:\w)+ which means match at least one digit, and one word character respectively, so it will try to do what it is asked to do literally.
The second example: scan-2.shadowserver.org
(scan\-\d+)(?:\w)+
(scan\-\d+) will match scan-2, (?:\w)+ will try to match the period after scan-2 but it fails and this is the important point here, then it will go back to the beginning of the string scan-2.shadowserver.org and try to match (scan\-\d+) again but starting from the character c in the string , so s in (scan\-\d+) faild to match c, and it will continue trying, at the end it will fail.
Simple solution:
(scan-\d+[a-z]?)\.shadowserver\.org
Explanation
(scan-\d+[a-z]?), Group1: will capture the word scan, followed by a literal -, followed by \d+ one or more digits, followed by an optional small letter [a-z]? the ? make the [a-z] part optional, if not used, then the [a-z] means that there must be only one small letter.
See regex demo

How to get everything before the dash character in regex?

I have strings of the following form:
en-US //return en
en-UK //return en
en //don't return
nl-NL //return nl
nl-BE //return nl
nl //don't return
I would like to return the one's that are indicated in the code above. I tried .*\- but this returns en-. How do I stop returning before the slash? So only return en? I'm testing it here.

One option is to use a capturing group at the start of the string for the first 2 lowercase chars and then match the following dash and the 2 uppercase chars.
^([a-z]{2})-[A-Z]{2}$
Regex demo
If you want to capture multiple chars [a-z] (or any character except a hypen or newline [^-\r\n]) before the dash and then match it you could use a quantifier like + to match 1+ times or use {2,} to match 2 or more times.
^([a-z]{2,})-
Regex demo

You could use a positive lookahead.
.*(?=-)
If you are always specifically looking for 2 lowercase alpha characters preceeding a dash, then it is probably a good idea to be a bit more targeted with your regex.
[a-z]{2}(?=-)

.+?(?=-) as a regular expression should do what you are asking.
Where
. matches any character
+? matches between one and infinity times, but it does it as few times as possible, using lazy expansion
and
(?=-) Is a positive look ahead, so it checks ahead in the string, and only matches and returns if the next character in the string is - but the return will not include the value -

Matching any password except one containing repeating characters [duplicate]

Edit: Thanks for the advice to make my question clearer :)
The Match is looking for 3 consecutive characters:
Regex Match =AaA653219
Regex Match = AA5556219
The code is ASP.NET 4.0. Here is the whole function:
public ValidationResult ApplyValidationRules()
{
ValidationResult result = new ValidationResult();
Regex regEx = new Regex(#"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
bool valid = regEx.IsMatch(_Password);
if (!valid)
result.Errors.Add("Passwords must be 8-20 characters in length, contain at least one alpha character and one numeric character");
return result;
}
I've tried for over 3 hours to make this work, referencing the below with no luck =/
How can I find repeated characters with a regex in Java?
.net Regex for more than 2 consecutive letters
I have started with this for 8-20 characters a-Z 0-9 :
^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$
As Regex regEx = new Regex(#"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
I've tried adding variations of the below with no luck:
/(.)\1{9,}/
.*([0-9A-Za-z])\\1+.*
((\\w)\\2+)+".
Any help would be much appreciated!

http://regexr.com?34vo9
The regular expression:
^(?=.{8,20}$)(([a-z0-9])\2?(?!\2))+$
The first lookahead ((?=.{8,20}$)) checks the length of your string. The second portion does your double character and validity checking by:
(
([a-z0-9]) Matching a character and storing it in a back reference.
\2? Optionally match one more EXACT COPY of that character.
(?!\2) Make sure the upcoming character is NOT the same character.
)+ Do this ad nauseum.
$ End of string.
Okay. I see you've added some additional requirements. My basic forumla still works, but we have to give you more of a step by step approach. SO:
^...$
Your whole regular expression will be dropped into start and end characters, for obvious reasons.
(?=.{n,m}$)
Length checking. Put this at the beginning of your regular expression with n as your minimum length and m as your maximum length.
(?=(?:[^REQ]*[REQ]){n,m})
Required characters. Place this at the beginning of your regular expression with REQ as your required character to require N to M of your character. YOu may drop the (?: ..){n,m} to require just one of that character.
(?:([VALID])\1?(?!\1))+
The rest of your expression. Replace VALID with your valid Characters. So, your Password Regex is:
^(?=.{8,20}$)(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])(?:([\w\d*?!:;])\1?(?!\1))+$
'Splained:
^
(?=.{8,20}$) 8 to 20 characters
(?=[^A-Za-z]*[A-Za-z]) At least one Alpha
(?=[^0-9]*[0-9]) At least one Numeric
(?:([\w\d*?!:;])\1?(?!\1))+ Valid Characters, not repeated thrice.
$
http://regexr.com?34vol Here's the new one in action.

Tightened up matching criteria as it was too broad; for example, "not A-Za-z" matches a lot more than is intended. The previous REGEX was matching on the string "ThiIsNot". For the most part, passwords are only going to contain alphanumeric and punctation characters, so I limited the scope, which made all matches more accurate. Used character classes for human readability. Added and exclusion list, and differentiated upper and lower case letters.
^(?=.{8,20}$)(?!(?:.*[01IiLlOo]))(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1})(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1})(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1})(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$
The breakdown:
^(?=.{8,20}$) - Positive lookahead that the string is between 8 and 20 chars
(?!(?:.*[01IiLlOo])) - Negative lookahead for any blacklisted chars
(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2}) - Verify that at least 2 alpha chars exist
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1}) - Verify that at least 1 lowercase alpha exists
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1}) - Verify that at least 1 uppercase alpha exists
(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1}) - Verify that at least 1 digit exists
(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1}) - Verify that at least 1 special/punctuation char exists
(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$ - Verify that no char is repeated more than twice in a row

Match 3 and 4 delimiters and between them; not less not more

I have a command-line program that its first argument ( = argv[ 1 ] ) is a regex pattern.
./program 's/one-or-more/anything/gi/digit-digit'
So I need a regex to check if the entered input from user is correct or not. This regex can be solve easily but since I use c++ library and std::regex_match and this function by default puts begin and end assertion (^ and $) at the given string, so the nan-greedy quantifier is ignored.
Let me clarify the subject. If I want to match /anything/ then I can use /.*?/ but std::regex_match considers this pattern as ^/.*?/$ and therefore if the user enters: /anything/anything/anyhting/ the std::regex_match still returns true whereas the input-pattern is not correct. The std::regex_match only returns true or false and the expected pattern form the user can only be a text according to the pattern. Since the pattern is various, here, I can not provide you all possibilities, but I give you some example.
Should be match
/.//
s/.//
/.//g
/.//i
/././gi
/one-or-more/anything/
/one-or-more/anything/g/3
/one-or-more/anything/i
/one-or-more/anything/gi/99
s/one-or-more/anything/g/4
s/one-or-more/anything/i
s/one-or-more/anything/gi/54
and anything look like this pattern
Rules:
delimiters are /|##
s letter at the beginning and g, i and 2 digits at the end are optional
std::regex_match function returns true if the entire target character sequence can be match, otherwise return false
between first and second delimiter can be one-or-more +
between second and third delimiter can be zero-or-more *
between third and fourth can be g or i
At least 3 delimiter should be match /.// not less so /./ should not be match
ECMAScript 262 is allowed for the pattern
NOTE
May you would need to see may question about std::regex_match:
std::regex_match and lazy quantifier with strange
behavior
I no need any C++ code, I just need a pattern.
Do not try d?([/|##]).+?\1.*?\1[gi]?[gi]?\1?d?\d?\d?. It fails.
My attempt so far: ^(?!s?([/|##]).+?\1.*?\1.*?\1)s?([/|##]).+?\2.*?\2[gi]?[gi]?\d?\d?$
If you are willing to try, you should put ^ and $ around your pattern
If you need more details please comment me, and I will update the question.
Thanks.

You could use this regular expression:
^s?([/|##])((?!\1).)+\1((?!\1).)*\1((gi?|ig)(\1\d\d?)?|i)?$
See regex101.com
Note how this also rejects these cases:
///anything/
/./anything/gg
/./anything/ii
/./anything/i/12
How it works:
Some explanation of the parts that are different:
((?!\1).): this will match any character that is not the delimiter. This way you are sure you can keep track of the exact number of delimiters used. You can this way also prevent that the first character after the first delimiter, is again that delimiter, which should not be allowed.
(gi?|ig): matches any of the valid modifier combinations, except a sole i, which is treated separately. So this also excludes gg and ii as valid character sequences.
(\1\d\d?)?: optionally allows for an extra delimiter (after a g modifier -- see previous) to be added with one or two digits following it.
( |i)?: for the case there is no g modifier present, but just the i or none: then no digits are allowed to follow.

This is a tricky one, but I took the challenge - here is what I have ended up with:
^s?([\/|##])(?:(?!\1).)+\1(?:(?!\1).)*\1(?:i|(?:gi?|ig)(\1\d{1,2})?)?$
Pattern breakdown:
^ matches start of string
s? matches an optional 's' character
([\/|##]) matches the delimeter characters and captures as group 1
(?:(?!\1).)+ matches anything other than the delimiter character one or more times (uses negative lookahead to make sure that the character isn't the delimiter matched in group 1)
\1 matches the delimiter character captured in group 1
(?:(?!\1).)* matches anything other than the delimiter character zero or more times
\1 matches the delimiter character captured in group 1
(?: starts a new group
i matches the i character
| or
(?:gi?|ig) matches either g, gi, or ig
(\1\d{1,2})? followed by an optional extra delimiter and 0-9 once or twice
)? closes group and makes it optional
$ matches end of string
I have used non capturing groups throughout - these are groups that start ?:

RegEx No more than 2 identical consecutive characters and a-Z and 0-9

Edit: Thanks for the advice to make my question clearer :)
The Match is looking for 3 consecutive characters:
Regex Match =AaA653219
Regex Match = AA5556219
The code is ASP.NET 4.0. Here is the whole function:
public ValidationResult ApplyValidationRules()
{
ValidationResult result = new ValidationResult();
Regex regEx = new Regex(#"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
bool valid = regEx.IsMatch(_Password);
if (!valid)
result.Errors.Add("Passwords must be 8-20 characters in length, contain at least one alpha character and one numeric character");
return result;
}
I've tried for over 3 hours to make this work, referencing the below with no luck =/
How can I find repeated characters with a regex in Java?
.net Regex for more than 2 consecutive letters
I have started with this for 8-20 characters a-Z 0-9 :
^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$
As Regex regEx = new Regex(#"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
I've tried adding variations of the below with no luck:
/(.)\1{9,}/
.*([0-9A-Za-z])\\1+.*
((\\w)\\2+)+".
Any help would be much appreciated!

http://regexr.com?34vo9
The regular expression:
^(?=.{8,20}$)(([a-z0-9])\2?(?!\2))+$
The first lookahead ((?=.{8,20}$)) checks the length of your string. The second portion does your double character and validity checking by:
(
([a-z0-9]) Matching a character and storing it in a back reference.
\2? Optionally match one more EXACT COPY of that character.
(?!\2) Make sure the upcoming character is NOT the same character.
)+ Do this ad nauseum.
$ End of string.
Okay. I see you've added some additional requirements. My basic forumla still works, but we have to give you more of a step by step approach. SO:
^...$
Your whole regular expression will be dropped into start and end characters, for obvious reasons.
(?=.{n,m}$)
Length checking. Put this at the beginning of your regular expression with n as your minimum length and m as your maximum length.
(?=(?:[^REQ]*[REQ]){n,m})
Required characters. Place this at the beginning of your regular expression with REQ as your required character to require N to M of your character. YOu may drop the (?: ..){n,m} to require just one of that character.
(?:([VALID])\1?(?!\1))+
The rest of your expression. Replace VALID with your valid Characters. So, your Password Regex is:
^(?=.{8,20}$)(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])(?:([\w\d*?!:;])\1?(?!\1))+$
'Splained:
^
(?=.{8,20}$) 8 to 20 characters
(?=[^A-Za-z]*[A-Za-z]) At least one Alpha
(?=[^0-9]*[0-9]) At least one Numeric
(?:([\w\d*?!:;])\1?(?!\1))+ Valid Characters, not repeated thrice.
$
http://regexr.com?34vol Here's the new one in action.

Tightened up matching criteria as it was too broad; for example, "not A-Za-z" matches a lot more than is intended. The previous REGEX was matching on the string "ThiIsNot". For the most part, passwords are only going to contain alphanumeric and punctation characters, so I limited the scope, which made all matches more accurate. Used character classes for human readability. Added and exclusion list, and differentiated upper and lower case letters.
^(?=.{8,20}$)(?!(?:.*[01IiLlOo]))(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1})(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1})(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1})(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$
The breakdown:
^(?=.{8,20}$) - Positive lookahead that the string is between 8 and 20 chars
(?!(?:.*[01IiLlOo])) - Negative lookahead for any blacklisted chars
(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2}) - Verify that at least 2 alpha chars exist
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1}) - Verify that at least 1 lowercase alpha exists
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1}) - Verify that at least 1 uppercase alpha exists
(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1}) - Verify that at least 1 digit exists
(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1}) - Verify that at least 1 special/punctuation char exists
(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$ - Verify that no char is repeated more than twice in a row

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

RegEx expression to handle multiple conditions of breaking sentences - regex

Related

Regex Help required for User-Agent Matching

How to get everything before the dash character in regex?

Matching any password except one containing repeating characters [duplicate]

Match 3 and 4 delimiters and between them; not less not more

RegEx No more than 2 identical consecutive characters and a-Z and 0-9

Categories

Resources