Regex to match after double backslash slash until .com - regex

I've been trying to regex the following message:
Netlogon has failed an additional 130 authentication requests in the
last 30 minutes. The requests timed out before they could be sent to
domain controller \\AA-SRV85.xx.acme.com in domain XX. Please see
http://support.microsoft.com/kb/2654097 for more information.
So far, I've worked out that if I use the following, I will find a match for \\AA-SRV85.xx.acme.com
\\\\AA\-SRV85\.xx\.acme\.com
But the thing is, I have multiple servers in my environment and the server name will certainly vary.
Can someone please explain how this should be done?
My goal is to match everything after the double backslash until the end of the domain (.com).

This will match "everything after the double backslash until the end of the domain (.com)", as requested in your question:
\\\\.*?\.com
You may want to modify it a bit to match upper or lower case COM:
\\\\.*?\.[Cc][Oo][Mm]
Here is how it works:
\\ matches \
. matches any character (except for line terminators)
*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\. matches a period
[Cc] matches C or c
[Oo] matches O or o
[Mm] matches M or m
Replacing the . with [^ ] so that it won't match spaces between
the \\ and the .com is probably an improvement also...
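As a quick check of the patterns above, here is a minimal Python sketch using the standard re module. It assumes the log text is available as a string (the `message` variable is just the sample from the question) and uses the [^ ] variant so the match cannot span spaces.

```python
import re

# Sample message from the question; raw strings keep both backslashes intact.
message = (
    r"Netlogon has failed an additional 130 authentication requests in the "
    r"last 30 minutes. The requests timed out before they could be sent to "
    r"domain controller \\AA-SRV85.xx.acme.com in domain XX. Please see "
    r"http://support.microsoft.com/kb/2654097 for more information."
)

# \\\\ matches the two literal backslashes, [^ ]*? lazily matches
# non-space characters, and \.com matches the literal suffix.
pattern = re.compile(r"\\\\[^ ]*?\.com", re.IGNORECASE)

match = pattern.search(message)
server = match.group(0) if match else None
print(server)
```

The re.IGNORECASE flag plays the same role as the [Cc][Oo][Mm] spelling, without changing the pattern itself.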

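^\\\\[a-zA-Z0-9-.]+\.com$
As per your question:
^ means the match must start at the beginning of the string,
\\ matches a single \, so two pairs (\\\\) match the two backslashes in front of your server name.
[a-zA-Z0-9-.] matches the characters a-z, A-Z, 0-9, a dash, and a period.
+ means match the preceding class one or more times.
\.com matches the literal .com (the dot has to be escaped, otherwise it matches any character).
$ means the match must end at the end of the string.
Note that with ^ and $ the pattern only matches when the server name is the whole string; drop the anchors to find it inside the full message.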
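The effect of the ^ and $ anchors can be sketched in Python. This is only an illustration: the two sample strings are made up from the question's message, and the pattern is a variation of the one above with the dot before com escaped and the dash moved to the end of the class.

```python
import re

# Anchored: the whole string must be the server name.
anchored = re.compile(r"^\\\\[a-zA-Z0-9.-]+\.com$")
# Unanchored: the server name may appear anywhere in the text.
unanchored = re.compile(r"\\\\[a-zA-Z0-9.-]+\.com")

name_only = r"\\AA-SRV85.xx.acme.com"
in_sentence = r"sent to domain controller \\AA-SRV85.xx.acme.com in domain XX."

anchored_on_name = anchored.search(name_only) is not None
anchored_on_sentence = anchored.search(in_sentence) is not None
found = unanchored.search(in_sentence)

print(anchored_on_name, anchored_on_sentence, found.group(0))
```

For the full Netlogon message, the unanchored form is the one that finds the server name.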

/\\\\\w+-\w+\.xx\.\w+\.com/ig
Here we have:
\\ for matching \
\w+ for matching one or more word characters
- for matching -
\. for matching .
com for an exact string match
g for searching globally in the string
i for case-insensitive matching

Related

Regex Help required for User-Agent Matching

Have used an online regex learning site (regexr) and created something that works but with my very limited experience with regex creation, I could do with some help/advice.
In IIS10 logs, there is a list for time, date... but I am only interested in the cs(User-Agent) field.
My Regex:
(scan\-\d+)(?:\w)+\.shadowserver\.org
which matches these:
scan-02.shadowserver.org
scan-15n.shadowserver.org
scan-42o.shadowserver.org
scan-42j.shadowserver.org
scan-42b.shadowserver.org
scan-47m.shadowserver.org
scan-47a.shadowserver.org
scan-47c.shadowserver.org
scan-42a.shadowserver.org
scan-42n.shadowserver.org
scan-42o.shadowserver.org
but what I would like it to do is:
Match a single number with the option of capturing more than one: scan-2 or scan-02 with an optional letter: scan-2j or scan-02f
Append the rest of the User Agent: .shadowserver.org to the regex.
I will then add it to an existing URL Rewrite rule (as a condition) to abort the request.
Any advice/help would be very much appreciated
Tried:
To write a regex for IIS10 to block requests from a certain user-agent
Expected:
It to work on single numbers as well as double/triple numbers with or without a letter.
(scan\-\d+)(?:\w)+\.shadowserver\.org
Input Text:
scan-2.shadowserver.org
scan-02.shadowserver.org
scan-2j.shadowserver.org
scan-02j.shadowserver.org
scan-17w.shadowserver.org
scan-101p.shadowserver.org
UPDATE:
I eventually came up with this:
scan\-[0-9]+[a-z]{0,1}\.shadowserver\.org
This is an explanation of your regex pattern. If you only want the solution, go directly to the end.
(scan\-\d+)(?:\w)+
(scan\-\d+) Group 1: matches the word scan followed by a literal -. You escaped the hyphen with a \, but an unescaped hyphen also means a literal - here, so the escape is not needed. The - is followed by \d+, which means one or more digits from 0-9 (there must be at least one digit). Whatever this group matches is saved in the first capturing group.
(?:\w)+ Non-capturing group: \w matches one word character, equivalent to [A-Za-z0-9_]. The plus sign + after the group means match the whole group one or more times, and since the group contains only \w, it matches one or more word characters. The non-capturing group is redundant here; \w+ can be used directly.
Taking two examples:
The first example: scan-02.shadowserver.org
(scan\-\d+)(?:\w)+
scan matches the word scan in scan-02, and \- matches the hyphen after it, giving scan-. The \d+ (one or more digits) first matches 02, so the value so far is scan-02. Then comes (?:\w)+, which must match at least one word character. It tries to match the period ., but fails because the period is not a word character. The match is not over at this point: the regex engine backtracks to the previous \d+, which this time matches only the 0 in scan-02, so scan-0 is saved in the first capturing group, and (?:\w)+ then matches the 2. The engine backtracks because both \d+ and (?:\w)+ use +, requiring at least one digit and at least one word character respectively, and it tries to satisfy both literally.
The second example: scan-2.shadowserver.org
(scan\-\d+)(?:\w)+
(scan\-\d+) matches scan-2, and (?:\w)+ then tries to match the period after scan-2 but fails. This is the important point: \d+ has only one digit, so it has nothing to give back to (?:\w)+. The engine then restarts the attempt from the next character, c, in scan-2.shadowserver.org; the s in (scan\-\d+) fails to match c, and it keeps trying at each later position until the whole match fails.
Simple solution:
(scan-\d+[a-z]?)\.shadowserver\.org
Explanation
(scan-\d+[a-z]?) Group 1: captures the word scan, followed by a literal -, followed by \d+ (one or more digits), followed by an optional lowercase letter [a-z]?. The ? makes the [a-z] part optional; without it, [a-z] would require exactly one lowercase letter.
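The difference between the two patterns can be checked with a short Python sketch. It uses the standard re module rather than the IIS URL Rewrite engine, so treat it as an approximation of how the rule condition would behave; the host list is the input text from the question.

```python
import re

# Pattern from the question and the simplified pattern from this answer.
original = re.compile(r"(scan\-\d+)(?:\w)+\.shadowserver\.org")
fixed = re.compile(r"(scan-\d+[a-z]?)\.shadowserver\.org")

hosts = [
    "scan-2.shadowserver.org",
    "scan-02.shadowserver.org",
    "scan-2j.shadowserver.org",
    "scan-02j.shadowserver.org",
    "scan-17w.shadowserver.org",
    "scan-101p.shadowserver.org",
]

# The original pattern misses the single-digit host without a letter.
original_hits = [h for h in hosts if original.search(h)]
# The fixed pattern matches every host; group 1 holds the "scan-NNx" part.
fixed_groups = [m.group(1) for m in map(fixed.search, hosts) if m]

print(original_hits)
print(fixed_groups)
```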

How can I get the first and last part of one wordcombination using regex

How can I get only the middle part of a combined name with PCRE regex?
name: 211103_TV_storyname_TYPE
result: storyname
I have used this single line: .(\d)+.(_TV_) to remove the first part: 211103_TV_
Another idea is to use (_TYPE)$, but the problem is that not all variations of the names contain a space to separate a second word, so I can't use ^ for the first word and $ for the second.
The _TYPE and TV parts of the combined name are fixed.
The numbers are changing according to the date. And the storyname is variable.
Any ideas?
Thanks
With your shown samples, please try following regex, this creates one capturing group which contains matched values in it.
.*?_TV_([^_]*)(?=_TYPE)
OR (a small variation of the above solution, following the fourth bird's nice suggestion), without the lazy match .*? used above:
_TV_([^_]*)(?=_TYPE)
Explanation: Adding detailed explanation for above.
.*?_ ##Using Lazy match to match till 1st occurrence of _ here.
TV_ ##Matching TV_ here.
([^_]*) ##Creating 1st capturing group which has everything before next occurrence of _ here.
(?=_TYPE) ##Making sure previous values are followed by _TYPE here.
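The capturing-group approach can be sketched in Python as follows. The question asks about PCRE, so this is only an illustration with Python's re module (which supports the lookahead used here but not \K); the second sample name is a hypothetical one added to show a limitation.

```python
import re

# _TV_ is matched literally, ([^_]*) captures the story name,
# and (?=_TYPE) requires _TYPE to follow without consuming it.
pattern = re.compile(r"_TV_([^_]*)(?=_TYPE)")

m = pattern.search("211103_TV_storyname_TYPE")
story = m.group(1) if m else None
print(story)

# Hypothetical name whose story part contains an underscore:
# [^_]* cannot cross it, so this pattern finds no match.
with_underscore = pattern.search("211103_TV_my_story_TYPE")
print(with_underscore)
```

If story names can contain underscores, a pattern that allows _ when it is not followed by TYPE is the safer choice.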
You could match as few chars as possible after _TV_ until you match _TYPE
\d_TV_\K.*?(?=_TYPE)
\d_TV_ Match a digit and _TV_
\K Forget what is matched until now
.*? Match as few characters as possible
(?=_TYPE) Assert _TYPE to the right
Another option without a non greedy quantifier, and leaving out the digit at the start:
_TV_\K[^_]*+(?>_(?!TYPE)[^_]*)*(?=_TYPE)
_TV_ Match literally
\K[^_]*+ Forget what is matched until now and match zero or more chars other than _ (possessive, no backtracking)
(?>_(?!TYPE)[^_]*)* Only allow matching _ when not directly followed by TYPE
(?=_TYPE) Assert _TYPE to the right
Edit
If you want to replace the 2 parts, you can use an alternation and replace with an empty string.
If it should be at the start and the end of the string, you can prepend ^ and append $ to the pattern.
\b\d{6}_TV_|_TYPE\b
\b\d{6}_TV_ A word boundary, match 6 digits and _TV_
| Or
_TYPE\b Match _TYPE followed by a word boundary
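The replace-with-empty-string idea can be tried out in Python. This is a sketch with the re module instead of the PCRE-based tool from the question; the sample name is the one from the question.

```python
import re

# Either a 6-digit date followed by _TV_, or a trailing _TYPE.
strip_parts = re.compile(r"\b\d{6}_TV_|_TYPE\b")

# Replacing both alternatives with an empty string leaves the story name.
result = strip_parts.sub("", "211103_TV_storyname_TYPE")
print(result)
```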
Here I have added some additional screenshots to the post, along with the documentation that appears behind the help button, so you can see the forms and what I see.
Documentation
The regular expressions we use are based on PCRE - Perl Compatible Regular Expressions. Full specification can be found here: http://www.pcere.org and http://perldoc.perl.org/perlre.html
Summary of some useful terms:
Metacharacters
\ Quote the next metacharacter
^ Match the beginning of the line
. Match any character (except newline)
$ Match the end of the line (or before newline at the end)
| Alternation
() Grouping
[] Character class
Quantifiers
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
Character Classes
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-"word" character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
\D Match a non-digit character
Capture buffers
The bracketing construct (...) creates capture buffers. To refer to the current contents of a buffer later on, within the same pattern, use \1 for the first, \2 for the second, and so on. Outside the match use "$" instead of "\". The \ notation works in certain circumstances outside the match. See the warning below about \1 vs $1 for details.
Referring back to another part of the match is called a backreference.
Examples
Replace story with certain prefix letters M N or E to have the prefix "AA":
`srcPattern "(M|N|E ) ([A-Za-z0-9\s]*)"`
`trgPattern "AA$2" `
`"N StoryWord1 StoryWord2" -> "AA StoryWord1 StoryWord2"`
`"E StoryWord1 StoryWord2" -> "AA StoryWord1 StoryWord2"`
`"M StoryWord1 StoryWord2" -> "AA StoryWord1 StoryWord2"`
"NoMatchWord StoryWord1 StoryWord2" -> "NoMatchWord StoryWord1 StoryWord2" (no match found, name remains the same)

problems with regex url validator

I'm trying to create a regex to test if a url is valid or not. I had a good example to work off of, but I had to tweak it a bit to make it fit my purpose:
^(https?:\/\/)(www\.)?(\w*\.)+([\w\-_~:/?#[\]#!$&'()*+,;=.])*$
It works fine for the most part, but it matches the following, which drives me nuts:
http://www..example..com
I tried forever and I just can't get the magical combination of characters to get it to ignore the above use case. What am I doing wrong?
Here's a list of things I want the regex to match (all of them are matched):
http://www.example.com
https://www.example.com
https://www.example.com/
https://example.com/
https://blog.example.com/
https://my.blog.example.com/
https://my.blog.example.co.uk/
https://www.example.com/#test
https://www.example.com#test
https://www.example.com/test.php
https://www.example.com/test.php?test=yes&testmore=yesevenmore
https://www.example.com/test.php#test
https://www.example.com/test.php?test=yes&testmore2=yesevenmore&whatnumber=42#test
https://www.example.com/test
https://www.example.com/test/
https://www.example.com/test/?test=yes&testmore2=yesevenmore&whatnumber=42
https://www.example.com/test/#test
https://www.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://www.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://www.blog.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://www.my.blog.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://my.blog.example.co.uk/?test=yes&testmore=yesevenmore&whatnumber=42#test
http://255.255.255.255
http://www.example.com:8008
http://www.example.com:8008/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
Here's a list of things I DON'T want it to match:
www.example.com
example.com
*http://www.blog..example..com
*http://www..example.com
*http://www...example.com
*http://www..example..com
http://www.example.com | not valid
http://www.example.com|
255.255.255.255
* still matched
How can I prevent regex from matching the multidots?
Your pattern matches a literal dot with \. and also allows a dot in the character class. Because \w* can match zero characters, the repeated group (\w*\.)+ also matches consecutive dots.
You could shorten the character class, as some parts do not have to be escaped and \w also matches _
Using the characters from your character class that you accept as valid, you could repeat a group that matches what you want to allow, excluding the dot, followed by a single dot, and then match the allowed characters once more at the end:
^https?:\/\/(?:[-\w~:/?#[\]#!$&'()*+,;=]+\.)*[-\w~:/?#[\]#!$&'()*+,;=]+$
That will match
^ Start of string
https?:\/\/ Match http:// or https://
(?: Non capturing group
[-\w~:/?#[\]#!$&'()*+,;=]+\. Match 1+ times any of listed, then match a .
)* Close group and repeat 0+ times
[-\w~:/?#[\]#!$&'()*+,;=]+ Match any of the listed 1+ times (note that there is no .)
$ End of string
A more specific variant:
^https?:\/\/\w+(?:\.\w+)*(?:[/#:][-\w~:/?#[\]#!$&'()*+,;=.]*)?$
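The more specific variant can be checked against a few of the URLs from the question with a short Python sketch. The pattern is adapted slightly for Python's re module as an assumption on my part: the slashes are left unescaped, the [ inside the character class is escaped, and the duplicate # is dropped; the matching behaviour should be the same.

```python
import re

# Host: word characters separated by single dots.
# Optional tail: starts with /, # or : and may contain dots.
url_re = re.compile(
    r"^https?://\w+(?:\.\w+)*(?:[/#:][-\w~:/?#\[\]!$&'()*+,;=.]*)?$"
)

should_match = [
    "http://www.example.com",
    "https://my.blog.example.co.uk/",
    "https://www.example.com#test",
    "https://www.example.com/test.php?test=yes&testmore=yesevenmore",
    "http://255.255.255.255",
    "http://www.example.com:8008/test/?test=yes&testmore=yesevenmore&whatnumber=42#test",
]
should_not_match = [
    "www.example.com",
    "http://www.blog..example..com",
    "http://www...example.com",
    "http://www.example.com | not valid",
    "http://www.example.com|",
    "255.255.255.255",
]

matched = [u for u in should_match if url_re.search(u)]
wrongly_matched = [u for u in should_not_match if url_re.search(u)]
print(len(matched), wrongly_matched)
```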

Regex for alphanumeric with / or -

The regex should match letters or digits with / or - in between them, but should not start or end with / or -
I tried this using RegExr but it does not work
[a-zA-Z0-9]+[/|-]*[a-zA-Z0-9]+$
Your current regex has the following problems:
it matches multiple / and -, but only in one spot (e.g. it will match 0123/-/-456 but not 0123/456/789)
it also matches |, which you don't need to use in a [character class]
it matches up until the end of the string$, but doesn't match from ^the start of the string (e.g. it would match foo 0123/456, although it wouldn't match 0123/456 foo)
You can use the following regex that Avinash Raj proposed :
^[a-zA-Z0-9]+(?:[/-][a-zA-Z0-9]+)*$
The first point is fixed by putting both the character class matching slashes and dashes and the one matching alnum characters inside a (?:non-capturing group), which we can quantify with * to specify that it can occur any number of times. Each repetition of this group matches a single slash or dash followed by alnum characters.
The other two points are straightforward: we remove the useless | and add a ^ at the start of the regex.
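A small Python sketch of the proposed regex against a handful of inputs. The sample strings are taken from the explanation above plus a few made-up edge cases.

```python
import re

# Alnum run, then any number of (single separator + alnum run) groups.
pattern = re.compile(r"^[a-zA-Z0-9]+(?:[/-][a-zA-Z0-9]+)*$")

samples = [
    "0123/456/789",  # separators in several spots
    "abc-def",
    "abc",           # no separator at all
    "0123/-/-456",   # consecutive separators
    "/abc",          # starts with a separator
    "abc-",          # ends with a separator
]

results = {s: bool(pattern.search(s)) for s in samples}
for sample, ok in results.items():
    print(sample, ok)
```

Note that a plain alphanumeric string with no separator also matches, because the group is quantified with *; use + instead if at least one separator is required.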

Regex that matches every nth occurrence of a character

I have found solutions for finding the nth occurrence, but could not find anything about finding every nth occurrence.
I have a string such as "key1~value1~key2~value2~key3~value3~".
What is the regex that will match every second occurrence of the ~?
key1~value1~key2~value2~key3~value3~
I am trying to create a custom Pattern Analyzer for Elasticsearch, so the regex should match the token separators instead of the tokens.
You may use
~(?=(?:[^~]*~[^~]*~)*[^~]*$)
The pattern matches:
~ - a tilde that is followed by...
(?=(?:[^~]*~[^~]*~)*[^~]*$) - a positive lookahead requiring zero or more repetitions of a pair of tildes (each preceded by 0+ non-tildes), and then 0+ non-tildes up to the end of the string. This check makes sure an even number of tildes remains between the matched tilde and the end of the string.
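The lookahead pattern can be tried in Python with the sample string from the question. This is a sketch using the re module, not the Elasticsearch pattern analyzer itself. Because the lookahead counts tildes up to the end of the string, it selects the 2nd, 4th, ... tilde only when the total number of tildes is even, as it is here.

```python
import re

text = "key1~value1~key2~value2~key3~value3~"

# A tilde followed by an even number of tildes up to the end of the string.
every_second = re.compile(r"~(?=(?:[^~]*~[^~]*~)*[^~]*$)")

# Offsets of the matched separators and the resulting tokens.
positions = [m.start() for m in every_second.finditer(text)]
parts = every_second.split(text)

print(positions)
print(parts)
```

The trailing empty string in parts comes from the final ~, which is itself a matched separator.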
You need to ensure that there is not an even number of ~ before it:
(?<!^([^~]*~[^~]*~)*[^~]*)~
How it works:
(?<!^([^~]*~[^~]*~)*[^~]*)~ Our regex.
~ Matches a tilde (~).
(?<! ) Assert that before it is not:
^ the beginning
( )* followed by zero or more times:
[^~]*~[^~]*~ two tildes, no matter what comes within
[^~]* followed by non-tildes.
Alternatively, take the first group of each non-overlapping match of ~.*?(~). Try: http://regexr.com/3dc15.