Regex for matching, except two words. In Nginx server - regex

I am trying to create 3 servers in Nginx:
one that matches everything EXCEPT names containing -testing or -staging,
one that matches everything with -testing in the server_name,
one that matches everything with -staging in the server_name.
The last two are not the problem. Those are working. I'm just struggling with the first.
Here is the regex I tried:
~^.*(?!(-testing)|(-staging))\.example\.lh$
https://regex101.com/r/m8vXsL/1 (removed the tilde)
But it still matches server_names containing those words. Any help would be appreciated.

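The lookahead fails because, at the point where it is checked, the next characters are .example.lh, which never start with -testing or -staging. If nginx supports it (you chose PCRE in your example, so I assume it does), just insert a < to turn it into a negative lookbehind, which checks the characters just before that point instead, i.e.:
^.*(?<!(-testing)|(-staging))\.example\.lh$
     ^
Here at regex101.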
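To see the difference between the two patterns, here is a small sketch using Python's re module as a stand-in for nginx's PCRE (an assumption; both accept this fixed-width lookbehind). The server names are made up for the check. Note that the lookbehind only inspects the eight characters directly before .example.lh, so a name that merely contains -testing somewhere earlier still matches.

```python
import re

# Pattern from the answer (negative lookbehind), without nginx's leading "~".
LOOKBEHIND = re.compile(r"^.*(?<!(-testing)|(-staging))\.example\.lh$")
# Pattern from the question (negative lookahead), kept for comparison.
LOOKAHEAD = re.compile(r"^.*(?!(-testing)|(-staging))\.example\.lh$")

# Hypothetical server names, invented for this check.
names = [
    "app.example.lh",
    "app-testing.example.lh",
    "app-staging.example.lh",
    "app-testing-2.example.lh",  # contains -testing, but not right before the domain
]

for name in names:
    # Columns: name, lookbehind result, lookahead result
    print(name, bool(LOOKBEHIND.match(name)), bool(LOOKAHEAD.match(name)))
```

If names like the last one must also be excluded, the lookbehind alone is not enough and the pattern would need a different approach.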

Related

Using PCRE2 regex with repeating groups to find email addresses

I need to find all email addresses made up of an arbitrary number of alphanumeric words separated by periods. To test the regex, I'm using the website https://regex101.com/.
The structure of a valid email address is word1.word2.wordN#word1.word2.wordN.word.
The regex /[a-zA-Z0-9.]+#[a-zA-Z0-9.]+.[a-zA-Z0-9]+/gm finds all email addresses included in the document string, but also includes invalid addresses like ........#....com, if present.
I tried to group the repeating parts by using round brackets and a Kleene star, but that causes the regex engine to collapse.
Invalid regex:
/([a-zA-Z0-9]+.?)*[a-zA-Z0-9]+#([a-zA-Z0-9]+.?)*[a-zA-Z0-9]+.[a-zA-Z0-9]+/gm
Although there are many posts about regex groups, I was unable to find an explanation of why the regex engine fails. The engine seems to get stuck while trying to find a match.
How can I avoid this problem, and what is the correct solution?
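I think the main issue that caused you trouble is this:
. (outside of []) matches any character; you probably meant \. instead, which matches only a literal dot.
There is also no need to make it optional with ?, because the non-dot part of your regex will match the alphanumeric characters anyway. With an unescaped or optional separator, the group ([a-zA-Z0-9]+.?)* behaves like (x+)*: a run of alphanumeric characters can be split between repetitions in exponentially many ways, and the engine tries them all before giving up (catastrophic backtracking). That is why it appears to get stuck.
I also simplified the right-hand part (x*x is the same as x+), added the case-insensitive flag, and ended up with this: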
/([a-z0-9]+\.)*[a-z0-9]+#([a-z0-9]+\.)+[a-z0-9]+/gmi
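As a quick check of the corrected pattern, here is a minimal sketch in Python's re (standing in for PCRE2, which is an assumption; the sample text is invented). It keeps # as the separator, as in the question.

```python
import re

# Corrected pattern from the answer; re.I and re.M mirror the /i and /m flags,
# and finditer plays the role of /g.
EMAIL = re.compile(r"([a-z0-9]+\.)*[a-z0-9]+#([a-z0-9]+\.)+[a-z0-9]+", re.I | re.M)

# Hypothetical input containing one valid and two invalid addresses.
text = "write to john.doe#mail.example.com, not ........#....com or admin#localhost"

# group(0) is the whole match; findall would return the group contents instead.
found = [m.group(0) for m in EMAIL.finditer(text)]
print(found)
```

Because every dot is now mandatory inside its group, each run of characters can only be split one way, so the engine no longer has the exponential number of alternatives to try.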

Regular expression which matches a domain with only two ending characters?

I'm trying to stop spammers who are using short domains (bit.ly etc.). The domains they post all seem to have two-character TLDs (not .com, etc.).
I've used this:
\.[a-z][a-z]$
But, it has two problems:
it matches .co.uk
If anything is after the domain, it doesn't match (a space or slash, example: bit.ly/2231)
Could someone assist me with a regex that would accomplish this, please?
Both patterns match the whole URL and rely on the domain coming before the first forward slash after the protocol. The first matches only if the host contains a single dot and ends in a two-character TLD. The second uses a negative lookbehind to make sure the TLD is not preceded by another two-letter label, as in .co.uk.
https://regex101.com/r/5acu56/2
^(https?:\/\/)?[^\/.]+\.[a-z][a-z](\/|\s*$)
https://regex101.com/r/p8Ajw9/2
^(https?:\/\/)?[^\/]+(?<!\.[a-z][a-z])\.[a-z][a-z](\/\s*|\s*$)
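The two patterns can be compared side by side with a short sketch. It uses Python's re as a stand-in for whatever engine the filter runs on (an assumption), and the URLs are invented examples.

```python
import re

# First pattern: exactly one dot in the host, two-letter TLD.
ONE_DOT = re.compile(r"^(https?:\/\/)?[^\/.]+\.[a-z][a-z](\/|\s*$)")
# Second pattern: any number of labels, but the TLD must not follow
# another two-letter label (as in .co.uk).
NOT_SECOND_LEVEL = re.compile(
    r"^(https?:\/\/)?[^\/]+(?<!\.[a-z][a-z])\.[a-z][a-z](\/\s*|\s*$)"
)

# Hypothetical URLs for the comparison.
urls = [
    "bit.ly/2231",
    "http://bit.ly",
    "example.co.uk",
    "example.com/page",
    "sub.bit.ly/x",
]

for url in urls:
    # Columns: url, first pattern result, second pattern result
    print(url, bool(ONE_DOT.search(url)), bool(NOT_SECOND_LEVEL.search(url)))
```

The only URL in this list where the two differ is sub.bit.ly/x: the first pattern rejects it because the host has two dots, while the second accepts it.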

Mistaken Squid Proxy regex? → ^.*stackoverflow\.*

I have several proxy rule files for Squid, and all contain rules like:
acl blacklisted dstdom_regex ^.*facebook\.* ^.*youtube\.* ^.*games.yahoo.com\.*
The patterns match against the domain name: dstdom_regex means destination (server) regular expression pattern matching.
The objective is to block some websites, but I don't know by what method: domain name, keywords in the domain name, ...
Let's expand/describe the pattern:
^.*stackexchange\.* The whole pattern
^ String beginning
.* Match anything (greedy quantifier, I presume)
stackexchange Keyword to match
\.* Any number of dots (.)
Totally legitimate matches:
stackexchange.com: The Stack Exchange website.
stackoverflow.stackexchange: The imaginary Stack Exchange gTLD.
But these possible matches make it seem more like a keyword block:
stackexchange
stackexchanger
notstackexchange
not-stackexchange
some-website.stackexchange
some-website.stackexchange-tld
And the pattern seems to contain a bug, since it allows the following invalid cases to match, thanks to the \.* at the end, although they never naturally occur:
stackexchange.
stackexchange...
stackexchange..........
stackexchange.......com
stackexchange.com
stackexchangecom
you get the idea.
Anything containing stackexchange, even if separated by dots from everything else, is still a valid match.
So now, the question itself:
This all means that this is simply a match for stackexchange! (I'm assuming the original author didn't intend to match infinite dots.)
So why not just use the pattern stackexchange? Wouldn't it be faster and give the same results, except for the "bug" (\.*)?
I.e., isn't ^.*stackexchange equivalent to stackexchange?
Edit: Just to clarify, I didn't write those proxy rule files.
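The claimed equivalence can be checked empirically. This is only a sketch: it assumes dstdom_regex applies the pattern as an unanchored search (a match anywhere in the name is enough), and it uses Python's re rather than the regex library Squid is built with. The host list reuses examples from above plus one unrelated name.

```python
import re

ORIGINAL = re.compile(r"^.*stackexchange\.*")  # pattern style from the rule files
KEYWORD = re.compile(r"stackexchange")         # proposed simplification

hosts = [
    "stackexchange.com",
    "stackoverflow.stackexchange",
    "notstackexchange",
    "some-website.stackexchange-tld",
    "stackexchange.......com",
    "stackexchangecom",
    "example.com",  # unrelated host, should match neither pattern
]

for host in hosts:
    original_hit = bool(ORIGINAL.search(host))
    keyword_hit = bool(KEYWORD.search(host))
    print(host, original_hit, keyword_hit)
```

Under that assumption, ^.* can absorb anything before the keyword and \.* can match zero dots, so both patterns accept and reject exactly the same names.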
I don't understand why you use \.* to match any trailing dots.
However, to work around your problem you can try this:
^[^\.]*\.stackexchange\.*
[^\.]* matches anything except a dot
\. then you match the dot
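Here is a quick sketch of how this pattern behaves, using Python's re as a stand-in for Squid's regex library (an assumption) and a few made-up hosts. It shows that the pattern needs exactly one dot-free label followed by a dot in front of stackexchange.

```python
import re

# Pattern suggested in this answer.
SUGGESTED = re.compile(r"^[^\.]*\.stackexchange\.*")

# Hypothetical destination hosts.
hosts = [
    "www.stackexchange.com",
    "stackexchange.com",
    "notstackexchange.com",
    "a.b.stackexchange.com",
]

for host in hosts:
    print(host, bool(SUGGESTED.search(host)))
```

Keyword-style matches such as notstackexchange.com are excluded, but so are the bare stackexchange.com and names with more than one label in front, so whether this fits depends on what the rule is meant to block.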

regex negative lookbehind - pcre

I'm trying to write a rule to match on a top-level domain followed by five digits. My problem is that my existing PCRE matches what I have described, but much later in the URL than where I want it to. I want it to match on the first occurrence of a TLD, not anywhere else. The easy way to check for this is to match on the TLD only when it has not been preceded at some point by the "/" character. I tried using a negative lookbehind, but that doesn't work because it only looks back a single character.
e.g.: How it is currently working
domain.net/stuff/stuff=www.google.com/12345
matches .com/12345 even though I do not want this match because it is not the first TLD in the URL
e.g.: How I want it to work
domain.net/12345/stuff=www.google.com/12345
matches on .net/12345 and ignores the later match on .com/12345
My current expression
(\.[a-z]{2,4})/\d{5}
EDIT: rewrote it so perhaps the problem is clearer in case anyone in the future has this same issue.
You're pretty close :)
You just need to be sure that before matching what you're looking for (i.e: (\.[a-z]{2,4})/\d{5}), you haven't met any / since the beginning of the line.
I would suggest simply prepending ^[^\/]* to your current regex (and moving the \. out of the capturing group).
Thus, the resulting regex would be:
^[^\/]*\.([a-z]{2,4})/\d{5}
How does it work?
^ asserts that this is the beginning of the tested String
[^\/]* accepts any sequence of characters that doesn't contain /
\.([a-z]{2,4})/\d{5} is the pattern you want to match (a . followed by 2 to 4 lowercase letters, then a / and 5 digits).
Here is a permalink to a working example on regex101.
Cheers!
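For anyone who wants to verify this outside regex101, here is a minimal sketch in Python's re (used as a stand-in for PCRE, which is an assumption) run against the two URLs from the question.

```python
import re

# Anchored pattern from this answer: no "/" may appear before the TLD.
FIRST_TLD = re.compile(r"^[^\/]*\.([a-z]{2,4})/\d{5}")

unwanted = "domain.net/stuff/stuff=www.google.com/12345"  # digits only after a later TLD
wanted = "domain.net/12345/stuff=www.google.com/12345"    # digits right after the first TLD

for url in (unwanted, wanted):
    m = FIRST_TLD.search(url)
    # Print the whole match and the captured TLD, or None when nothing matches.
    print(url, "->", (m.group(0), m.group(1)) if m else None)
```

One side effect to be aware of: because of the anchored prefix, the overall match now starts at the beginning of the URL rather than at the dot, so use the capture group if only the TLD is needed.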
You can use this regex:
'|^(\w+://)?([\w-]+\.)+\w+/\d{5}|'
Online Demo: http://regex101.com/

Mod Rewrite RegEx To Match Only If Previous Subset Matched

I am trying to make what I think is a simple regex for use with mod_rewrite.
I've tried various expressions, many of which I thought were promising, but all of which ultimately failed for one reason or another. They all also seem to fail once I add start/end string delimiters.
For example, ^user/(\d{1,10})(?=/)$ was one I tried, but among other things, it seems to group the trailing slash, and I only want to group the digits. I think I need to use a positive lookbehind, but I'm having difficulty because it's looking behind at a group.
What I am trying to match is strings that 1) begin with "user/" and 2) possibly end with (\d{1,10})/ (1 to 10 digits followed by a single slash)
Should Match:
user/
user/123/
user/1234567890/
Should not match:
user
user//
user/-4/
user/35.5/
user/123
user/123//
user/123/5/
user/12345678901/
^user/(?:([0-9]{1,10})/)?$ should work just fine.
This: ^user(?=/)(/\d{1,10})?/$
Edit: if you want to group only the digits, use ^user(?=/)(?:/(\d{1,10}))?/$
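Both digit-grouping patterns can be run against the lists from the question. The sketch below uses Python's re as a stand-in for the PCRE engine behind mod_rewrite (an assumption) and tests the patterns on the bare strings only, not inside an actual RewriteRule.

```python
import re

# The two suggested patterns that capture only the digits.
OPTIONAL_GROUP = re.compile(r"^user/(?:([0-9]{1,10})/)?$")
LOOKAHEAD = re.compile(r"^user(?=/)(?:/(\d{1,10}))?/$")

should_match = ["user/", "user/123/", "user/1234567890/"]
should_not_match = [
    "user", "user//", "user/-4/", "user/35.5/",
    "user/123", "user/123//", "user/123/5/", "user/12345678901/",
]

for pattern in (OPTIONAL_GROUP, LOOKAHEAD):
    accepted = [s for s in should_match if pattern.search(s)]
    rejected = [s for s in should_not_match if not pattern.search(s)]
    # Group 1 holds the digits, or None when the optional part is absent.
    captured = [pattern.search(s).group(1) for s in accepted]
    print(pattern.pattern, len(accepted), len(rejected), captured)
```

Both patterns accept and reject the same strings here; the first is simply shorter, since the lookahead in the second is not needed once the slash is matched explicitly.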