Fail2Ban regex exclude if / found after certain point - regex

In a nutshell, I want to find all requests for php files in the root of my web app, but not in subdirectories.
E.g. I would like the following to match:
/home/myapp/public_html/anyfile.php
but not the following:
/home/myapp/public_html/subdir/anyfile.php
My regex looks like this:
\/home\/myapp\/public_html\/\S*(?!\/)\S*\.php
It's matching both examples above - I can't seem to get it to fail if there's another / after /public_html/
Any help appreciated!

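OK, it seems I need to use [^\/]+ to match anything but / at that point. The original failed because \S* also matches /, so the (?!\/) lookahead only had to succeed at one position somewhere in the path. The regex will be: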
\/home\/myapp\/public_html\/[^\/]+\.php
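A minimal sketch in Python (whose re syntax Fail2Ban filters also use) comparing the two patterns against the two example paths from the question. The variable names are only illustrative, and the forward slashes are left unescaped because Python does not require escaping them.

```python
import re

# Pattern from the question: \S* can consume "/" itself, so the
# lookahead never prevents a subdirectory from matching.
original = re.compile(r"/home/myapp/public_html/\S*(?!/)\S*\.php")

# Pattern from the answer: [^/]+ cannot cross a "/".
fixed = re.compile(r"/home/myapp/public_html/[^/]+\.php")

root_file = "/home/myapp/public_html/anyfile.php"
sub_file = "/home/myapp/public_html/subdir/anyfile.php"

for name, pattern in (("original", original), ("fixed", fixed)):
    # Print whether each pattern matches the root file and the subdirectory file.
    print(name, bool(pattern.search(root_file)), bool(pattern.search(sub_file)))
```

The original pattern matches both paths, while the fixed one matches only the file directly under public_html.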

Related

Regex for folder path: only 1 subfolder

I would like to ask for help from people who have a more advanced understanding of regex than I do. I have spent many hours trying things and have gone through tutorials on YouTube, and I'm at my wits' end: the regex pattern kind of works, but it is too broad and I'm unable to narrow it down.
Basically I'm using this regex
apidocs\/static\/[^\/]+[.](?:png|jpg|jpeg|gif|pdf)
so it will consider this valid
./apidocs/static/mypicture.jpg
also this will be valid
apidocs/static/mypicture.jpg
regex demo:
https://regex101.com/r/p1EA9m/1
But I find that these are also valid which is not my intention
./traapidocs/static/mypicture.jpg
./whatever/apidocs/static/mypicture.jpg
How can I configure the regex so that only these 2 patterns are valid (root folders are ./apidocs or apidocs)?
./apidocs/static/mypicture.jpg
apidocs/static/mypicture.jpg
I'm using this in a Python script, by the way, and found that putting a caret in front inside a group does not work. Maybe there is a simpler way to form the regex.
Thank you in advance to anyone that is able to help!
You can use
^(?:\.\/)?apidocs\/static\/[^\/]+\.(?:png|jpe?g|gif|pdf)$
See the regex demo.
Details:
^ - start of string
(?:\.\/)? - an optional ./ string
apidocs\/static\/ - the literal apidocs/static/ string
[^\/]+ - one or more chars other than /
\. - a dot
(?:png|jpe?g|gif|pdf) - png, jpg or jpeg, gif, pdf
$ - end of string
This one seems to work for your inputs:
^(\.\/)?apidocs\/static\/[^\/]+[.](?:png|jpg|jpeg|gif|pdf)$
Demo: https://regex101.com/r/VA0bKz/1
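Since the question mentions a Python script, here is a small sketch that runs the anchored pattern from the first answer against the four paths in the question. Slashes are left unescaped, which Python allows; the variable names are assumptions for illustration.

```python
import re

# Anchored pattern from the answer, with "/" left unescaped for Python.
pattern = re.compile(r"^(?:\./)?apidocs/static/[^/]+\.(?:png|jpe?g|gif|pdf)$")

paths = [
    "./apidocs/static/mypicture.jpg",           # should be valid
    "apidocs/static/mypicture.jpg",             # should be valid
    "./traapidocs/static/mypicture.jpg",        # should be rejected
    "./whatever/apidocs/static/mypicture.jpg",  # should be rejected
]

# Map each path to whether the whole string matches.
results = {p: bool(pattern.match(p)) for p in paths}
for path, ok in results.items():
    print(ok, path)
```

The ^ and $ anchors are what reject the last two paths: without them, the unanchored pattern can match starting in the middle of the string.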

Regex on domain and negation against language subfolders

Let's say my domains are:
www.test.com
www.test.com/en-gb
www.test.com/cn-cn
These are language sites, the first is the main US English site. In Google Analytics I want to set up a filter to only show me traffic of the first (US) domain. I could do this, I think:
^\/(en-gb|cn-cn).*$
If I EXCLUDE my Request URI with that filter pattern, then I should get a view for the en-US domain. However, I'm interested in understanding regex better so here is some test data and code which I am trying out on http://www.regextester.com/
Regular expression:
^\/(en-gb|cn-cn).*$
Test String
/cn-cn/about
/cn-cn/about/
/cn-cn
/cn-cn/about/test
/en-gb/
/en-gb
/en-gb-test/
/en-gb/aboutus/
/en-gb?q=1
/en-gb/?q=1
/about-us
/test?q=1
/aword/me/
/three
/about/en-gb/
/about/en-gb-test/
/test-yes/
/test/me/
/hello/world/
My questions:
If you try this out, you'll notice that /en-gb-test/ is actually matched with the Regex. How do I avoid this?
Also, let's say I wanted to have a rule to NEGATE this whole option. So rather than telling Google Analytics to "exclude", I am curious how I could write the opposite of this same rule. So basically, catch all URLs that are not in /en-gb and /cn-cn sub-folders.
Thanks in advance!
You can stop the regex from matching en-gb-test by requiring that a / or ?, or the end of the string, comes right after the language code:
^\/(en-gb|cn-cn)([\/?]|$)
See the regex demo. If you really need to get the rest of the string, add .* after [\/?]: ^\/(en-gb|cn-cn)([\/?].*|$)
Details:
^ - start of string
\/ - a / (note that you do not need to escape / in GA regex)
(en-gb|cn-cn) - a capturing group with 2 alternatives, either en-gb or cn-cn
([\/?]|$) - a capturing group with two alternatives: a ? or / OR the end of the string.
RE2 regex does not support lookaheads, which are what you would need to match everything except a given pattern. With a lookahead the negated rule would look like ^(?!\/(en-gb|cn-cn)([\/?]|$)).*, but that is not possible with RE2.
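Python's re module does support lookaheads, so both the pattern and its lookahead-based negation can be tried outside Google Analytics. The sketch below uses a subset of the test strings from the question. It only illustrates the regex behaviour and says nothing about how Google Analytics applies filters.

```python
import re

# Matches URIs inside the /en-gb or /cn-cn sub-folders only.
prefix = re.compile(r"^/(en-gb|cn-cn)([/?]|$)")

# Negated form using a lookahead (works in Python, not in RE2).
negated = re.compile(r"^(?!/(en-gb|cn-cn)([/?]|$)).*")

urls = [
    "/cn-cn/about",
    "/cn-cn",
    "/en-gb",
    "/en-gb-test/",
    "/en-gb?q=1",
    "/about/en-gb/",
    "/about-us",
]

localized = [u for u in urls if prefix.match(u)]
other = [u for u in urls if negated.match(u)]

print("localized:", localized)
print("other:", other)
```

Every URL lands in exactly one of the two lists, and /en-gb-test/ ends up in the second one.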

KimonoLabs crawler Generated URL List with regex

So, I'm trying to crawl a website that has like 7,000 product pages and the link structure is like this:
https://example.com/category/sub-category/numericid-name-of-the-product/
What I'm trying to achieve is to Generate a URL list, the Kimono App has that option, and it actually sections the URL but I'm only offered default value, range, and custom list.
I tried to put in stuff like "/.+/" to match all the chars, but that does not work, and I couldn't find any help on that in the official knowledge base.
I know that import.io had that "{alpahnumeric}", for example, for different parts of the URL so that it matches them. Is there a way to accomplish that in the KimonoLabs app?
Try this regex: https://example.com/([^/]+)/([^/]+)/([0-9]+)-([^/]+)
Note: you may need to escape some characters (namely / would be escaped as \/).
Also, I'm not familiar with KimonoLabs, so I don't know if this is what you're looking for exactly. Feel free to clarify.
Explanation
https://example.com/ literally
([^/]+)/ a bunch of not /s, followed by a / (used twice: once for the category and once for the sub-category)
([0-9]+)-([^/]+) Numbers followed by another bunch of not /s
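To see what the four capture groups pick up, here is a short Python sketch. It only demonstrates the regex itself, not any KimonoLabs feature. The numeric ID 12345 is a made-up value, and the dot in example.com is escaped here so that it matches only a literal dot.

```python
import re

# Regex from the answer; "/" needs no escaping in Python.
product_url = re.compile(
    r"https://example\.com/([^/]+)/([^/]+)/([0-9]+)-([^/]+)"
)

# Hypothetical URL following the structure described in the question.
url = "https://example.com/category/sub-category/12345-name-of-the-product/"

match = product_url.match(url)
category, sub_category, product_id, slug = match.groups()

print(category, sub_category, product_id, slug)
```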

editpad regex. Searching files for "http://" but excluding "http://particular.domain.com"

I'm using RegexBuddy and getting nowhere defining a search parameter for editpad.
I'm trying to search through my CMS web site for all instances of "http://" (to see where the protocol was hardcoded incorrectly), but every file has "http://particular.domain.com" in the comments near the top of the file.
How can I search for all EXCEPT those? This seems like it should be basic.
Here's your expression:
http:\/\/(?!particular\.domain\.com).+
Check out a demo here: https://regex101.com/r/eT2cX8/2
This portion is a negative lookahead, which rejects the match when the excluded domain comes right after http://:
(?!particular\.domain\.com).+
Use a negative lookahead:
'(?!http://particular\.domain\.com)http://'
This is an example of a pattern that matches any http:// text EXCEPT the one followed by that particular domain (the dots are escaped so that they match only literal dots).
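The lookahead can be checked outside EditPad with a small Python sketch. The sample lines below are invented for illustration; only the pattern comes from the first answer.

```python
import re

# http:// not immediately followed by the excluded host.
hardcoded = re.compile(r"http://(?!particular\.domain\.com).+")

# Hypothetical file contents, one string per line.
lines = [
    "// see http://particular.domain.com for documentation",
    '<a href="http://other.example.org/page">link</a>',
    '<img src="https://cdn.example.net/logo.png">',
]

# Collect the 1-based numbers of lines with a hardcoded http:// link.
flagged = [n for n, line in enumerate(lines, start=1) if hardcoded.search(line)]

print(flagged)
```

Only the second line is flagged: the first contains just the excluded domain, and the third uses https://, which does not contain the text http://.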

Regex that matches directory path excluding filepath

I'm looking for a regex pattern that I need for IIS. Basically I want it to match any directory path but reject file paths. I've searched all over with little luck.
Example: Match: /directory/content/css/
Match: /directory/content/css
Reject: /directory/content/css/main.css
Reject: /directory/content/css/main.anything
!--Due to feedback I've made some changes (Apologies this is my first time on the forum)--!
So far I've put together this pattern: ^(\/.*)(.*)*[^.*]$
It appears to start out ok accepting anything starting with / but it still accepts extensions.
Thanks
What about this
\/.*?[\w:]+
https://regex101.com/r/b4Y6Si/1
although it can leave / at the end
Regex:
(?:\..*(?!\/))+
This regex matches if you have a file path, so a directory path is one it does not match.
Regex101
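As a rough sketch of how the second pattern could be used, the Python snippet below treats any path it matches as a file path and everything else as a directory path. The helper name is_directory_path is hypothetical, and this is not an IIS configuration, just a way to test the regex against the four examples from the question.

```python
import re

# Pattern from the second answer: in practice it matches any path that contains a dot.
file_like = re.compile(r"(?:\..*(?!/))+")

def is_directory_path(path):
    """Hypothetical helper: accept paths starting with "/" that contain no dot."""
    return path.startswith("/") and file_like.search(path) is None

examples = [
    "/directory/content/css/",
    "/directory/content/css",
    "/directory/content/css/main.css",
    "/directory/content/css/main.anything",
]

for path in examples:
    print(is_directory_path(path), path)
```

One limitation to keep in mind: a directory whose name contains a dot, such as /dir.v2/content/, is also treated as a file path by this pattern.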