URL pattern to exclude globally in ZAP - regex

I am having trouble with regex syntax in OWASP ZAP. I want to exclude from all scans all URLs that contain "web/lib". I've tried to add
^*web/lib*$
under the Global Exclude URL option, but it didn't work. Please help - thanks a lot.

It's a regex, so if you're specifying a wildcard you generally want a period followed by an asterisk (.*) rather than a bare *. You may also need to escape the slash, giving something like ^.*web\/lib.*$
Example: https://regex101.com/r/XLPF85/1
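A quick way to sanity-check an exclude pattern is to try it outside ZAP. The sketch below uses Python's re module as a stand-in for ZAP's own regex engine (details may differ), and assumes the pattern has to match the whole URL, which is what the ^ and $ anchors imply. The URLs are made-up examples, and Python does not require the slash to be escaped (\/ also works).

```python
import re

# ".*" is the regex wildcard: any character, zero or more times.
EXCLUDE = re.compile(r"^.*web/lib.*$")


def is_excluded(url: str) -> bool:
    # fullmatch() requires the pattern to cover the entire URL,
    # mirroring the assumption that the tool matches whole URLs.
    return EXCLUDE.fullmatch(url) is not None


def is_valid_pattern(pattern: str) -> bool:
    # A bare "*" right after "^" has nothing to repeat, so
    # Python's engine rejects a pattern like "^*web/lib*$" outright.
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    print(is_excluded("https://example.com/web/lib/jquery.js"))   # True
    print(is_excluded("https://example.com/web/app/index.html"))  # False
    print(is_valid_pattern("^*web/lib*$"))                        # False
```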

Related

What regex in Google Analytics to use for this case?

I'm trying to figure out which landing page regex to use so that only URLs with exactly two subfolders are shown, e.g. see image below: show just the green URLs but not the red ones, as they have 3+ subfolders. Any advice on how to do this in GA with regex?
Cheers
If you want to match a path with only two components, e.g.
/component1/component2/
then you may use the following regex:
/[^/]+/[^/]+/
If your regex tool requires anchors, then add them:
^/[^/]+/[^/]+/$
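To see why the anchors matter, here is a small check in Python, with re.search standing in for a tool that looks for the pattern anywhere in the string (an assumption about how the tool matches). The paths are made-up examples.

```python
import re

UNANCHORED = re.compile(r"/[^/]+/[^/]+/")
ANCHORED = re.compile(r"^/[^/]+/[^/]+/$")


def two_segments(path: str, pattern: re.Pattern) -> bool:
    # search() succeeds if the pattern occurs anywhere in the path.
    return pattern.search(path) is not None


if __name__ == "__main__":
    for path in ["/component1/component2/", "/a/b/c/"]:
        # UNANCHORED finds "/a/b/" inside "/a/b/c/", so deeper paths slip through.
        print(path, two_segments(path, UNANCHORED), two_segments(path, ANCHORED))
```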
Is this what you are looking for?
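^\/[!#$&-.0-;=?-\[\]_a-z~]+\/[!#$&-.0-;=?-\[\]_a-z~]+\/$
The two character classes contain the characters that are valid in a URL, minus the slash itself. (The range &-; would include /, so it is split into &-. and 0-;, and the brackets are escaped so that ] does not close the class early.) The regex is forced to start with a slash, end with a slash, and have only one slash in between.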
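A quick Python check of a slash-free version of that character class. It assumes each segment should exclude /, which the unsplit range &-; (0x26 to 0x3B) does not do, since it contains / (0x2F). The sample paths are made up.

```python
import re

# Characters allowed in one path segment: the original set minus "/".
SEGMENT = r"[!#$&-.0-;=?-\[\]_a-z~]+"
TWO_LEVELS = re.compile(rf"^/{SEGMENT}/{SEGMENT}/$")


def has_two_levels(path: str) -> bool:
    return TWO_LEVELS.fullmatch(path) is not None


if __name__ == "__main__":
    print(has_two_levels("/Shoes/running-2/"))    # True
    print(has_two_levels("/shoes/running/men/"))  # False
    # The unsplit range &-; does include the slash:
    print(re.fullmatch(r"[&-;]", "/") is not None)  # True
```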

How can I use regular expression to match urls starting with https and ending with #?

Very much a newb with regex and having a hard time figuring this one out. I have an HTML document and I want to clear out a ton of URLs that are inside of it. All of the URLs begin with https:// and they all end with a pound sign #.
Any help would be much appreciated. I'm using Sublime Text as my editor, in case that matters.
A basic way to do it:
\bhttps://[^\s#]+#
Broken down:
\b          word boundary
https://    the literal text https://
[^\s#]+     one or more characters that are neither whitespace nor '#'
#           the closing '#'
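A minimal sketch of the same find-and-replace in Python, using re.sub as a stand-in for the editor's regex replace (replacing each match with nothing). The HTML snippet is invented for illustration.

```python
import re

URL_TO_HASH = re.compile(r"\bhttps://[^\s#]+#")


def strip_urls(text: str) -> str:
    # [^\s#]+ cannot cross whitespace or '#', so each match ends at the
    # nearest '#'; a URL without a '#' before the next space is left alone.
    return URL_TO_HASH.sub("", text)


if __name__ == "__main__":
    html = '<a href="https://example.com/page#">x</a> and https://foo.org/a/b# end'
    print(strip_urls(html))
```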
If you truly want to clear everything from https:// through the #, then you can use:
^(https)+(.)*(#)+$
But you may want to be more specific about what you are filtering out, since the anchors require the URL to be the entire string. If the text comes from a database query you should be OK, since you can assume the URL will be the whole content of the field(s) returned and you will be running the regex through a code loop of some kind.
BTW, you can hone your patterns using something like http://regexpal.com/
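The anchored pattern ^(https)+(.)*(#)+$ matches the same lines as the simpler ^https.*#$ (the groups and the + quantifiers don't change which lines match). Because of the ^ and $ anchors, it only fires when a whole line is a URL, so in Python it needs re.MULTILINE to work line by line. A sketch with made-up text:

```python
import re

# Under re.MULTILINE, ^ and $ match at every line boundary.
WHOLE_LINE_URL = re.compile(r"^https.*#$", re.MULTILINE)


def drop_url_lines(text: str) -> str:
    # Only lines that consist entirely of "https...#" are emptied.
    return WHOLE_LINE_URL.sub("", text)


if __name__ == "__main__":
    sample = "https://a.com/x#\nkeep this line https://b.com/y# too\n"
    print(repr(drop_url_lines(sample)))
```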

RegEx match all website links except those containing admin

I'm setting up URL Rewrite on IIS and I need to match the following URLs using regex:
http://sub.mysite.com
sub.mysite.com
sub.mysite.com/
sub.mysite.com/Site1
sub.mysite.com/Site1/admin
but not:
sub.mysite.com/admin
sub.mysite.com/admin/somethingelse
sub.mysite.com/admin/admin
The site itself (sub.mysite.com) should not be hardcoded in the expression. Instead, it should be matched by something like .*.
I'm drawing a blank on this one. I did find solutions matching the different URLs, but once I try to combine them, either none of them match or all of them do.
I hope someone can help me.
For your specific case, assuming you are matching the part after the domain (REQUEST_URI), anchor a negative lookahead at the start:
^(?!/admin).*
(?!...) is a negative lookahead. I am not sure whether it is supported by the IIS URL Rewrite engine. If not, you could take a complementary approach instead:
Or, as @kirilloid said, just match /admin/? and discard those URLs (pay attention to the slashes).
BTW, if you want to quickly test regexes with visual feedback, I highly recommend http://gskinner.com/RegExr/
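The ^ anchor matters for the lookahead. A sketch in Python, whose re module supports (?!...) (whether IIS URL Rewrite does is the open question above), assuming the input is the path after the domain with a leading slash, and that the engine searches for a match anywhere in the input:

```python
import re

ANCHORED = re.compile(r"^(?!/admin).*")
UNANCHORED = re.compile(r"(?!/admin).*")


def allowed(path: str) -> bool:
    # The lookahead is only tested at the start of the path.
    return ANCHORED.search(path) is not None


if __name__ == "__main__":
    for p in ["/", "/Site1", "/Site1/admin", "/admin", "/admin/somethingelse"]:
        print(p, allowed(p))
    # Without ^, the search simply succeeds one character later:
    print(UNANCHORED.search("/admin").group())  # admin
```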
([A-Za-z0-9]+\.)+com(?!/admin)/?([A-Za-z0-9]+/?)*
This should do the trick.
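The host-based pattern can be checked against the URLs from the question. This sketch writes the dots as \. (an unescaped . matches any character) and assumes the pattern is searched for anywhere in the full URL, host included:

```python
import re

HOST_NOT_ADMIN = re.compile(r"([A-Za-z0-9]+\.)+com(?!/admin)/?([A-Za-z0-9]+/?)*")


def matches(url: str) -> bool:
    return HOST_NOT_ADMIN.search(url) is not None


# Example URLs taken from the question.
SHOULD_MATCH = [
    "http://sub.mysite.com",
    "sub.mysite.com",
    "sub.mysite.com/",
    "sub.mysite.com/Site1",
    "sub.mysite.com/Site1/admin",
]
SHOULD_NOT_MATCH = [
    "sub.mysite.com/admin",
    "sub.mysite.com/admin/somethingelse",
    "sub.mysite.com/admin/admin",
]

if __name__ == "__main__":
    print(all(matches(u) for u in SHOULD_MATCH))      # True
    print(any(matches(u) for u in SHOULD_NOT_MATCH))  # False
```

Note that the pattern still hardcodes the .com suffix.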

Regex for simple urls

I am looking for a regex for simple URLs such as
http://www.google.com
http://www.yahoo.in
http://www.example.eu
http://www.example.net
etc.
No subdirectories are allowed. For example, it must not validate http://www.google.com/ or http://www.yahoo.in/mail.
Does anyone know any regex to do this?
I'm still a noob, but try this:
^http:\/\/[a-zA-Z0-9_\-]+\.[a-zA-Z0-9_\-]+\.[a-zA-Z0-9_\-]+$
This one should do:
^(https?:\/\/)?[0-9a-zA-Z]+\.[-_0-9a-zA-Z]+\.[0-9a-zA-Z]+$
This should work for URLs starting with http:// or https:// or without the protocol name.
The regex should also be applied case-insensitively. In that case, it can be shortened a bit:
^(https?:\/\/)?[0-9a-z]+\.[-_0-9a-z]+\.[0-9a-z]+$
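The case-insensitive version can be tried in Python with re.IGNORECASE against the examples from the question. It expects exactly three dot-separated labels, so a host such as google.com without the www. would not match either.

```python
import re

# Optional scheme, then three dot-separated labels, then the end of the string.
SIMPLE_URL = re.compile(
    r"^(https?:\/\/)?[0-9a-z]+\.[-_0-9a-z]+\.[0-9a-z]+$", re.IGNORECASE
)


def is_simple_url(url: str) -> bool:
    return SIMPLE_URL.fullmatch(url) is not None


if __name__ == "__main__":
    for u in ["http://www.google.com", "WWW.EXAMPLE.NET",
              "http://www.google.com/", "http://www.yahoo.in/mail"]:
        print(u, is_simple_url(u))
```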
If you don't care whether it is a valid url, you can use:
\S*www\.\S+
All the examples contain www. followed by non-space characters, and that is unlikely to occur in a normal word.
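The loose pattern can be used to pull candidates out of free text, as in this Python sketch with an invented sentence. Unlike the stricter patterns, it also picks up paths such as www.yahoo.in/mail.

```python
import re

LOOSE = re.compile(r"\S*www\.\S+")


def find_www_tokens(text: str) -> list:
    # Each match is a whitespace-free run that contains "www." somewhere.
    return LOOSE.findall(text)


if __name__ == "__main__":
    print(find_www_tokens("visit http://www.google.com today or www.yahoo.in/mail"))
```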

Regex for excluding URL

I'm working with an email company that has a feature where they spider your site in order to provide custom content. I can have the spider ignore URLs based on regex patterns I provide.
For this system a pattern starts and ends with a "/".
What I'm trying to do is ignore http://www.website.com/2011/10 BUT allow http://www.website.com/2011/10/title-of-page.html
I would have thought the pattern below would work, since it does not have a trailing slash, but no luck.
Any ideas?
/http:\/\/www\.website\.com\/[0-9][0-9][0-9][0-9]\/[0-9][0-9]/
Your regex matches a part of the URL, so you need to tell it not to allow a slash to follow it:
/http:\/\/www\.website\.com\/[0-9]{4}\/[0-9][0-9](?!\/)/
If you also want to avoid other partial matches, like http://www.website.com/2011/100, then an additional word boundary might help:
/http:\/\/www\.website\.com\/[0-9]{4}\/[0-9][0-9]\b(?!\/)/
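The effect of the lookahead and the word boundary can be checked in Python. This sketch drops the system's surrounding / delimiters and uses re.search to model "matches part of the URL" (an assumption about how the spider applies the pattern):

```python
import re

ORIGINAL = re.compile(r"http:\/\/www\.website\.com\/[0-9][0-9][0-9][0-9]\/[0-9][0-9]")
FIXED = re.compile(r"http:\/\/www\.website\.com\/[0-9]{4}\/[0-9][0-9]\b(?!\/)")


def ignored(url: str, pattern: re.Pattern) -> bool:
    # A partial match anywhere in the URL is enough to ignore it.
    return pattern.search(url) is not None


if __name__ == "__main__":
    page = "http://www.website.com/2011/10/title-of-page.html"
    print(ignored(page, ORIGINAL), ignored(page, FIXED))
```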
It depends on the regexp engine, but you can probably use either $ (if the URL is tokenised beforehand) or a match for whitespace and delimiters.
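Both options from the last answer, sketched in Python: $ for when each URL is tested on its own, and a lookahead for whitespace, a quote, '<' or the end of the text when the URL sits inside longer text. Which delimiters to allow is an assumption; adjust the set to your content.

```python
import re

# Each URL tested on its own: anchor at the end with $.
TOKEN = re.compile(r"^http://www\.website\.com/[0-9]{4}/[0-9]{2}$")
# URL embedded in text: the next character must be whitespace, a quote,
# '<', or the end of the text (a hypothetical delimiter set).
IN_TEXT = re.compile(r"http://www\.website\.com/[0-9]{4}/[0-9]{2}(?=[\s\"'<]|$)")

if __name__ == "__main__":
    print(bool(TOKEN.search("http://www.website.com/2011/10")))
    print(bool(IN_TEXT.search('see <a href="http://www.website.com/2011/10">x</a>')))
```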