Regex for folder path: only 1 subfolder - regex

I would like to ask for help from people who have a more advanced understanding of regex than I do. I have spent many hours trying things and have gone through tutorials on YouTube, but I'm at my wits' end: the pattern mostly works, but it is too broad and I'm unable to narrow it down.
Basically I'm using this regex
apidocs\/static\/[^\/]+[.](?:png|jpg|jpeg|gif|pdf)
so it will consider this valid
./apidocs/static/mypicture.jpg
also this will be valid
apidocs/static/mypicture.jpg
regex demo:
https://regex101.com/r/p1EA9m/1
But I find that these are also valid which is not my intention
./traapidocs/static/mypicture.jpg
./whatever/apidocs/static/mypicture.jpg
How can I configure the regex so that only these 2 patterns are valid (the root folder is ./apidocs or apidocs)?
./apidocs/static/mypicture.jpg
apidocs/static/mypicture.jpg
I'm using this in a Python script, by the way, and found that putting a caret in front, inside a group, does not work. Maybe there is a simpler way to form the regex.
Thank you in advance to anyone that is able to help!

You can use
^(?:\.\/)?apidocs\/static\/[^\/]+\.(?:png|jpe?g|gif|pdf)$
See the regex demo.
Details:
^ - start of string
(?:\.\/)? - an optional ./ string
apidocs\/static\/ - a literal apidocs/static/ string
[^\/]+ - one or more chars other than /
\. - a dot
(?:png|jpe?g|gif|pdf) - png, jpg or jpeg, gif, pdf
$ - end of string
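Since the question mentions a Python script, here is a minimal sketch of how the pattern above could be applied there. The helper name is_static_asset is made up for illustration. It uses re.fullmatch, which requires the whole string to match, so the ^ and $ anchors are dropped; the / characters also need no escaping in Python.

```python
import re

# Same pattern as in the answer, without ^ and $ (fullmatch anchors both ends)
# and without escaping "/", which is not special in Python regexes.
STATIC_ASSET = re.compile(r"(?:\./)?apidocs/static/[^/]+\.(?:png|jpe?g|gif|pdf)")


def is_static_asset(path: str) -> bool:
    """Hypothetical helper: True only if the entire path matches."""
    return STATIC_ASSET.fullmatch(path) is not None


# The four paths from the question.
print(is_static_asset("./apidocs/static/mypicture.jpg"))           # True
print(is_static_asset("apidocs/static/mypicture.jpg"))             # True
print(is_static_asset("./traapidocs/static/mypicture.jpg"))        # False
print(is_static_asset("./whatever/apidocs/static/mypicture.jpg"))  # False
```

With re.search and no anchors, the last two paths would match because the pattern is allowed to start anywhere in the string.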

This one seems to work for your inputs:
^(\.\/)?apidocs\/static\/[^\/]+[.](?:png|jpg|jpeg|gif|pdf)$
Demo: https://regex101.com/r/VA0bKz/1

Related

Fail2Ban regex exclude if / found after certain point

In a nutshell, I want to find all requests for php files in the root of my web app, but not in subdirectories.
E.g. I would like the following to match:
/home/myapp/public_html/anyfile.php
but not the following:
/home/myapp/public_html/subdir/anyfile.php
My regex looks like this:
\/home\/myapp\/public_html\/\S*(?!\/)\S*\.php
It's matching both examples above - I can't seem to get it to fail if there's another / after /public_html/
Any help appreciated!
OK, it seems I need to use [^\/]+ to match anything but / at that point, so the regex will be:
\/home\/myapp\/public_html\/[^\/]+\.php
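The effect of [^\/]+ can be checked quickly in Python. This is only a sketch against the two example paths; the helper name requests_root_php is an assumption, and a real Fail2Ban filter would embed the pattern in a failregex rather than call Python directly.

```python
import re

# [^/]+ cannot cross a "/", so only files directly under public_html match.
ROOT_PHP = re.compile(r"/home/myapp/public_html/[^/]+\.php")


def requests_root_php(line: str) -> bool:
    """Hypothetical helper: True if the line mentions a PHP file in the web root."""
    return ROOT_PHP.search(line) is not None


print(requests_root_php("/home/myapp/public_html/anyfile.php"))         # True
print(requests_root_php("/home/myapp/public_html/subdir/anyfile.php"))  # False
```

The original attempt failed because \S* may itself consume slashes, so the (?!\/) lookahead only has to hold at one position that the engine is free to choose.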

Regex on domain and negation against language subfolders

Let's say my domains are:
www.test.com
www.test.com/en-gb
www.test.com/cn-cn
These are language sites, the first is the main US English site. In Google Analytics I want to set up a filter to only show me traffic of the first (US) domain. I could do this, I think:
^\/(en-gb|cn-cn).*$
If I EXCLUDE my Request URI with that filter pattern, then I should get a view for the en-US domain. However, I'm interested in understanding regex better so here is some test data and code which I am trying out on http://www.regextester.com/
Regular expression:
^\/(en-gb|cn-cn).*$
Test String
/cn-cn/about
/cn-cn/about/
/cn-cn
/cn-cn/about/test
/en-gb/
/en-gb
/en-gb-test/
/en-gb/aboutus/
/en-gb?q=1
/en-gb/?q=1
/about-us
/test?q=1
/aword/me/
/three
/about/en-gb/
/about/en-gb-test/
/test-yes/
/test/me/
/hello/world/
My questions:
If you try this out, you'll notice that /en-gb-test/ is actually matched with the Regex. How do I avoid this?
Also, let's say I wanted to have a rule to NEGATE this whole option. So rather than telling Google Analytics to "exclude", I am curious how I could write the opposite of this same rule. So basically, catch all URLs that are not in /en-gb and /cn-cn sub-folders.
Thanks in advance!
You can stop the regex from matching en-gb-test by requiring that the prefix is followed by / or ?, or by the end of the string:
^\/(en-gb|cn-cn)([\/?]|$)
See the regex demo. If you really need to get the rest of the string, add .* after [\/?]: ^\/(en-gb|cn-cn)([\/?].*|$)
Details:
^ - start of string
\/ - a / (note that you do not need to escape / in GA regex)
(en-gb|cn-cn) - a capturing group with 2 alternatives, either en-gb or cn-cn
([\/?]|$) - a capturing group with two alternatives: a ? or / OR the end of the string.
In RE2 regex, you cannot use lookaheads that are crucial when you need to match something other than something else. It would look like ^(?!\/(en-gb|cn-cn)([\/?]|$)).*, but it is not possible with RE2.
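Outside RE2, both variants can be tried out. The sketch below uses Python's re module, which does support lookaheads, purely to illustrate the two patterns against some of the test strings from the question; it says nothing about what Google Analytics itself accepts. The function names are made up for this example.

```python
import re

# Matches URIs inside the /en-gb or /cn-cn sub-folders.
LANG_PREFIX = re.compile(r"^/(en-gb|cn-cn)([/?]|$)")

# The negated form with a lookahead (works in Python, not in RE2).
NOT_LANG_PREFIX = re.compile(r"^(?!/(en-gb|cn-cn)([/?]|$)).*")


def in_language_site(uri: str) -> bool:
    return LANG_PREFIX.search(uri) is not None


def outside_language_site(uri: str) -> bool:
    return NOT_LANG_PREFIX.match(uri) is not None


for uri in ["/en-gb", "/en-gb/?q=1", "/cn-cn/about", "/en-gb-test/", "/about/en-gb/"]:
    print(uri, in_language_site(uri), outside_language_site(uri))
```

For every input the two functions return opposite answers, which is the behaviour an "include" filter built from the negated pattern would need.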

regex help - ignore /news* but target /new*

I am a beginner at regex and have the following problem:
I want a regex that targets only the "/new" string and not the "/news" string. Here are three examples of text I want to use it on:
/category/news/new
/category/news/new?t=week
/politics/new
Also, the /news will always precede /new in order. There will also never be another '/' after /new. (I hope that makes sense to you.)
(I use this to apply active classes to the navigation menu. But I get a problem in the news section where the page thinks the active sort type is 'new' when it is actually 'top' as I base this off the URL's path)
I attempted writing one but it didn't work:
/new[^s]
Any help would be greatly appreciated.
Thanks
You may use
/new(?:[?/]|$)
See the regex demo
Details:
/new - a literal /new substring
(?:[?/]|$) - either of the two alternatives:
[?/] - either a ? or /
| - or
$ - end of string
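A short Python sketch of the pattern above, run against the example paths from the question. The helper name is_new_sort is an assumption. It also shows why the original attempt /new[^s] fails: [^s] must consume a character, so it cannot match when /new is the last thing in the string.

```python
import re

# "/new" must be followed by "?", "/" or the end of the string.
NEW_SORT = re.compile(r"/new(?:[?/]|$)")

# The original attempt from the question, for comparison.
ATTEMPT = re.compile(r"/new[^s]")


def is_new_sort(path: str) -> bool:
    """Hypothetical helper: True if the path contains a /new segment."""
    return NEW_SORT.search(path) is not None


print(is_new_sort("/category/news/new"))         # True
print(is_new_sort("/category/news/new?t=week"))  # True
print(is_new_sort("/politics/new"))              # True
print(is_new_sort("/category/news/top"))         # False
print(ATTEMPT.search("/politics/new") is None)   # True: nothing follows /new
```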

Regex Finding URL's Within a string and replacing any word which ends at ".com"

I am trying to figure out how to look through a string and replace any word that ends with ".com" with a valid URL link.
For example:
"google.com has launches a .." will be replaced with
"<.a href='google.com'>google.com <./a> has launches .."
// I tried the following code, but it only works for finding words that start with "www."
data.rows[j].content = data.rows[j].content.replace(
/(^|<|\s)(www\..+?\..+?)(\s|>|$)/g,'$2'
)
First, I recommend learning how regex works. It's a very powerful tool that all developers should understand because it appears in many different programming languages.
Once you understand the basics, this should make more sense:
/(^|<|\b)(\S+?\.com)(\b|>|$)/g
Regex101 Demo - (for a breakdown of the regex, look in the top-right pane)
https://regex101.com/r/oF0mA9/2 should do the trick from the regex front.
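To illustrate the idea in a runnable form, here is a rough Python sketch rather than a port of the JavaScript regexes above. It swaps in a simplified pattern of my own (labels of word characters or hyphens, separated by dots, ending in .com) and a made-up helper name, linkify. It wraps the bare host name exactly as the question does; a real link would usually also need a scheme such as https:// in the href.

```python
import re

# Assumed, simplified pattern: one or more dot-separated labels ending in ".com".
# The trailing \b keeps it from matching inside words such as "example.community".
DOT_COM = re.compile(r"\b[\w-]+(?:\.[\w-]+)*\.com\b")


def linkify(text: str) -> str:
    """Hypothetical helper: wrap every word ending in .com in an anchor tag."""
    return DOT_COM.sub(lambda m: f"<a href='{m.group(0)}'>{m.group(0)}</a>", text)


print(linkify("google.com has launched a new product"))
```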
You can try one of these patterns for matching text that ends in .com:
(http(s?):)|([/|.|\w|\s])*\.(?:com)
OR
(?i)\.(com)$
OR
(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*\.(?:com))(?:\?([^#]*))?(?:#(.*))?
Resource Link:
Regex to check if valid URL that ends in .jpg, .png, or .gif

transforming URLS to active links with REGEX

I have this code in PHP that transforms URLs inside a text into active HTML links.
For example in a string
Hey check this cool link http://www.example.com
this transforms to:
Hey check this cool link http://www.example.com
As you can see it just adds the correct < a > html tag
The code is this:
$active_links_text = ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]","\\0", $original_text);
My question is, how to do this to work EXCEPT if the URL is a youtube url.
So i want this result: In a string
Wow have you checked http://www.youtube.com/watch?v=dQw4w9WgXcQ its even better than http://www.example.com !!!
i want to be transformed to
Wow have you checked http://www.youtube.com/watch?v=dQw4w9WgXcQ its even better than http://www.example.com
As you can see, the < a > html tag was added to the example.com URL but NOT to the YouTube URL.
How can I make this happen?
I hope I described my problem well enough, and I hope it's easy to implement! Last note: I am using this code in PHP 5.2.14.
Thank you guys!
[EDIT : Wow, I had gotten your question completely wrong! Below's a better attempt at helping you.]
I gave it a go in JS, since I'm not a PHP coder. Here is the regex: /(http:\/\/(?!www.youtube)[^<>\s]+)\b/g. The negative lookahead prevents a match when the URL continues with www.youtube (the lookahead content can be adapted if you need a more complex pattern).
There's nothing JS-specific here to my knowledge, but I don't know the ereg regex syntax. With the preg functions, you would not need to escape the slashes (as long as you pick a delimiter other than /); the word boundary \b and the negative lookahead (?!pattern) are the same. The /g flag is for a global replacement, that is, not stopping at the first match.
I'm not sure about the global flag in PHP; I guess you can just call a kind of replaceAll function.
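The same lookahead idea can be sketched in Python, whose re module supports (?!...) just as JavaScript does. This is an illustration only, not PHP code: the function name activate_links is made up, the pattern is restricted to http and https, and the dots in the lookahead are escaped so that they match only a literal dot.

```python
import re

# Match an http(s) URL unless the host starts with "www.youtube.".
NON_YOUTUBE_URL = re.compile(r"https?://(?!www\.youtube\.)[^<>\s]+")


def activate_links(text: str) -> str:
    """Hypothetical helper: wrap every non-YouTube URL in an anchor tag."""
    # \g<0> inserts the whole match; sub() replaces every occurrence.
    return NON_YOUTUBE_URL.sub(r'<a href="\g<0>">\g<0></a>', text)


text = ("Wow have you checked http://www.youtube.com/watch?v=dQw4w9WgXcQ "
        "its even better than http://www.example.com !!!")
print(activate_links(text))
```

Only the example.com URL is wrapped; the YouTube URL is left as plain text because the lookahead rejects it immediately after the scheme.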
You've made several mistakes about valid URI components. The scheme is defined as ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), not [[:alpha:]]+.
The part after the : of the scheme need not start with //; that is particular to http: and a few other file-oriented schemes. But the [[:alpha:]]+: start of your regex shows you weren't aiming to restrict yourself to http:. In that case, all printable ASCII characters are valid, i.e. everything from ! to ~, or [\x21-\x7E]* as a regex.
To summarize: [[:alpha:]][A-Za-z0-9+.-]*:[\x21-\x7E]* (the hyphen goes last in the bracket expression so that it is not read as a range).
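As a quick check of that summary, here is a Python sketch of the same idea. The POSIX class [[:alpha:]] is written as [A-Za-z] because Python's re module has no POSIX classes, and the helper name looks_like_uri is an assumption. It is a loose shape test (scheme, colon, printable ASCII), not a full URI validator.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":", then printable ASCII.
URI_SHAPE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:[\x21-\x7E]*")


def looks_like_uri(text: str) -> bool:
    """Hypothetical helper: True if the whole string has the scheme:rest shape."""
    return URI_SHAPE.fullmatch(text) is not None


print(looks_like_uri("http://www.example.com"))      # True
print(looks_like_uri("mailto:someone@example.com"))  # True
print(looks_like_uri("1http://www.example.com"))     # False: scheme starts with a digit
print(looks_like_uri("no scheme here"))              # False
```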