RegEx | Match string not starting with a specific word - regex

I am writing some mod_rewrite regex, and I have several request URLs that lookg like that
use1mycompany
use2mycompany
use3mycompany
use4mycompany
but also I have and some request URIs that start with
mycompany/user1mycompany/
mycompany/user2mycompany/
mycompany/user3mycompany/
mycompany/user4mycompany/
In order to fix that Issue I have used the following regex in apachec mod_rewrite
^(.*)mycompany/?$
the problem is that my regex matcing both the userXmycomany URL and the mycopmany/userXmycompany/
So, the question is, how can I match the urls that not starts with the string "mycompany" but end with the string "mycompany" ?
Kind regards

(?<!mycompany)/.*(?<=mycompany)/?
will match /user1mycompany/ but not mycompany//user1mycompany/

You should match string which have at least one character before mycompany.
* means 0 or more. + means 1 or more.
^(.+)mycompany/?$

Related

How can I write a opposite regex to this regex?

this is a regex of a proxy, if I add this to my proxy:
(.*\.|)(abc|google)\.(org|net)
my proxy will not transmit the abc.org, abc.net, google.org, google.net's traffic.
how can I write a regex opposite to this regex? I mean only transmit the abc.org, abc.net, google.org, google.net's traffic.
EDIT-01
My thought is just want to transmit abc.org or www.abc.org, how can I do with that?
Try this:
^(?!(www\.)?(?:abc|google)\.(?:net|org)).*
Demo: https://regex101.com/r/WOnFx8/3/
I used ?! to reverse the matching of your regex. This way, it will match any domain except these specific 4 domains.
Another way to do it is by using this code to include anything before the desired domains:
^(?!(.*\.|)(?:abc|google)\.(?:net|org)).*
demo: https://regex101.com/r/WOnFx8/4/
Your regex you write
(.*\.|)(abc|google)\.(org|net)
mean any string is one of abc.org, gooogle.org, abc.net, google.net, with optional prefix string ends with dot (.)
Like: test.google.org, sub.abc.net,...
I think you want to match string like test.yahoo.com, but not test.google.org. If you can use negative look ahead, this is the answer:
^(.*\.|)(?!(abc|google)\.(org|net))\w+\.\w+$
Explain:
^ and $ to be sure your match is entire url string
Negative look ahead is to check the url is not something like abc.org, abc.net, google.org, google.net
And \w+\.\w+ to check the remain string is kind of URL type (something likes yahoo.com, etc...)
Im going to assume you have lookaheads, if so then you can simply use -
(^.*?\.(?!(abc|google))\w+\.(?:org|net)$)
Demo - https://regex101.com/r/5eC41R/3
What this does is -
Looks for the start of the url (till the first .)
Checks that next part is not abc or google
looks for the next section (till the next .)
Looks for a closing org or net
Note that since it is a lookahead it will be slow compared to other regex matches

Django url regex hit end of url: *./something

I am geting 404 on different urls that end with the same string and instead of creating multiple redirects I would like to catch them all on the last string. It always appears at the same position, pattern goes like so:
/some-of-my-urls/the-same-string
No trailing slash there. I tried something like this:
url(r'^[a-zA-Z0-9_]+/the-same-string', redirect_func),
url(r'^./the-same-string', redirect_func),
But that doesn't work. Probably obvious for somebody with more regex knowledge, I am not very advanced. Anybody ideas?
You may use a negated character class [^/] to match any char but / and quantify it with a + quantifier that matches 1 or more repetitions:
r'^[^/]+/the-same-string'
See the regex demo.

Write a regex for url match

I'm trying to write wordpress pretty permalinks regex.
I have following urls. I need 2 matches,
1st : last word between / and / before get/
2nd : string which is start with get/
Url's may be like these
http://localhost/akasia/yacht-technical-services/yacht-crew/get/gulets/for/sale/
Here I need "yacht-crew" and "get/gulets/for/sale/"
http://localhost/akasia/testimonials/get/motoryachts/for/sale/
here I need "testimonials" and get/motoryachts/for/sale/
http://localhost/akasia/may/be/lots/of/seperator/but/ineed/last/get/ships/for/rent/
here I need "last" and get/ships/for/rent/
I catch 2nd part with
(.(get/(.)?))
but for first part there is no luck.
I will be appreciated if someone helps.
Regards
Deniz
I suggest the following:
([^\/]+?)\/(get\/.+)
https://regex101.com/r/uN6yH3/1
The concept is that you match non-slash characters up to the first slash (non-greedy) that is followed by the word "get" grouping it, and then just grab the rest as the second group.
I am assuming PHP.
$path = parse_url($url,PHP_URL_PATH);
$s = strrpos($path,'/');
$matches[] = substr($path,$s+1);

Regex to match anything after /

I'm basically not in the clue about regex but I need a regex statement that will recognise anything after the / in a URL.
Basically, i'm developing a site for someone and a page's URL (Local URL of Course) is say (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (It's a WordPress site) which have the URL of (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask you need to use groups. In regular expression groups allow you to isolate parts of the whole match.
for example:
input string of: aaaaaaaabbbbcccc
regex: a*(b*)
The parenthesis mark a group in this case it will be group 1 since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
In order to achieve what you want with the sweets pattern above, you just need to put a group around the end.
possible solution: /sweets/(.*)
the more precise you are with the pattern before the group the less likely you will have a possible false positive.
If what you really want is to match anything after the last / you can take another approach:
possible other solution: /([^/]*)
The pattern above will find a / with a string of characters that are NOT another / and keep it in group 1. Issue here is that you could match things that do not have sweets in the URL.
Note if you do not mind the / at the beginning then just remove the ( and ) and you do not have to worry about groups.
I like to use http://regexpal.com/ to test my regex.. It will mark in different colors the different matches.
Hope this helps.
I may have misunderstood you requirement in my original post.
if you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without adding the part match by your * at the end) I would use a regular expression to match the pattern in the URL but them just blind replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
when doing this I would make sure you do not match the "localhost" part. Also I am assuming the (http://) really means an optional http:// in front as (http://) is not a valid protocol prefix.
so if that is what you want then this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression will match the http:// part optionally with a host (be it localhost, an IP or the host name). You could omit the .* at the end if you want.
If that pattern matches just replace the whole URL with the one you want to redirect to.
use this regular expression (?<=://).+

MFC: How do I construct a good regular expression that validates URLs?

Here's the regular expression I use, and I parse it using CAtlRegExp of MFC :
(((h|H?)(t|T?)(t|T?)(p|P?)(s|S?))://)?([a-zA-Z0-9]+[\.]+[a-zA-Z0-9]+[\.]+[a-zA-Z0-9])
It works fine except with one flaw. When URL is preceded by characters, it still accepts it as a URL.
ex inputs:
this is a link www.google.com (where I can just tokenize the spaces and validate each word)
is...www.google.com (this string still matches the RegEx above :( )
Please help...
Thanks...
Use the IgnoreCase flag instead of catering for each case.
Stick a ^ at the beginning if you want the start of the string to be the start of the URL
You're missing a lot of characters from possible, valid URLs.
You need to tell the regex to only match at the start and end of the string. I'm not sure how you do that in VC++ - in most regexs you enclose the pattern with ^ and $. The ^ says "the start of the string" and the $ says "the end of the string."
^(((h|H?)(t|T?)(t|T?)(p|P?)(s|S?))\://)?([a-zA-Z0-9]+[\\.]+[a-zA-Z0-9]+[\\.]+[a-zA-Z0-9])$
The second is matching because the string still contains a valid URL.
How about using CUrl (that is, 'C-Url', in ATL, not curl as in libcurl) which can 'parse' urls with CUrl::CrackUrl . If that function returns FALSE you assume it's not a valid URL.
That said, decomposing URL is sufficiently complex to warrant a proper parser, not a regex based decomposition. Cfr. rfc 2396 etc. for an overview on the complexities.
Start the regex with ^ to and end it with $ to have the regex match only if the entire sting matches (if that's what you want):
^(((h|H?)(t|T?)(t|T?)(p|P?)(s|S?))\://)?([a-zA-Z0-9]+[\.]+[a-zA-Z0-9]+[\.]+[a-zA-Z0-9])$
What about this one: (((f|ht)tp://)[-a-zA-Z0-9#:%_\+.~#?&//=]+) ?
This Regular Expression has been tested to work for the following
http|https://host[:port]/[?][parameter=value]*
public static final String URL_PATTERN = "(https?|ftp)://(www\\.)?(((([a-zA-Z0-9.-]+\\.){1,}[a-zA-Z]{2,4}|localhost))|((\\d{1,3}\\.){3}(\\d{1,3})))(:(\\d+))?(/([a-zA-Z0-9-._~!$&'()*+,;=:#/]|%[0-9A-F]{2})*)?(\\?([a-zA-Z0-9-._~!$&'()*+,;=:/?#]|%[0-9A-F]{2})*)?(#([a-zA-Z0-9._-]|%[0-9A-F]{2})*)?";
PS. It also validates on localhost link.
(Thoroughly written by me :-))